# Word features

OpenNMT supports additional features on source and target words in the form of discrete labels.

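• On the source side, these features act as additional information to the encoder. An embedding is optimized for each label and then fed as additional source input alongside the word it annotates.
• On the target side, these features are predicted by the network. The decoder decodes a sentence and annotates each word.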

data/src-train-case.txt is an example that uses a separate feature to represent the case of each word. Using case as a feature keeps the word dictionary compact (no duplicated entries such as "the" and "The") and gives the system additional information that can help it optimize its objective function.

it￨C is￨l not￨l acceptable￨l that￨l ,￨n with￨l the￨l help￨l of￨l the￨l national￨l bureaucracies￨l ,￨n parliament￨C &apos;s￨l legislative￨l prerogative￨l should￨l be￨l made￨l null￨l and￨l void￨l by￨l means￨l of￨l implementing￨l provisions￨l whose￨l content￨l ,￨n purpose￨l and￨l extent￨l are￨l not￨l laid￨l down￨l in￨l advance￨l .￨n
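To make the format concrete, here is a minimal Python sketch that splits an annotated line like the one above into words and feature columns. The function name `split_features` is hypothetical and not part of OpenNMT, and the sketch assumes every token carries the same number of features.

```python
SEP = "\uffe8"  # the "￨" separator seen in the example line


def split_features(line):
    """Split an annotated line into words and one label list per feature.

    Hypothetical helper for illustration; assumes every token carries the
    same number of features, in the same order.
    """
    words, columns, expected = [], [], None
    for token in line.split():
        word, *labels = token.split(SEP)
        if expected is None:
            # The first token fixes how many features we expect.
            expected = len(labels)
            columns = [[] for _ in labels]
        elif len(labels) != expected:
            raise ValueError(f"inconsistent feature count in {token!r}")
        words.append(word)
        for column, label in zip(columns, labels):
            column.append(label)
    return words, columns


words, columns = split_features(f"it{SEP}C is{SEP}l ,{SEP}n")
print(words)    # the bare words
print(columns)  # one list of labels per feature
```

Keeping one list per feature mirrors the idea that each feature has its own vocabulary and its own embedding.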


You can generate this case feature with OpenNMT's tokenization script and the -case_feature flag.
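For readers who want to see what such an annotation involves, the following is a simplified Python sketch, not OpenNMT's tokenizer. The labels `C`, `l` and `n` are taken from the example line above; `U` (all uppercase) and `M` (mixed case) are assumptions added for completeness, and the real script may treat edge cases differently.

```python
SEP = "\uffe8"  # the "￨" separator seen in the example line


def case_label(word):
    """Return a case label for one token (simplified, illustrative rules)."""
    if not any(ch.isalpha() for ch in word):
        return "n"  # no letters at all, e.g. punctuation
    if word.islower():
        return "l"  # all cased characters are lowercase
    if word.isupper():
        return "U"  # assumed label for all-uppercase tokens
    if word[0].isupper() and word[1:].islower():
        return "C"  # capitalized: first letter upper, rest lower
    return "M"      # assumed label for any other mix


def annotate_case(line):
    """Lowercase every token and append its case label after the separator."""
    return " ".join(
        f"{token.lower()}{SEP}{case_label(token)}" for token in line.split()
    )


print(annotate_case("It is not acceptable ,"))
```

Lowercasing the word while storing the case in a separate label is what lets "the" and "The" share a single dictionary entry.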

## Time-shifting

By default, word features on the target side are automatically shifted relative to the words so that their prediction directly depends on the word they annotate. This way, the decoder architecture is similar to an RNN-based sequence tagger, with the output of a timestep being the tag of the input.

More precisely, at timestep $t$:

• the inputs are $words^{(t)}$ and $features^{(t-1)}$
• the outputs are $words^{(t+1)}$ and $features^{(t)}$

To reuse available vocabulary, $features^{(-1)}$ is set to the end of sentence token.
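The shifting described above can be sketched in Python as follows. This is an illustration under stated assumptions, not OpenNMT's implementation: the token names `<s>` and `</s>`, the function name `time_shift`, and the choice to give the begin-of-sentence word the label `<s>` are all hypothetical.

```python
def time_shift(words, features, bos="<s>", eos="</s>"):
    """Build per-timestep decoder inputs and outputs with shifted features.

    words[i] is annotated by features[i]. Token names are assumptions.
    """
    if len(words) != len(features):
        raise ValueError("each word needs exactly one feature label")
    word_seq = [bos] + list(words) + [eos]
    # Features lag one step behind the words: the slot for features^(-1)
    # is filled with the end-of-sentence token, and the begin-of-sentence
    # word is given the label bos (an assumption of this sketch).
    feat_seq = [eos, bos] + list(features)
    steps = []
    for t in range(len(word_seq) - 1):
        steps.append({
            "in_word": word_seq[t],       # words^(t)
            "in_feat": feat_seq[t],       # features^(t-1)
            "out_word": word_seq[t + 1],  # words^(t+1)
            "out_feat": feat_seq[t + 1],  # features^(t)
        })
    return steps


for step in time_shift(["hello", "world"], ["C", "l"]):
    print(step)
```

At every step the predicted feature annotates the word that was just fed in, which is what makes the decoder behave like a sequence tagger.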

## Vocabularies

# unlimited source features vocabulary size
-src_vocab_size 50000

# first feature vocabulary is limited to 60, others are unlimited
-src_vocab_size 50000 60

# second feature vocabulary is limited to 100, others are unlimited
-src_vocab_size 50000 0 100

# limit vocabulary size of the first and second feature
-src_vocab_size 50000 60 100
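
Reading the examples above, the first value appears to be the word vocabulary size and the following values the per-feature sizes in order, with 0 or a missing value meaning unlimited. The Python sketch below illustrates that reading only; `parse_vocab_sizes` and `limit_vocab` are hypothetical names, and the actual preprocessing may differ in its details.

```python
from collections import Counter


def parse_vocab_sizes(spec, num_features):
    """Split a size spec such as "50000 0 100" into word and feature sizes.

    Missing feature sizes default to 0, read here as "unlimited".
    """
    values = [int(v) for v in spec.split()]
    word_size, feat_sizes = values[0], values[1:num_features + 1]
    feat_sizes += [0] * (num_features - len(feat_sizes))
    return word_size, feat_sizes


def limit_vocab(counts, size):
    """Keep the `size` most frequent labels; 0 keeps all of them."""
    kept = counts.most_common(size if size > 0 else None)
    return [label for label, _ in kept]


word_size, feat_sizes = parse_vocab_sizes("50000 0 100", num_features=3)
print(word_size, feat_sizes)

case_counts = Counter({"l": 30, "n": 5, "C": 2})
print(limit_vocab(case_counts, 2))
```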


You can similarly use -src_words_min_frequency and -tgt_words_min_frequency to limit vocabulary by frequency instead of absolute size.

Like word vocabularies, word feature vocabularies can be reused across datasets with the -features_vocabs_prefix option. For example, if the preprocessing generates these feature dictionaries:

• data/demo.source_feature_1.dict
• data/demo.source_feature_2.dict
• data/demo.source_feature_3.dict

then pass -features_vocabs_prefix data/demo on the command line.
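
As a small illustration of how the prefix relates to the file names listed above, the Python sketch below rebuilds those paths from a prefix. The helper name `source_feature_dict_paths` is hypothetical, and the naming pattern is taken only from the three example files; it is not a statement about how OpenNMT resolves the files internally.

```python
def source_feature_dict_paths(prefix, num_features):
    """List the source feature dictionary paths implied by a prefix.

    Follows the pattern of the example files: <prefix>.source_feature_<i>.dict
    """
    return [
        f"{prefix}.source_feature_{i}.dict"
        for i in range(1, num_features + 1)
    ]


for path in source_feature_dict_paths("data/demo", 3):
    print(path)
```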

## Embeddings

By default, the feature embeddings are concatenated together. You can instead choose to sum them by setting -feat_merge sum. Finally, the resulting merged embedding is concatenated to the word embedding.

Warning

In the sum case, each feature embedding must have the same dimension. You can set the common embedding size with -feat_vec_size.
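
The two merge modes can be sketched with plain Python lists standing in for embedding vectors. This is a minimal illustration of the description above, not OpenNMT's code; `merge_embeddings` is a hypothetical name, and the mode strings `"concat"` and `"sum"` are assumed labels for the default and the -feat_merge sum behaviour.

```python
def merge_embeddings(word_emb, feat_embs, feat_merge="concat"):
    """Merge the feature embeddings, then append them to the word embedding."""
    if feat_merge == "concat":
        # Features may have different sizes; the merged size is their total.
        merged = [x for emb in feat_embs for x in emb]
    elif feat_merge == "sum":
        # Summing only makes sense if all features share one dimension.
        if len({len(emb) for emb in feat_embs}) > 1:
            raise ValueError("sum requires equally sized feature embeddings")
        merged = [sum(column) for column in zip(*feat_embs)]
    else:
        raise ValueError(f"unknown merge mode: {feat_merge!r}")
    return list(word_emb) + merged


word = [1.0, 2.0]
feats = [[0.5, 0.5], [1.0, 2.0]]
print(merge_embeddings(word, feats, "concat"))
print(merge_embeddings(word, feats, "sum"))
```

Concatenation grows the input size with every added feature, while summing keeps it fixed at the cost of requiring a shared dimension.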

During decoding, beam search is applied only to the target word space, not to the word features. Once a beam path is complete, the associated features are selected along this path.
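
One way to picture this is a backtracking pass over stored beam steps. In the Python sketch below, each beam slot records the word chosen by the search, the feature predicted for that hypothesis, and a pointer to its parent slot. The data layout and the name `backtrack` are assumptions made for illustration; OpenNMT's internal bookkeeping may look different.

```python
def backtrack(steps, final_slot):
    """Recover words and features along one completed beam path.

    steps[t][k] is a tuple (word, feature, parent_slot) for beam slot k at
    step t. Only words were scored by the search; each feature is simply
    read off the slot that the path passes through.
    """
    words, feats = [], []
    slot = final_slot
    for step in reversed(steps):
        word, feat, parent = step[slot]
        words.append(word)
        feats.append(feat)
        slot = parent
    return words[::-1], feats[::-1]


steps = [
    [("the", "C", 0), ("a", "l", 0)],    # step 0: two beam slots
    [("cat", "l", 1), ("dog", "n", 0)],  # step 1: parents point into step 0
]
print(backtrack(steps, final_slot=0))
```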