# Unknown words

The default translation mode allows the model to produce the `<unk>` symbol when it is not sure of the specific target word.

Often, `<unk>` symbols correspond to proper names that can be transposed directly between languages. The -replace_unk option substitutes each `<unk>` with the source word that has the highest attention weight. The -replace_unk_tagged option does the same, but wraps the token in a ｟unk:xxxxx｠ tag.
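The replacement rule can be sketched in a few lines of Python. This is only an illustration of the idea, not the toolkit's implementation: the function name `replace_unk`, the `tagged` parameter, and the layout of the `attention` matrix (one row per target token, one column per source token) are assumptions made for the example.

```python
UNK = "<unk>"

def replace_unk(source, target, attention, tagged=False):
    """Replace every <unk> in `target` with the most-attended source token.

    attention[i][j] is assumed to be the weight that target position i
    places on source position j (hypothetical layout for this sketch).
    """
    output = []
    for i, token in enumerate(target):
        if token != UNK:
            output.append(token)
            continue
        weights = attention[i]
        # Index of the source token with the highest attention weight.
        best = max(range(len(source)), key=lambda j: weights[j])
        word = source[best]
        # Tagged mode wraps the copied word instead of emitting it bare.
        output.append(f"｟unk:{word}｠" if tagged else word)
    return output

source = ["Herr", "Schmidt", "kommt"]
target = ["Mr", "<unk>", "is", "coming"]
attention = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.2, 0.6],
    [0.1, 0.1, 0.8],
]

result = replace_unk(source, target, attention)
tagged_result = replace_unk(source, target, attention, tagged=True)
```

Here the second target token is unknown and attends mostly to "Schmidt", so the proper name is copied across unchanged.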

## Phrase table

Alternatively, advanced users may prefer to provide a pre-constructed phrase table from an external aligner (such as fast_align) using the -phrase_table option, which allows for non-identity replacement. Each entry of the phrase table has the form:

source|||target
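As a rough sketch of how such a table might be consumed, the Python below parses `source|||target` lines into a dictionary and looks up a source word. The helper names `load_phrase_table` and `lookup` are hypothetical, and falling back to the source word itself when no entry exists is an assumption of this example.

```python
def load_phrase_table(lines):
    """Parse 'source|||target' lines into a dict (hypothetical helper)."""
    table = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        src, sep, tgt = line.partition("|||")
        if not sep:
            continue  # skip lines without the ||| separator
        table[src.strip()] = tgt.strip()
    return table

def lookup(table, source_word):
    """Return the table entry for a source word, or the word itself."""
    return table.get(source_word, source_word)

table = load_phrase_table(["Haus|||house", "Katze|||cat", ""])
known = lookup(table, "Haus")       # found in the table
unknown = lookup(table, "Schmidt")  # not in the table, copied as-is
```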


## Workarounds

Several techniques exist to minimize the out-of-vocabulary issue:

• sub-tokenization such as BPE or "wordpiece" to simulate open vocabularies
• mixed word/character models, as described in Wu et al. (2016)
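To make the first technique concrete, here is a minimal, simplified sketch of byte pair encoding in Python: it learns the most frequent adjacent symbol pairs from a word list and then segments a new word with the learned merges. The function names, the `</w>` end-of-word marker, and the toy corpus are choices made for this example; real tokenizers differ in many details.

```python
from collections import Counter

def merge_pair(symbols, pair):
    """Merge every adjacent occurrence of `pair` in a tuple of symbols."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)

def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merges from a list of words."""
    # Each word starts as single characters plus an end-of-word marker.
    vocab = Counter(tuple(w) + ("</w>",) for w in words)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges

def segment(word, merges):
    """Split a (possibly unseen) word by applying merges in learned order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)

corpus = ["low", "lower", "lowest", "newer", "newest"]
merges = learn_bpe(corpus, 10)
pieces = segment("lowering", merges)  # unseen word, split into known subwords
```

Because any word can be broken down to single characters in the worst case, a word absent from the training data is still represented by known symbols rather than by `<unk>`.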