Retraining

By default, OpenNMT saves a checkpoint every 5000 iterations and at the end of each epoch. To save more or less frequently, use the -save_every and -save_every_epochs options, which define the number of iterations and epochs between checkpoints.
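
For example, a sketch of overriding the checkpoint frequency (the value below is illustrative):

# save a checkpoint every 1000 iterations instead of the default 5000
th train.lua -gpuid 1 -data data/demo-train.t7 -save_model demo -save_every 1000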

In several cases, you may want to train from a saved model using the -train_from option:

  • resume a stopped training
  • continue a training with a smaller batch size
  • train a model on new data (incremental adaptation; see the sketch after this list)
  • start a training from pre-trained parameters
  • etc.
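
A minimal sketch of the incremental adaptation case, assuming hypothetical in-domain files and reusing the dictionaries produced when the original data was preprocessed (see the Considerations below on vocabularies):

# preprocess the new in-domain data, reusing the original vocabularies
# (file names are hypothetical)
th preprocess.lua -train_src in-domain-src.txt -train_tgt in-domain-tgt.txt \
  -valid_src in-domain-src-val.txt -valid_tgt in-domain-tgt-val.txt \
  -src_vocab data/demo.src.dict -tgt_vocab data/demo.tgt.dict -save_data data/adapt

# continue training the existing model on the in-domain data
th train.lua -gpuid 1 -data data/adapt-train.t7 -save_model demo-adapt -train_from demo_checkpoint.t7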

Considerations

When training from an existing model, some settings cannot be changed:

  • the model topology (layers, hidden size, etc.)
  • the vocabularies

Exceptions

-dropout, -fix_word_vecs_enc and -fix_word_vecs_dec are model options that can be changed when retraining.
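
For instance, a sketch of retraining from a checkpoint with a different dropout rate (the value is illustrative):

# retrain from a checkpoint with a new dropout value
th train.lua -gpuid 1 -data data/demo-train.t7 -save_model demo -train_from demo_checkpoint.t7 -dropout 0.2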

Resuming a stopped training

It is common for a training to be interrupted: a crash, a server reboot, a user action, etc. In this case, you may want to continue the training for more epochs by using the -continue flag. For example:

# start the initial training
th train.lua -gpuid 1 -data data/demo-train.t7 -save_model demo -save_every 50

# train for several epochs...

# need to reboot the server!

# continue the training from the last checkpoint
th train.lua -gpuid 1 -data data/demo-train.t7 -save_model demo -save_every 50 -train_from demo_checkpoint.t7 -continue

The -continue flag ensures that the training continues with the same configuration and optimization state. In particular, the following options are set to their last known values:

  • -curriculum
  • -decay
  • -learning_rate_decay
  • -learning_rate
  • -max_grad_norm
  • -min_learning_rate
  • -optim
  • -start_decay_at
  • -start_decay_ppl_delta
  • -start_epoch
  • -start_iteration

Note

The -end_epoch value is not automatically set, as the user may want to continue the training for more epochs past the original end.
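
For example, a sketch of extending a finished training past its original end (the epoch count is illustrative):

# continue from the last checkpoint and train up to epoch 20
th train.lua -gpuid 1 -data data/demo-train.t7 -save_model demo -train_from demo_checkpoint.t7 -continue -end_epoch 20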

Additionally, the -continue flag retrieves from the previous training:

  • the non-SGD optimizers states
  • the random generator states
  • the batch order (when continuing from an intermediate checkpoint)

Training from pre-trained parameters

Another use case is to take a base model and train it further with new options (in particular the optimization method and the learning rate). Using -train_from without -continue will start a new training with its parameters initialized from the pre-trained model.
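
A minimal sketch of this case, with illustrative optimizer settings:

# start a new training from pre-trained parameters with a different optimizer and learning rate
th train.lua -gpuid 1 -data data/demo-train.t7 -save_model demo-finetune \
  -train_from demo_checkpoint.t7 -optim adagrad -learning_rate 0.1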

Updating the vocabularies

  • -update_vocab (accepted: none, replace, merge; default: none)

You may restart the training with a new dataset (for example a dynamic dataset) whose vocabularies differ from those of the pre-trained model. Instead of re-initializing the whole network, the pre-trained states of the words common to the new and previous dictionaries can be kept with the -update_vocab option. This option is disabled by default, and updating word features is not supported for the moment.

  • replace mode only keeps the common words: the old non-common words are deleted and the new ones are initialized.
  • merge mode keeps the states of all the old words; the new words are initialized.
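
A sketch of restarting on a new dataset while merging the vocabularies (the dataset name is hypothetical):

# retrain on a new dataset, keeping the states of words shared with the old vocabulary
th train.lua -gpuid 1 -data data/new-domain-train.t7 -save_model demo-adapt \
  -train_from demo_checkpoint.t7 -update_vocab merge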