
Recent Advances in Neural Machine Translation

Encoder-decoder with attention mechanism.

Neural models for machine translation were first seriously introduced in 2014. With the introduction of attention models, their performance improved to levels comparable to those of statistical phrase-based machine translation, the type of translation we are all familiar with through services like Google Translate.

However, these models have struggled with limited vocabularies, the need for large amounts of training data, and the high cost of training and using them.

In recent months, a number of papers have been published to remedy some of these issues. They include techniques to battle the limited vocabulary problem and to use monolingual data to improve performance. As recently as Monday evening (Sept 26), Google uploaded a paper on their implementation of these ideas, in which they claim performance on par with human translators, measured both in BLEU scores and in human evaluations.

During this talk, I'll go through the ideas behind these recent papers.


My blog post, covering some of the content of the talk.

  • Sequence to Sequence Learning with Neural Networks, Ilya Sutskever, Oriol Vinyals, Quoc V. Le, NIPS 2014: PDF, arXiv
  • Neural Machine Translation of Rare Words with Subword Units, Rico Sennrich, Barry Haddow, Alexandra Birch, ACL 2016: PDF
  • A Character-level Decoder without Explicit Segmentation for Neural Machine Translation, Junyoung Chung, Kyunghyun Cho, Yoshua Bengio, ACL 2016: PDF
  • Achieving Open Vocabulary Neural Machine Translation with Hybrid Word-Character Models, Minh-Thang Luong, Christopher D. Manning, ACL 2016: PDF
  • Improving Neural Machine Translation Models with Monolingual Data, Rico Sennrich, Barry Haddow, Alexandra Birch, ACL 2016: PDF
  • Deep Recurrent Models with Fast-Forward Connections for Neural Machine Translation, Jie Zhou, Ying Cao, Xuguang Wang, Peng Li, Wei Xu (Baidu): PDF, arXiv
  • Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation, Yonghui Wu (Google): PDF, arXiv
  • Sequence Level Training with Recurrent Neural Networks, Marc'Aurelio Ranzato, Sumit Chopra, Michael Auli, Wojciech Zaremba: PDF, arXiv
  • Sequence-to-Sequence Learning as Beam-Search Optimization, Sam Wiseman, Alexander M. Rush: PDF, arXiv

Slides (PDF)

Chalmers Machine Learning Seminars, 2016-09-29
Olof Mogren

Olof Mogren, PhD, RISE Research Institutes of Sweden.