While experimenting with tuning on long sentences, we made an unexpected discovery: PRO falls victim to monsters, that is, overly long negative examples with very low BLEU+1 scores, which are unsuitable for learning and can cause testing BLEU to drop by several points absolute. We propose several effective ways to address the problem, using length- and BLEU+1-based cut-offs, outlier filters, stochastic sampling, and random acceptance. The best of these fixes not only slay and protect against monsters, but also yield higher stability for PRO as well as improved test-time BLEU scores. Thus, we recommend them to anybody using PRO, monster-believer or not.
A Tale about PRO and Monsters
Preslav Nakov, Francisco Guzmán, and Stephan Vogel. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL'13), pages 12-17, 2013.
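As a rough illustration of the length- and BLEU+1-based cut-offs mentioned in the abstract, the sketch below drops candidate translations that are much longer than the reference or that score very low on BLEU+1 before they can be sampled as negative examples. It is not the authors' implementation: the `Candidate` type, the `filter_monsters` function, and both threshold values are assumptions made for this example, and BLEU+1 scores are assumed to be precomputed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    tokens: List[str]  # hypothesis tokens
    bleu1: float       # sentence-level BLEU+1 in [0, 1], assumed precomputed


def filter_monsters(
    candidates: List[Candidate],
    reference_len: int,
    max_len_ratio: float = 2.0,  # hypothetical length cut-off
    min_bleu1: float = 0.05,     # hypothetical BLEU+1 cut-off
) -> List[Candidate]:
    """Keep only candidates that pass both the length and the BLEU+1 cut-off."""
    kept = []
    for cand in candidates:
        too_long = len(cand.tokens) > max_len_ratio * reference_len
        too_bad = cand.bleu1 < min_bleu1
        if not (too_long or too_bad):
            kept.append(cand)
    return kept


# Usage: one reasonable candidate, one overly long one, one with a very low score.
candidates = [
    Candidate("the cat sat".split(), 0.40),
    Candidate(("word " * 20).split(), 0.01),
    Candidate("cat the".split(), 0.02),
]
kept = filter_monsters(candidates, reference_len=3)
```

With these illustrative thresholds, only the first candidate survives; the other fixes named in the abstract (outlier filters, stochastic sampling, random acceptance) would replace or complement this hard filter.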