In this paper we study in detail the relation between word alignment and phrase extraction. First, we analyze different word alignments according to several characteristics and compare them to hand-aligned data. Secondly, we analyzed the phrase-pairs generated by these alignments. We observed that the number of unaligned words has a large impact on the characteristics of the phrase table. A manual evaluation of phrase pair quality showed that the increase in the number of unaligned words results in a lower quality. Finally, we present translation results from using the number of unaligned words as features from which we obtain up to 2BP of improvement.
Reassessment of the role of phrase extraction in pbsmt
Francisco Guzmán, Qin Gao and Stephan Vogel. In Machine Translation Summit XII 2009.
PDF Abstract BibTex Slides