This paper proposes a novel semi-supervised word alignment technique called EMDC that integrates discriminative and generative methods. A discriminative aligner is used to find high-precision partial alignments, which serve as constraints for a generative aligner that implements a constrained version of the EM algorithm. Experiments on small Chinese and Arabic tasks show consistent improvements in alignment error rate (AER). On moderate-size Chinese machine translation tasks, we also obtained an average improvement of 0.5 BLEU points across five standard NIST test sets and four other test sets.
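To make the idea of constrained EM concrete, here is a minimal sketch in Python. It is not the authors' implementation: it assumes IBM Model 1 as a simple stand-in for the generative aligner, and it assumes the discriminative aligner's output is available as a per-sentence dictionary mapping a target position to a fixed source position. The function names `constrained_model1` and `align`, the constraint format, and the toy corpus are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def constrained_model1(bitext, constraints, iterations=10):
    """EM for IBM Model 1 where some links are fixed in advance.

    bitext:      list of (source_tokens, target_tokens) pairs
    constraints: one dict per sentence pair, {target_index: source_index};
                 an assumed format for the trusted partial alignments
    Returns t, where t[(f, e)] estimates P(target word f | source word e).
    """
    t = defaultdict(lambda: 1.0)  # uniform (unnormalized) start
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: collect expected counts over the allowed links only
        for (src, tgt), fixed in zip(bitext, constraints):
            for j, f in enumerate(tgt):
                # A constrained target word may only link to its fixed source word
                allowed = [fixed[j]] if j in fixed else range(len(src))
                z = sum(t[(f, src[i])] for i in allowed)
                if z == 0.0:
                    continue  # no probability mass left for this word
                for i in allowed:
                    p = t[(f, src[i])] / z
                    count[(f, src[i])] += p
                    total[src[i]] += p
        # M-step: renormalize expected counts per source word
        t = defaultdict(float)
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t


def align(src, tgt, t, fixed=None):
    """Pick the most probable source position for every target word."""
    fixed = fixed or {}
    links = []
    for j, f in enumerate(tgt):
        if j in fixed:
            links.append(fixed[j])
        else:
            links.append(max(range(len(src)), key=lambda i: t[(f, src[i])]))
    return links


# Toy corpus (made up for illustration); "haus" is pre-linked to "house"
bitext = [
    (["the", "house"], ["das", "haus"]),
    (["the", "book"], ["das", "buch"]),
    (["a", "book"], ["ein", "buch"]),
]
constraints = [{1: 1}, {}, {}]
t = constrained_model1(bitext, constraints)
print(align(["a", "book"], ["ein", "buch"], t))
```

The only change from unconstrained Model 1 is in the E-step: a target word with a trusted link contributes all of its expected count to that one link, so competing links for that word receive no mass and the remaining words are re-estimated around it.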
EMDC: a semi-supervised approach for word alignment
Qin Gao, Francisco Guzmán, and Stephan Vogel. In Proceedings of the 23rd International Conference on Computational Linguistics (COLING 2010), pages 349-357, 2010.