Xu, Donghua and Chew Lim Tan. 1999. Alignment and matching of bilingual English-Chinese news texts. Machine Translation 14 (1) : 1–33.
This paper presents a project to align and match bilingual English-Chinese news files downloaded from the China News Service's website. The work involves aligning the bilingual texts at the sentence and clause levels. In addition, it requires matching the files, because the English and Chinese news files downloaded from the web do not come in the same sequential order. These news files have their own characteristics, and file matching poses difficulties of its own beyond the known alignment problems previously reported in the literature. To align the news files, the authors combine two criteria: “anchors” (i.e. unambiguous corresponding text elements) and sentence length. They employ Dynamic Programming first to align at the paragraph level, then to align at the sentence-clause level. In file matching, the authors encounter a “collision” problem caused by contending matching candidates, and propose a recursive splitting algorithm to resolve it.
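
The abstract does not give the cost function or the bead types used, so the following is only a minimal sketch of the general idea of combining a length criterion with anchors inside a dynamic-programming aligner, not the authors' implementation. Everything specific here is an assumption made for illustration: the function names (`anchors`, `bead_cost`, `align`), the use of digit strings as anchors, the allowed bead shapes (1-1, 1-0, 0-1, 2-1, 1-2), and all numeric weights. The `ratio` parameter stands in for the expected length ratio between the two languages and would need to be estimated from data for English-Chinese text.

```python
import re
from typing import List, Tuple

Bead = Tuple[List[int], List[int]]  # (source indices, target indices)


def anchors(text: str) -> set:
    # Assumption: digit strings serve as "unambiguous corresponding elements".
    return set(re.findall(r"\d+", text))


def bead_cost(src_chunk: List[str], tgt_chunk: List[str], ratio: float,
              anchor_bonus: float, skip_penalty: float,
              merge_penalty: float) -> float:
    """Cost of pairing a group of source units with a group of target units."""
    if not src_chunk or not tgt_chunk:
        return skip_penalty  # one side left unaligned
    ls = float(sum(len(s) for s in src_chunk))
    lt = ratio * sum(len(t) for t in tgt_chunk)
    mismatch = abs(ls - lt) / max(ls, lt, 1.0)  # relative length difference
    shared = len(anchors(" ".join(src_chunk)) & anchors(" ".join(tgt_chunk)))
    merged = merge_penalty if len(src_chunk) + len(tgt_chunk) > 2 else 0.0
    return mismatch + merged - anchor_bonus * shared


def align(src: List[str], tgt: List[str], ratio: float = 1.0,
          anchor_bonus: float = 0.5, skip_penalty: float = 1.0,
          merge_penalty: float = 0.2) -> List[Bead]:
    """Return the minimum-cost monotone alignment as a list of beads."""
    moves = [(1, 1), (1, 0), (0, 1), (2, 1), (1, 2)]  # assumed bead shapes
    n, m = len(src), len(tgt)
    inf = float("inf")
    best = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    # Forward pass: extend every reachable prefix pair by one bead.
    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == inf:
                continue
            for di, dj in moves:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                cost = best[i][j] + bead_cost(
                    src[i:ni], tgt[j:nj], ratio,
                    anchor_bonus, skip_penalty, merge_penalty)
                if cost < best[ni][nj]:
                    best[ni][nj] = cost
                    back[ni][nj] = (di, dj)
    # Backtrace from the full texts to the empty prefixes.
    beads: List[Bead] = []
    i, j = n, m
    while (i, j) != (0, 0):
        di, dj = back[i][j]
        beads.append((list(range(i - di, i)), list(range(j - dj, j))))
        i, j = i - di, j - dj
    beads.reverse()
    return beads
```

The same routine could in principle be run twice, once with paragraphs as units and then with sentences or clauses inside each aligned paragraph pair, which mirrors the two-stage order the abstract describes. The recursive splitting algorithm for file matching is not sketched here, since the abstract gives no detail about it.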
