Publication details [#13338]

Publication type
Article in jnl/bk

This paper describes one phase of a large-scale machine translation (MT) quality assurance project. The authors explore a novel approach to identifying source sentences that are unsuitable for MT by predicting the expected quality of the output. The approach requires a set of source/MT sentence pairs, human judgments of the output, a source-language parser, and an MT system. Syntactic, semantic, and lexical features are extracted from the source sentences alone and used to train a classifier called the 'Syntactic, Semantic, and Lexical Model' (SSLM) (cf. Gamon et al., 2005; Liu & Gildea, 2005; Rajman & Hartley, 2001). Despite the simplicity of the approach, SSLM scores correlate with human judgments and can help determine whether sentences are suitable or unsuitable for translation by this MT system. SSLM also indicates which source features affect MT quality, connecting this work with the field of controlled language (CL) (cf. Reuther, 2003; Nyberg & Mitamura, 1996). Because it focuses on the input side of MT, SSLM differs fundamentally from evaluation metrics such as BLEU (Papineni et al., 2002), NIST (Doddington, 2002), and METEOR (Banerjee & Lavie, 2005): those metrics evaluate MT output by comparing it with reference translations and give no feedback on potentially problematic source material. The method thus bridges the research areas of CL and MT evaluation by highlighting the importance of providing 'MT-suitable' English input to improve output quality.
Source: Based on abstract in book
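To illustrate the general idea of scoring MT suitability from the source side only, here is a minimal sketch under loud assumptions. It is not the authors' SSLM. The abstract does not name the classifier, so logistic regression is swapped in here. The parser-derived syntactic, semantic, and lexical features are replaced by three hypothetical surface proxies: sentence length, comma count, and the share of long words. The training sentences and their labels are invented toy data standing in for human judgments. The names `extract_features`, `train`, and `suitability` are hypothetical. Nothing in the sketch consults MT output or a reference translation at prediction time, which is the contrast the abstract draws with BLEU, NIST, and METEOR.

```python
import math

# HYPOTHETICAL features: surface proxies only, NOT the SSLM feature set,
# which the abstract describes as parser-based syntactic/semantic/lexical.
def extract_features(sentence: str) -> list[float]:
    tokens = sentence.split()
    n = max(len(tokens), 1)
    commas = sentence.count(",")
    long_words = sum(1 for t in tokens if len(t.strip(".,;:")) > 8)
    return [len(tokens) / 10.0, float(commas), long_words / n]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(X, y, lr=0.5, epochs=2000):
    """Logistic regression fitted by full-batch gradient descent (assumed classifier)."""
    w = [0.0] * len(X[0])
    b = 0.0
    m = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def suitability(w, b, sentence: str) -> float:
    """Predicted probability that MT output for this source would be judged acceptable."""
    x = extract_features(sentence)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# INVENTED toy data: 1 = MT output judged acceptable, 0 = judged unacceptable.
train_sentences = [
    ("Click the Save button.", 1),
    ("Restart the computer.", 1),
    ("Open the file menu.", 1),
    ("The printer is offline.", 1),
    ("If the configuration, which was previously modified by an administrator, "
     "is incompatible, the installation fails.", 0),
    ("When prompted, select the appropriate option, unless the underlying "
     "settings, as described earlier, prohibit it.", 0),
]
X = [extract_features(s) for s, _ in train_sentences]
y = [label for _, label in train_sentences]
w, b = train(X, y)

for s in ["Close the window.",
          "Although the documentation, which accompanies the software, is "
          "comprehensive, inexperienced administrators, unfortunately, "
          "frequently misconfigure it."]:
    score = suitability(w, b, s)
    verdict = "suitable" if score >= 0.5 else "flag for rewriting"
    print(f"{score:.2f}  {verdict}  {s}")
```

Because the model is linear, its learned weights can be read directly as a rough indication of which source features push a sentence toward "unsuitable". This loosely parallels the abstract's point that SSLM reveals which source features affect MT quality. In a realistic setup, the features would come from parser output, and the 0.5 threshold would need tuning against held-out human judgments.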