Publication details [#10652]

Dorr, Bonnie Jean, Joseph Garman and Amy Weinberg. 1994. From syntactic encodings to thematic roles: building lexical entries for interlingual MT. Machine Translation 9 (3/4) : 221–250.
Publication type
Article in jnl/bk
Publication language


The goal of the authors is to construct large-scale lexicons for interlingual MT of English, Arabic, Korean, and Spanish. They describe techniques that predict salient linguistic features of a non-English word using the features of its English gloss (i.e., translation) in a bilingual dictionary. While not exact, owing to inexact glosses and language-to-language variations, these techniques can augment an existing dictionary with reasonable accuracy, thus saving significant time. Two experiments demonstrate the value of these techniques. The first tested the feasibility of building a database of thematic grids for over 6500 Arabic verbs based on a mapping between English glosses and the syntactic codes in Longman's Dictionary of Contemporary English (LDOCE) (Procter, 1978). The authors show that it is more efficient and less error-prone to hand-verify the automatically constructed grids than it would be to build the thematic grids by hand from scratch. The second experiment tested the automatic classification of verbs into a richer semantic typology based on Levin (1993), from which a more refined set of thematic grids can be derived. In this second experiment, it is shown that a brute-force, non-robust technique provides 72% accuracy for semantic classification of LDOCE verbs; it is then shown that this yield can be approached with a more robust technique based on fine-tuned statistical correlations. The authors further suggest the possibility of raising this yield by taking into account linguistic factors such as polysemy and positive and negative constraints on the syntax-semantics relation. The authors conclude that, while human intervention will always be necessary for the construction of a semantic classification from LDOCE, the need for such intervention is significantly reduced as more knowledge about the syntax-semantics relation is introduced.
Source : Based on abstract in journal