Extracting morpheme pairs from bilingual terminological corpora
This paper reports an HMM-based method for extracting bilingual morpheme pairs from domain-specific bilingual term lists. In recent years, many bilingual term lists have become available in electronic form. If the bilingual morpheme pairs in these lists are identified automatically, they can be used as bootstrapping information for the automatic identification of bilingual term pairs in bilingual textual corpora, or for the automatic extraction of translation rules for complex terms. In our method, Japanese terms are segmented into morphemes while, at the same time, the corresponding Japanese-English morpheme pairs are identified. The advantage of our method is that it requires no pre-processing tool such as a morphological analyser. The experimental results were satisfactory: our method achieved well over 80% precision and recall.
Keywords: Bilingual Morpheme Pairs, Automatic Extraction, Term List, Translation Rules, Hidden Markov Model.
Published online: 07 December 2001