In this study, we propose a method for aligning terms and extracting translations from a small, domain-specific corpus of parallel English and Chinese court judgments from Hong Kong. Given a sentence-aligned corpus, translation equivalents are suggested by analysing the frequency profiles of parallel concordances. The method overcomes the limitations of conventional statistical methods, which require large corpora to be effective, and those of lexical approaches, which depend on existing bilingual dictionaries. Pilot testing on a parallel corpus of about 113K Chinese words and 120K English words gives an encouraging 79% precision and 38% recall on average. The method has its own limitations, such as failure to detect multiple candidates and secondary translations, but it provides a good basis for acquiring an initial translation lexicon for legal terminology from indigenous bilingual legal texts.
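To make the idea of comparing frequency profiles in parallel concordances concrete, the following is a minimal sketch and not the procedure used in the study. It assumes a pre-tokenised, sentence-aligned corpus and scores each target token with a Dice coefficient over aligned sentence pairs, which is a stand-in chosen for illustration. The function name `suggest_translations`, the `top_n` parameter, and the toy sentence pairs are all hypothetical.

```python
from collections import Counter


def suggest_translations(pairs, term, top_n=3):
    """Rank target tokens by how strongly they concentrate in the
    parallel concordance of `term` (hypothetical helper, Dice scoring)."""
    corpus_df = Counter()       # aligned pairs whose target side contains the token
    concordance_df = Counter()  # same, restricted to pairs whose source side has `term`
    hits = 0                    # number of pairs whose source side contains `term`

    for src_tokens, tgt_tokens in pairs:
        tgt_types = set(tgt_tokens)  # count each token once per sentence
        corpus_df.update(tgt_types)
        if term in src_tokens:
            hits += 1
            concordance_df.update(tgt_types)

    if hits == 0:
        return []  # the term never occurs, so there is no concordance

    scored = [
        (tok, round(2 * joint / (hits + corpus_df[tok]), 3))
        for tok, joint in concordance_df.items()
    ]
    # Highest score first; ties broken by token for a stable order.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_n]


# Toy sentence-aligned data, invented for illustration only.
pairs = [
    (["the", "plaintiff", "appealed"], ["原告", "提出", "上訴"]),
    (["the", "plaintiff", "claimed", "damages"], ["原告", "申索", "損害賠償"]),
    (["the", "defendant", "appealed"], ["被告", "提出", "上訴"]),
    (["the", "court", "dismissed", "the", "appeal"], ["法庭", "駁回", "上訴"]),
]

print(suggest_translations(pairs, "plaintiff"))
```

On this toy data the top candidate for "plaintiff" is 原告, because it appears in every aligned sentence of the concordance and nowhere else. A token such as 上訴 scores lower since it is also frequent outside the concordance. A single-best ranking of this kind would also share the limitation noted in the abstract: secondary translations tend to be pushed down the list.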