User-driven assessment of commercial term extractors

Kwong, Oi Yee

doi:10.1075/term.20032.kwo

Article published in: Terminology, Vol. 27:2 (2021), pp. 179–218

In this paper, we address the evaluation of commercial term extraction tools from the users’ perspective. We first revisit the gold standard approach commonly practised among researchers and discuss the challenges it may pose to end users, taking translators as a typical example. Given the very different motivations and needs of users and researchers, we propose a user-driven approach as a variation on, and alternative to, the gold standard approach, allowing users to assess and understand the performance of commercial tools more objectively. Its feasibility and usefulness are demonstrated by deploying a benchmarking dataset of English-Chinese financial terms, produced by multiple annotators, in a case study with SDL MultiTerm Extract. The results also provide insight for the future development of term extractors designed for translators, which will hopefully generate more accurate candidates, offer more customised features, enable a better user experience, and enjoy wider popularity as computer-aided translation tools.

Keywords: automatic term extraction, bilingual term annotation, computer-aided translation, financial terminology, user-driven system assessment

Article outline

1. Introduction
2. Related work
- 2.1 Automatic term extraction
- 2.2 The issue of system evaluation
3. Creating the user-made benchmark
- 3.1 The corpus
- 3.2 English-Chinese financial terms in existing resources
- 3.3 Term annotation guidelines
  - Scope of terms
  - Form of terms
  - Span of terms
- 3.4 The annotation and the resulting benchmark
4. Assessing systems with user-driven benchmarks
- 4.1 SDL MultiTerm Extract
- 4.2 Monolingual English term extraction
- 4.3 Monolingual Chinese term extraction
- 4.4 Bilingual English-Chinese term extraction
5. Discussion
- 5.1 User-driven approach to accommodate individual needs
- 5.2 An informal comparison with research-based systems
6. Conclusion
References

Published online: 3 August 2021

