User-driven assessment of commercial term extractors
In this paper, we address the evaluation of commercial term extraction tools from the users’ perspective.
We first revisit the gold standard approach commonly practised among researchers and discuss the challenges it may
pose for end users, taking translators as a typical example. Because users and researchers have very different
motivations and needs, we propose a user-driven approach as a variation on, and alternative to, the gold standard
approach, allowing users to assess and understand the performance of commercial tools more objectively. Its feasibility
and usefulness are demonstrated in a case study with SDL MultiTerm Extract, using a benchmarking dataset of
English-Chinese financial terms produced by multiple annotators. The results also provide insight for the future
development of term extractors designed for translators, which will hopefully generate more accurate candidates, offer
more customised features, provide a better user experience, and enjoy wider popularity as computer-aided translation tools.
Article outline
- 1. Introduction
- 2. Related work
- 2.1 Automatic term extraction
- 2.2 The issue of system evaluation
- 3. Creating the user-made benchmark
- 3.1 The corpus
- 3.2 English-Chinese financial terms in existing resources
- 3.3 Term annotation guidelines
- Scope of terms
- Form of terms
- Span of terms
- 3.4 The annotation and the resulting benchmark
- 4. Assessing systems with user-driven benchmarks
- 4.1 SDL MultiTerm Extract
- 4.2 Monolingual English term extraction
- 4.3 Monolingual Chinese term extraction
- 4.4 Bilingual English-Chinese term extraction
- 5. Discussion
- 5.1 User-driven approach to accommodate individual needs
- 5.2 An informal comparison with research-based systems
- 6. Conclusion