Distributed specificity for automatic terminology extraction

Amjadian, Ehsan; Inkpen, Diana; Paribakht, T. Sima; Faez, Farahnaz

doi:10.1075/term.00012.amj

Article published in:

Computational terminology and filtering of terminological information
Edited by Patrick Drouin, Natalia Grabar, Thierry Hamon, Kyo Kageura and Koichi Takeuchi
[Terminology 24:1] 2018
pp. 23–40

Ehsan Amjadian | Carleton University, Canada | University of Ottawa, Canada

Diana Inkpen | University of Ottawa, Canada

T. Sima Paribakht | University of Ottawa, Canada

Farahnaz Faez | Western University, Canada

The present article explores two novel methods that integrate distributed representations with terminology extraction. Both methods assess the specificity of a word (unigram) to the target corpus by leveraging its distributed representation in the target domain as well as in the general domain. The first approach uses this distributed specificity as a filter, and the second applies it directly to the corpus. The filter can be mounted on any other Automatic Terminology Extraction (ATE) method, allows any number of other ATE methods to be merged, and achieves remarkable results with minimal training. The direct approach does not perform as well as the filtering approach, but it confirms that, when distributed specificity is used as the word representation, very little data is required to train an ATE classifier. This encourages the development of more minimally supervised ATE algorithms in the future.

Keywords: automatic terminology extraction, neural networks, distributed specificity, representation learning, word embeddings
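
As a rough illustration of the filtering idea summarized in the abstract, the following Python sketch may help. It rests on assumptions that the abstract does not confirm: the specificity representation is assumed to be the concatenation of a word's embedding from the target-domain corpus and its embedding from a general corpus, and a tiny logistic-regression classifier stands in for whatever classifier the article actually uses. All function names, the toy two-dimensional vectors, and the labels are hypothetical and are not taken from the article.

```python
import math

def specificity_vector(word, domain_vecs, general_vecs, dim):
    """ASSUMPTION: represent a word by concatenating its domain-corpus
    embedding with its general-corpus embedding (zeros if unseen)."""
    zero = [0.0] * dim
    return list(domain_vecs.get(word, zero)) + list(general_vecs.get(word, zero))

def predict(model, x):
    """Probability that x is a term under a logistic model (w, b)."""
    w, b = model
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, epochs=200, lr=0.5):
    """Stand-in classifier: logistic regression trained by plain SGD."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            g = predict((w, b), xi) - yi  # gradient of the log loss w.r.t. z
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b

def filter_candidates(candidates, model, domain_vecs, general_vecs, dim, threshold=0.5):
    """Keep only the unigram candidates (from any upstream ATE method)
    that the specificity classifier accepts."""
    return [c for c in candidates
            if predict(model, specificity_vector(c, domain_vecs, general_vecs, dim)) >= threshold]

# Toy, made-up embeddings (hypothetical values, not data from the article).
domain_vecs = {
    "derivative": [0.9, 0.1], "vector": [0.8, 0.2], "integral": [0.85, 0.15],
    "page": [0.1, 0.9], "the": [0.2, 0.8], "chapter": [0.15, 0.85],
}
general_vecs = {
    "derivative": [0.1, 0.9], "vector": [0.2, 0.8], "integral": [0.15, 0.85],
    "page": [0.1, 0.9], "the": [0.2, 0.8], "chapter": [0.15, 0.85],
}

# A handful of labelled words is enough for this toy case (1 = term, 0 = non-term).
train_words = ["derivative", "vector", "page", "the"]
train_labels = [1, 1, 0, 0]
X = [specificity_vector(w, domain_vecs, general_vecs, 2) for w in train_words]
model = train_logistic(X, train_labels)

# Candidates would normally come from one or more other ATE methods.
kept = filter_candidates(["integral", "chapter"], model, domain_vecs, general_vecs, 2)
print(kept)
```

In this toy setup a word whose domain embedding differs from its general embedding is treated as domain-specific; real embeddings would have to be trained separately on the target corpus and on a general corpus, and the two spaces would not be directly comparable without a classifier of this kind learning the relationship.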

Article outline

1. Introduction
2. Related work
3. Corpus
4. Methodology
- 4.1 Specificity vector
- 4.2 Filtering approach
- 4.3 Direct approach
5. Annotation
6. Experiments and results
7. Conclusion
8. Future work
Notes
References

Published online: 31 May 2018

https://doi.org/10.1075/term.00012.amj

