This article explores two novel methods that integrate distributed representations with terminology extraction. Both methods assess the specificity of a word (unigram) to the target corpus by leveraging its distributed representation in the target domain as well as in the general domain. The first approach uses this distributed specificity as a filter, and the second applies it directly to the corpus. The filter can be mounted on any other Automatic Terminology Extraction (ATE) method, allows any number of other ATE methods to be merged, and achieves remarkable results with minimal training. The direct approach does not perform as well as the filtering approach, but it reaffirms that, when distributed specificity is used as the word representation, very little data is required to train an ATE classifier. This finding encourages the development of more minimally supervised ATE algorithms in the future.
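The filtering idea can be illustrated with a minimal sketch. The abstract does not say how the two representations are combined or which classifier is used, so everything here is an assumption made for illustration: the specificity vector is taken to be the concatenation of a word's domain-corpus vector and its general-corpus vector, the filter is a plain logistic regression trained on a handful of labelled words, and the tiny 2-dimensional vectors are invented toy data rather than real embeddings. The function names (`distributed_specificity`, `train_filter`, `apply_filter`) are hypothetical.

```python
import numpy as np

def distributed_specificity(word, domain_vecs, general_vecs):
    # Assumed combination: concatenate the domain and general-domain vectors.
    return np.concatenate([domain_vecs[word], general_vecs[word]])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_filter(X, y, lr=0.5, epochs=500):
    # Logistic regression fitted by batch gradient descent (assumed classifier).
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        err = sigmoid(X @ w + b) - y      # prediction error per training word
        w -= lr * (X.T @ err) / len(y)
        b -= lr * err.mean()
    return w, b

def apply_filter(candidates, domain_vecs, general_vecs, w, b, threshold=0.5):
    # Keep only the candidates that the classifier scores as terms.
    kept = []
    for word in candidates:
        x = distributed_specificity(word, domain_vecs, general_vecs)
        if sigmoid(x @ w + b) >= threshold:
            kept.append(word)
    return kept

# Toy vectors (invented): one table per corpus.
domain_vecs = {
    "polynomial": np.array([1.0, 0.0]), "derivative": np.array([0.9, 0.1]),
    "the": np.array([0.0, 1.0]), "people": np.array([0.1, 0.9]),
    "vector": np.array([0.95, 0.05]), "because": np.array([0.05, 0.95]),
}
general_vecs = {
    "polynomial": np.array([0.1, 0.2]), "derivative": np.array([0.2, 0.1]),
    "the": np.array([0.9, 0.8]), "people": np.array([0.8, 0.9]),
    "vector": np.array([0.15, 0.15]), "because": np.array([0.85, 0.85]),
}

# A very small labelled set: 1 = term, 0 = non-term.
train_words = ["polynomial", "derivative", "the", "people"]
labels = np.array([1.0, 1.0, 0.0, 0.0])
X = np.stack([distributed_specificity(w_, domain_vecs, general_vecs)
              for w_ in train_words])
w, b = train_filter(X, labels)

# Candidates would come from any other ATE method; here they are hard-coded.
kept = apply_filter(["vector", "because"], domain_vecs, general_vecs, w, b)
print(kept)
```

Because the filter only sees candidate words and their vectors, the same function could in principle be applied to the pooled output of several extractors, which is one way to read the abstract's remark about merging ATE methods.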