Article published in: Terminology across Languages and Domains
Edited by Patrick Drouin, Natalia Grabar, Thierry Hamon and Kyo Kageura
[Terminology 21:2] 2015
pp. 180–204
Nested term recognition driven by word connection strength
Domain corpora are often small, and even important terms may occur in them not as isolated maximal phrases but only within more complex constructions. Appropriate recognition of nested terms can thus influence both the content of the extracted candidate term list and its order. We propose a new method for identifying nested terms that combines two criteria: grammatical correctness and normalised pointwise mutual information (NPMI) computed for all bigrams in a given corpus. NPMI is typically used to recognise strong word connections, but in our solution we use it to find the weakest connections, which suggest the best place to divide a phrase into two parts. By creating at most two nested phrases in each step, we introduce a binary term structure. We test the impact of the proposed method, applied together with the C-value ranking method, on the automatic term recognition task performed on three corpora, two in Polish and one in English.
Keywords: pointwise mutual information, nested phrase recognition, C-value, automatic term extraction, domain corpora
Published online: 31 December 2015
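The splitting idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a pre-tokenised corpus, estimates probabilities from bigram positions only (so that NPMI stays within [-1, 1]), assigns unseen bigrams the lower bound of -1, and leaves out the grammatical correctness check entirely. The function names `npmi_table`, `weakest_split` and `binary_structure` and the toy corpus are hypothetical. The score used is NPMI(x, y) = ln(p(x, y) / (p(x) p(y))) / (-ln p(x, y)).

```python
import math
from collections import Counter


def npmi_table(sentences):
    """Compute NPMI for every adjacent word pair in a tokenised corpus.

    Assumption: p(x) is the share of bigrams with x as first word and
    p(y) the share with y as second word, which keeps NPMI in [-1, 1].
    """
    bigrams = Counter()
    for tokens in sentences:
        bigrams.update(zip(tokens, tokens[1:]))
    n = sum(bigrams.values())
    left = Counter()   # counts of words in first position
    right = Counter()  # counts of words in second position
    for (x, y), count in bigrams.items():
        left[x] += count
        right[y] += count
    table = {}
    for (x, y), count in bigrams.items():
        if count == n:
            # Only one bigram type in the corpus: -ln p(x, y) would be 0.
            table[(x, y)] = 1.0
            continue
        p_xy = count / n
        pmi = math.log(p_xy / ((left[x] / n) * (right[y] / n)))
        table[(x, y)] = pmi / -math.log(p_xy)
    return table


def weakest_split(phrase, table):
    """Return index i where the link phrase[i-1] -> phrase[i] is weakest.

    Unseen bigrams get -1.0 (an assumption); ties go to the leftmost link.
    """
    scores = [table.get(pair, -1.0) for pair in zip(phrase, phrase[1:])]
    return scores.index(min(scores)) + 1


def binary_structure(phrase, table):
    """Recursively split a phrase at its weakest link into a binary tree."""
    if len(phrase) == 1:
        return phrase[0]
    i = weakest_split(phrase, table)
    return (binary_structure(phrase[:i], table),
            binary_structure(phrase[i:], table))


# Hypothetical toy corpus, for illustration only.
corpus = [
    ["blood", "pressure", "measurement"],
    ["blood", "pressure", "drop"],
    ["blood", "pressure"],
    ["temperature", "measurement"],
    ["weight", "measurement"],
]
table = npmi_table(corpus)
print(binary_structure(["blood", "pressure", "measurement"], table))
# prints (('blood', 'pressure'), 'measurement')
```

In this toy corpus the link between "pressure" and "measurement" has a much lower NPMI than the link between "blood" and "pressure", so the phrase is divided there and "blood pressure" emerges as the nested candidate. A fuller version would accept a split only when both resulting parts are grammatically well-formed phrases, as the abstract requires.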