Framing karstology
From definitions to knowledge structures and automatic frame population
We describe the creation of a knowledge base in the field of karstology using the frame-based approach. Besides
providing a new multilingual resource that uses manually annotated definitions as the source of structured information, we focus
on exploring text mining methods to identify targeted knowledge structures in specialised corpora. The first stage of this
process is the design of a domain model and its implementation in a definition annotation task. Once annotation is completed, an
analysis of typical co-occurrence patterns between semantic categories and the relations describing them allows us to discern
ideal definition templates. We demonstrate that such templates not only contribute to more comprehensive and structured
representations of concepts, but also help us design targeted text mining experiments to retrieve new semantic relations from
text. Two such experiments are presented: the first uses intersections of word embeddings to identify words expressing a specific
semantic relation, and the second uses the embedding of the semantic relation to extract multiword units which contain the target
relation. Results suggest that the proposed methods are promising for capturing the semantic properties of relations in
frame-based knowledge modelling.
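The abstract does not spell out how the intersection of word embeddings is computed, so the following is only a minimal sketch of one possible reading: take a few seed words that express a relation, collect the nearest neighbours of each seed in an embedding space, and keep the words that appear in every neighbour list. The toy two-dimensional vectors, the seed words, the value of `k` and the function names are all assumptions made for illustration and are not taken from the article.

```python
import math

# Hypothetical toy embeddings; real experiments would use trained vectors.
TOY_VECTORS = {
    "shaped": (1.0, 0.1),
    "form": (0.9, 0.3),
    "circular": (0.95, 0.2),
    "elongated": (0.8, 0.15),
    "limestone": (0.1, 1.0),
    "water": (0.2, 0.9),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbours(word, vectors, k):
    """Return the k words most similar to `word`, excluding the word itself."""
    scored = [
        (other, cosine(vectors[word], vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return {other for other, _ in scored[:k]}


def neighbour_intersection(seeds, vectors, k):
    """Words that are among the k nearest neighbours of every seed word."""
    neighbour_sets = [nearest_neighbours(seed, vectors, k) for seed in seeds]
    shared = set.intersection(*neighbour_sets)
    return shared - set(seeds)  # the seeds themselves are not new candidates


# Assumed seed words for a form-like relation.
candidates = neighbour_intersection(["shaped", "form"], TOY_VECTORS, k=3)
print(sorted(candidates))
```

In this toy setting the shared neighbours are the two shape-describing words, while the material and water terms are filtered out because they are close to neither seed. With real embeddings, the choice of seeds and of `k` would need to be tuned and the output checked manually.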
Article outline
- 1. Introduction
- 2. Related work
- 3. Building the TermFrame knowledge base
- 3.1 The domain model
- 3.2 Extracting definitions
- 3.3 Multi-layered semantic annotation
- 3.4 Eliciting frames
- 4. Towards automatic frame population
- 4.1 Extracting relations using intersections of word embeddings
- 4.2 Targeted hyponym extraction
- Conclusions
- Acknowledgements
- Notes
- References