Automatic lexical collocate extraction for corpus-based ontology building and refinement
A FunGramKB case study of the THEFT conceptual scenario
Traditional corpus-based studies of selection preferences rely on the manual inspection and extraction of lexical collocates, which is a costly, labor-intensive, and time-consuming task. Automatic methods for lexical collocate extraction are therefore needed to handle this task and the sheer size of the corpora available. Leveraging the Sketch Engine platform and its in-built corpora, we propose a working prototype of a Lexical Collocate Extractor (LeCoExt), a command-line tool that mines the lexical collocates of all types of verbs according to their syntactic constituents and Collocate Frequency Score (CFS). This might be the first tool that exploits the capabilities of Sketch Engine to perform comprehensive corpus-based studies of the selection preferences of individual verbs or groups of verbs. It might make it possible to extract rich lexico-semantic knowledge from diverse corpora in a few seconds and with minimal user effort. We test its performance for ontology building and refinement, building on a previous detailed analysis of stealing verbs carried out by Fernández-Martínez & Faber (2020). We show how the proposed tool is used to extract conceptual-cognitive knowledge from the THEFT scenario and implement it in the FunGramKB Core Ontology through the creation and modification of theft-related conceptual units.
Article outline
- 1. Introduction
- 2. Theoretical background
- 2.1 Stealing verbs: A lexico-semantic perspective
- 2.2 FunGramKB: Definition, scope and architecture
- 2.3 The FunGramKB ontology
- 3. Methodology
- 3.1 Presenting the lexical collocate extractor (LeCoExt) tool
- 3.2 FunGramKB conceptual categorization and specifications
- 4. Results and discussion
- 4.1 Extraction of semantic knowledge
- 4.2 Implementation and refinement of findings into the FunGramKB Core Ontology
- 4.3 Limitations in this contribution
- 5. Conclusion
- Notes
- References
References (44)
Asaro, C., Biasiotti, M. A., Guidotti, P., Papini, M., Sagri, M. T., & Tiscornia, D. (2003). A domain ontology: Italian crime ontology. In Proceedings of the ICAIL 2003 Workshop on Legal Ontologies & Web based legal information management, 1–7.
Berman, R. (1982). On the Nature of ‘Oblique’ Objects in Bitransitive Constructions. Lingua, 56(2), 101–125.

Boas, H. (2013). Frame Semantics and Translation. In A. Rojo & I. Ibarretxe-Antuñano (Eds.), Cognitive Linguistics and Translation (pp. 125–158). Berlin/New York: Mouton de Gruyter.

British National Corpus, version 3 (BNC XML Edition). (2007). Distributed by Oxford University Computing Services on behalf of the BNC Consortium. Available at [URL] [last accessed 15 May 2019]
Bušta, J., & Herman, O. (2017). JSI Newsfeed Corpus. In The 9th International Corpus Linguistics Conference, University of Birmingham, 25–28 July 2017.
Dux, R. (2018). Frames, Verbs, and Constructions: German Constructions with Verbs of Stealing. In A. Ziem & H. Boas (Eds.), Approaching German Syntax from a Constructionist Perspective (pp. 367–405). Berlin/New York: Mouton de Gruyter. 

Faber, P., & Mairal-Usón, R. (1999). Constructing a Lexicon of English Verbs. Berlin: Mouton de Gruyter. 

Faber, P., & Mairal-Usón, R. (2018). A Conceptually-Oriented Approach to Semantic Composition in RRG. In R. D. Van Valin (Ed.), The Cambridge Handbook of Role and Reference Grammar. Cambridge: Cambridge University Press.
Felices-Lago, Á. (2015). Foundational considerations for the development of the Globalcrimeterm subontology: A research project based on FunGramKB. Onomázein, 31(1), 127–144.

Felices-Lago, Á. (2016). The Process of Constructing Ontological Meaning Based on Criminal Law Verbs. Círculo de Lingüística Aplicada a la Comunicación, 65(1), 109–148.

Fillmore, C., & Baker, C. (2010). A Frames Approach to Semantic Analysis. In B. Heine & H. Narrog (Eds.), The Oxford Handbook of Linguistic Analysis (pp. 313–340). New York: Oxford University Press.
Gangemi, A., Sagri, M., & Tiscornia, D. (2005). A Constructive Framework for Legal Ontologies. In V. R. Benjamins et al. (Eds.), Law and the Semantic Web (pp. 97–124). Berlin: Springer. 

Goldberg, A. (2010). Verbs, Constructions and Semantic Frames. In M. Rappaport-Hovav, E. Doron and I. Sichel (Eds.), Syntax, Lexical Semantics and Event Structure (pp. 39–58). Oxford: Oxford University Press. 

Jakubíček, M., Kilgarriff, A., McCarthy, D., & Rychlý, P. (2010). Fast Syntactic Searching in Very Large Corpora for Many Languages. PACLIC, 741–747.
Jakubíček, M., Kilgarriff, A., Kovář, V., Rychlý, P., & Suchomel, V. (2013). The TenTen Corpus Family. Seventh International Corpus Linguistics Conference CL, 125–127.
Jiménez-Briones, R., & Luzondo-Oyón, A. (2011). Building Ontological Meaning in a Lexico-conceptual Knowledge Base. Onomázein, 23(1), 11–40.
Kilgarriff, A., Kovář, V., Krek, S., Srdanović, I., & Tiberius, C. (2010). A Quantitative Evaluation of Word Sketches. Proceedings of the 14th EURALEX International Congress, 372–379.
Kilgarriff, A., Baisa, V., Bušta, J., Jakubíček, M., Kovář, V., Michelfeit, J., Rychlý, P., & Suchomel, V. (2014). The Sketch Engine: Ten Years on. Lexicography, 1(1), 7–36. Available at [URL] [last accessed 28 December 2018]
Leary, R., Vandenberghe, W., & Zeleznikow, J. (2004). Towards a financial fraud ontology: a legal modelling approach, ICAIL 2003 Workshop on Legal Ontologies & Web based legal information management, 1–33.
Lenci, A. et al. (2000). SIMPLE: A general framework for the development of multilingual lexicon. International Journal of Lexicography, 13(4), 249–263.

Masolo, C. et al. (2003). WonderWeb Deliverable D18: Ontology Library. Laboratory for Applied Ontology, ISTC-CNR.
McCarthy, D., Kilgarriff, A., Jakubíček, M., & Reddy, S. (2015). Semantic Word Sketches. Corpus Linguistics (CL2015), 1–5.
Miller, G., & Fellbaum, C. (2007). WordNet Then and Now. Language Resources and Evaluation, 41(2), 209–214. Available at [URL] [last accessed 17 May 2019]
Niles, I., & Pease, A. (2001). Towards a standard Upper Ontology. In Proceedings of the Second International Conference on Formal Ontology in Information Systems. Ogunquit. Available at [URL] [last accessed 10 January 2019] 
Pedersen, B. S., & Keson, B. (1999). SIMPLE–Semantic information for multifunctional plurilingual lexica: some examples of Danish concrete nouns. Proceedings of the SIGLEX-99 Workshop. Maryland. Available at [URL] [last accessed 15 January 2019]
Periñán-Pascual, C. (2012). En defensa del procesamiento del lenguaje natural fundamentado en la lingüística teórica. Onomázein, 26(1), 13–48.
Periñán-Pascual, C. (2013). A knowledge-engineering approach to the cognitive categorization of lexical meaning. VIAL – Vigo International Journal of Applied Linguistics, 10(1), 85–104.
Periñán-Pascual, C., & Arcas-Túnez, F. (2004). Meaning postulates in a lexico-conceptual knowledge base. 15th International Workshop on Databases and Expert Systems Applications, IEEE, Los Alamitos (California), 38–42. 

Periñán-Pascual, C., & Arcas-Túnez, F. (2005). Microconceptual-Knowledge Spreading in FunGramKB. Proceedings of the 9th IASTED International Conference on Artificial Intelligence and Soft Computing. Anaheim-Calgary-Zurich: ACTA Press, 239–244.
Periñán-Pascual, C., & Arcas-Túnez, F. (2010a). The architecture of FunGramKB. Proceedings of the 7th International Conference on Language Resources and Evaluation, European Language Resources Association (ELRA), 2667–2674.
Periñán-Pascual, C., & Arcas-Túnez, F. (2010b). Ontological commitments in FunGramKB. Procesamiento del Lenguaje Natural, 44(1), 27–34.
Periñán-Pascual, C., & Mairal-Usón, R. (2009). Bringing Role and Reference Grammar to Natural Language Understanding. Procesamiento del Lenguaje Natural, 43(1), 265–273.
Periñán-Pascual, C., & Mairal-Usón, R. (2010). La gramática de COREL: un lenguaje de representación conceptual. Onomázein, 21(1), 11–45.
Periñán-Pascual, C., & Mairal-Usón, R. (2011). The COHERENT Methodology in FunGramKB. Onomázein, 24(1), 13–33.
Ruiz-de-Mendoza Ibáñez, F., & Mairal-Usón, R. (2009). Constructing meaning: a brief overview of the Lexical Constructional Model. In Mario Brdar (Ed.), Converging and diverging tendencies in Cognitive Linguistics. Amsterdam/Philadelphia: John Benjamins.
Ruppenhofer, J., Boas, H., & Baker, C. (2017). FrameNet. In P. Fuertes-Olivera (Ed.), The Routledge Handbook of Lexicography (pp. 383–398). New York: Routledge. 

Rychlý, P. (2008). A Lexicographer-Friendly Association Score. Proceedings of the 2nd Workshop on Recent Advances in Slavonic Natural Languages Processing, RASLAN, 2(1), 6–9.
Sartor, G., Casanovas, P., Biasiotti, M. A., & Fernández-Barrera, M. (Eds.) (2011). Approaches to legal ontologies, theories, domains, methodologies. Berlin: Springer.

Thorgren, S. (2005). Transaction Verbs: A Lexical and Semantic Analysis of Rob and Steal. Reports from the Department of Language and Culture, 3(1), 1–44.
Valente, A. (2005). Types and roles of legal ontologies. In R. Benjamins, P. Casanovas, J. Breuker & A. Gangemi (Eds.), Law and the semantic web (pp. 65–76). Berlin: Springer.

Van Valin, R. (2005). Exploring the Syntax-Semantics Interface. Cambridge: Cambridge University Press. 

Velardi, P., Pazienza, M., & Fasolo, M. (1991). How to Encode Semantic Knowledge: A Method for Meaning Representation and Computer-aided Acquisition. Computational Linguistics, 17(2), 153–170.
Cited by (1)
Idrees, Amira M. & Abdul Lateef Marzouq Al-Solami (2024). An enrichment multi-layer Arabic text classification model based on siblings patterns extraction. Neural Computing and Applications, 36(14), pp. 8221 ff.

This list is based on CrossRef data as of 5 July 2024. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.