Powerful variables for knowledge representation and bracketing prediction

Rojas-Garcia, Juan

doi:10.1075/ttmc.00151.roj

Article In:

Approaches to Machine Translation
Edited by Mahdieh Fakhar, Monica Vilhelm and Paz Díez-Arcón
[Translation and Translanguaging in Multilingual Contexts 11:1] 2025
pp. 5–30

Juan Rojas-Garcia | University of Granada

The acquisition of knowledge is essential for specialized translation, and the representation of specialized phraseology in terminological knowledge bases facilitates this process. The aim of this study is two-fold. Firstly, it describes how the semantic annotation of the predicate-argument structure of sentences mentioning named rivers can be addressed from the perspective of Frame-based Terminology. The results show that this approach, including the semantic variables of verb lexical domain, semantic role, and semantic category, provides valuable insights into the knowledge structures underlying the usage of named rivers in specialized texts. Secondly, this study explores whether the bracketing of a three-component multiword term can be predicted from the semantic information encoded in the sentence where the ternary compound and a named river are used as arguments. The semantic variables of lexical domain, semantic role, and semantic category allowed us to construct two machine-learning models capable of accurately predicting ternary-compound bracketing.
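As a rough illustration of the second aim, the sketch below shows one way the two possible bracketings of a three-component term and the three semantic variables could be represented before being passed to a supervised model. It is a minimal sketch under stated assumptions, not the article's implementation: the label inventories, the `Annotation` record, and the helper names `bracket`, `one_hot`, and `encode` are all hypothetical, and the example compounds are generic illustrations rather than items from the article's sample.

```python
from dataclasses import dataclass

# Hypothetical label sets, for illustration only. The article's actual
# inventories of lexical domains, roles, and categories are not listed here.
LEXICAL_DOMAINS = ["ACTION", "MOVEMENT", "EXISTENCE"]
SEMANTIC_ROLES = ["AGENT", "PATIENT", "LOCATION"]
SEMANTIC_CATEGORIES = ["PROCESS", "ENTITY", "ATTRIBUTE"]


@dataclass(frozen=True)
class Annotation:
    """Semantic annotation attached to one sentence (assumed layout)."""
    lexical_domain: str     # lexical domain of the sentence's verb
    semantic_role: str      # semantic role of the annotated argument
    semantic_category: str  # semantic category of the annotated argument


def bracket(w1, w2, w3, side):
    """Return the nested structure of a three-component term.

    "left"  groups the first two components: ((w1, w2), w3)
    "right" groups the last two components:  (w1, (w2, w3))
    """
    if side == "left":
        return ((w1, w2), w3)
    if side == "right":
        return (w1, (w2, w3))
    raise ValueError(f"unknown bracketing side: {side!r}")


def one_hot(value, vocabulary):
    """Encode a categorical value as a 0/1 list over a fixed vocabulary."""
    if value not in vocabulary:
        raise ValueError(f"{value!r} is not in {vocabulary}")
    return [1 if value == label else 0 for label in vocabulary]


def encode(annotation):
    """Concatenate the one-hot codes of the three semantic variables."""
    return (
        one_hot(annotation.lexical_domain, LEXICAL_DOMAINS)
        + one_hot(annotation.semantic_role, SEMANTIC_ROLES)
        + one_hot(annotation.semantic_category, SEMANTIC_CATEGORIES)
    )


# Generic example compounds (not taken from the article's data).
left_example = bracket("hydrogen", "ion", "exchange", "left")
right_example = bracket("woman", "aid", "worker", "right")

# One annotated sentence becomes a fixed-length feature vector; the
# bracketing side ("left" or "right") would be the label to predict.
features = encode(Annotation("MOVEMENT", "PATIENT", "PROCESS"))
```

A feature vector of this kind, paired with a left or right label, is the general shape of input that a binary classifier would take; which classifiers the article uses is not stated in the abstract.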

Keywords: predicate-argument structure analysis, semantic annotation, three-component multiword term, bracketing prediction, terminological knowledge base, Frame-based Terminology, named river

Article outline

1. Introduction
2. Frame-based Terminology
3. Materials and methods
- 3.1 Corpus data
- 3.2 GeoNames geographic database
- 3.3 Recognition of named rivers
- 3.4 From multiword-term level to phrase level: Semantic annotation of predicate-argument structures for named rivers
  - 3.4.1 Predicate classification in lexical domains
  - 3.4.2 Semantic roles
  - 3.4.3 Semantic categories
  - 3.4.4 Semantic relations
  - 3.4.5 Inter-annotator agreement
4. Results of the semantic annotations
- 4.1 Lexical domain of action
- 4.2 Construction of frames evoked by named rivers
5. Prediction of the bracketing of three-component multiword terms
- 5.1 Bracketing of multiword terms
- 5.2 Methods for bracketing prediction in the literature
- 5.3 Semantic approach to the prediction of ternary-compound bracketing
  - 5.3.1 Description of the sample of ternary compounds
  - 5.3.2 Supervised models
  - 5.3.3 Data splitting
  - 5.3.4 Model performance measures
  - 5.3.5 Construction of the supervised models
- 5.4 Comparison of the results with previous research
6. Conclusions
Notes
Author queries
References

This content is being prepared for publication; it may be subject to changes.
