Verbal collocations and pronominalisation

Wehrli, Eric; Seretan, Violeta; Nerima, Luka

doi:10.1075/ivitra.24.11weh

Part of

Computational Phraseology
Edited by Gloria Corpas Pastor and Jean-Pierre Colson
[IVITRA Research in Linguistics and Literature 24] 2020
pp. 207–224

Eric Wehrli | University of Geneva

Violeta Seretan | University of Geneva

Luka Nerima | University of Geneva

Precise identification of multiword expressions (MWEs) is an important step towards higher quality in several NLP applications, including machine translation. Since most MWEs cannot be translated literally, failure to identify them yields, at best, an inaccurate translation. While some expressions are completely frozen and can thus be listed as compound words, others display a considerable, sometimes very large, degree of syntactic flexibility.

In this chapter, we argue not only that structural information is necessary for an adequate treatment of collocations, but also that the detection of collocations can be useful for the parser. For instance, it helps resolve part-of-speech ambiguities and some attachment ambiguities. We therefore claim that collocation identification and parsing are interrelated processes.
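The claim that collocation detection can help resolve part-of-speech ambiguities can be illustrated with a toy sketch. It is not the authors' parser: the lexicon, the `analyses` data layout and the function names are all hypothetical, and the only idea shown is that an analysis containing a known collocation is preferred over a competing one.

```python
# Hypothetical toy lexicon of verb-object collocations, stored as lemma pairs.
COLLOCATIONS = {("break", "record"), ("make", "decision")}


def count_collocations(analysis):
    """Count the verb-object pairs of an analysis that are known collocations."""
    return sum(1 for pair in analysis["verb_object"] if pair in COLLOCATIONS)


def pick_analysis(analyses):
    """Prefer the analysis with the most known collocations (ties keep the first)."""
    return max(analyses, key=count_collocations)


# Two competing analyses of "They broke the record": "record" read as a noun
# (object of "break") or as a verb (no verb-object pair). Invented for illustration.
analyses = [
    {"reading": "record=VERB", "verb_object": []},
    {"reading": "record=NOUN", "verb_object": [("break", "record")]},
]

best = pick_analysis(analyses)
print(best["reading"])
```

In a real system the preference would more plausibly be one weighted signal among several rather than a hard choice; the hard choice here only keeps the sketch short.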

Section 2 describes the two processes of parsing and collocation detection and their interaction: (i) when and how the collocation identification process is triggered during parsing, and (ii) how the identification of a collocation helps the parser. Section 3 describes how anaphora resolution has been implemented in our parsing system to handle cases where the antecedent and the pronoun are in the same sentence or in adjacent sentences. Section 4 focuses on more intricate cases of verbal collocations in which the nominal element has been pronominalised, in the form of a relative pronoun or a personal pronoun. Verb-object collocations with a relative pronoun are extremely frequent and relatively easy for a “deep” parser to handle, since in most cases the relative clause is directly attached to the noun which is part of the collocation. Collocations in which the nominal element takes the form of a personal pronoun are much harder to deal with, as they depend on anaphora resolution, a very challenging task. The last section describes an evaluation of the collocation detection procedure enhanced with anaphora resolution, carried out on a corpus of newspaper articles of about 10 million words.
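The handling of pronominalised collocations can be sketched as follows, assuming the parser has already produced verb-object pairs and that pronouns have been linked to their antecedents. The `Noun` class, the `antecedent` field and both functions are hypothetical names chosen for illustration, not the system described in the chapter. The point is only that a pronoun object is replaced by its antecedent's lemma before the collocation lexicon is consulted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Noun:
    lemma: str
    is_pronoun: bool = False
    # Assumed to be filled in by an anaphora resolver or by relative-clause attachment.
    antecedent: Optional["Noun"] = None


def head_lemma(np):
    """Follow antecedent links from a pronoun back to a lexical noun, if any."""
    seen = set()
    while np.is_pronoun and np.antecedent is not None and id(np) not in seen:
        seen.add(id(np))  # guard against accidental cycles in the links
        np = np.antecedent
    return np.lemma


def find_collocations(verb_object_pairs, lexicon):
    """Return the (verb, noun) lemma pairs found in the lexicon after substitution."""
    found = []
    for verb, obj in verb_object_pairs:
        pair = (verb, head_lemma(obj))
        if pair in lexicon:
            found.append(pair)
    return found


# "The record which she set ... nobody has broken it since." (invented example)
record = Noun("record")
which = Noun("which", is_pronoun=True, antecedent=record)  # relative pronoun
it = Noun("it", is_pronoun=True, antecedent=record)        # personal pronoun
unresolved = Noun("it", is_pronoun=True)                   # no antecedent found

lexicon = {("set", "record"), ("break", "record")}
pairs = [("set", which), ("break", it), ("see", unresolved)]
print(find_collocations(pairs, lexicon))
```

An unresolved pronoun keeps its own lemma, so it simply fails the lexicon lookup instead of producing a spurious match.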

Keywords: collocation, multiword expressions, anaphora resolution, pronominalisation, deep parsing

Article outline

1. Introduction
2. Parsing and collocation detection
3. Anaphora resolution
4. Verbal collocations and pronominalisation
5. Experimental results
- 5.1 Evaluation methodology
- 5.2 Evaluation results
6. Conclusion
Notes
References

Published online: 8 May 2020



Cited by (1)


Wehrli, Eric. 2022. Collocations in Parsing and Translation. Frontiers in Artificial Intelligence 5.
