Verbal collocations and pronominalisation
Precise identification of multiword expressions (MWEs) is an
important step for the quality of several NLP applications, including machine
translation. Since most MWEs cannot be translated literally, failure to
identify them yields, at best, an inaccurate translation. While some
expressions are completely frozen and can thus be listed as compound words,
others display a sometimes very large degree of syntactic flexibility.
In this chapter, we argue not only that structural information is
necessary for an adequate treatment of collocations, but also that the
detection of collocations can be useful for the parser. For instance,
detecting a collocation helps resolve part-of-speech ambiguities and some
attachment ambiguities. We therefore claim that collocation identification
and parsing are interrelated processes.
Section 2 describes the
two processes of parsing and collocation detection and their interaction:
(i) when and how the collocation identification process is triggered during
parsing, and (ii) how the identification of a collocation helps the parser.
In Section 3 we describe how
anaphora resolution has been implemented in our parsing system to handle
cases where the antecedent and the pronoun are within the same sentence or
in adjacent sentences. Section 4
focuses on more intricate cases of verbal collocations in which the nominal
element has been pronominalised, in the form of a relative pronoun or a
personal pronoun. Verb-object collocations with a relative pronoun are
extremely frequent and relatively easy to handle for a “deep” parser. In
most cases, the relative clause is directly attached to the noun which is
part of the collocation. Collocations in which the nominal element takes the
form of a personal pronoun are much harder to deal with, as they depend on
the process of anaphora resolution, a very challenging task. The last
section describes an evaluation of the collocation detection procedure
enhanced with anaphora resolution, using a corpus of newspaper articles of
about 10 million words.
Article outline
- 1. Introduction
- 2. Parsing and collocation detection
- 3. Anaphora resolution
- 4. Verbal collocations and pronominalisation
- 5. Experimental results
- 5.1 Evaluation methodology
- 5.2 Evaluation results
- 6. Conclusion
- Notes
- References