Collecting collocations from general and specialised corpora
A comparative analysis
Collocations are increasingly taken into account in general and
specialised repositories, and the methodologies used to collect them rely
heavily on corpora. However, lexicographers and terminologists use different
kinds of corpora, in which combinations are likely to behave according to
specific rules and/or patterns. This contribution presents a comparative
analysis of the collocational behaviour of 15 lexical items found in a
general language corpus and a specialised corpus on the theme of the
environment. We automatically extracted large sets of collocates (three
lists of 50 collocates) for each lexical item from each corpus and analysed
different facets of collocational behaviour: the polysemy of lexical items
and the characteristics of collocates (overlap, rank, semantic classes,
etc.). Our aim is to draw the attention of terminologists and
lexicographers to some specific factors affecting the behaviour of
collocations in specialised and general corpora.
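
As a rough illustration of the kind of comparison described above, the following Python sketch takes two ranked collocate lists for the same lexical item, one per corpus, and reports the shared collocates with their ranks in each list, plus a Jaccard overlap score. This is not the authors' procedure: the function name `compare_collocates`, the choice of the Jaccard measure, and the toy lists for the item "energy" are all assumptions made for the example, and each list is assumed to contain no duplicates.

```python
def compare_collocates(general, specialised, top_n=50):
    """Compare two ranked collocate lists for the same lexical item.

    Hypothetical helper, not taken from the article. Returns the shared
    collocates with their 1-based rank in each list, and the Jaccard
    overlap of the two truncated lists.
    """
    g = general[:top_n]
    s = specialised[:top_n]

    # 1-based rank of every collocate in the specialised list
    s_rank = {word: i + 1 for i, word in enumerate(s)}

    # Shared collocates, in general-corpus order:
    # (word, rank in general list, rank in specialised list)
    shared = [(word, i + 1, s_rank[word])
              for i, word in enumerate(g) if word in s_rank]

    union = set(g) | set(s)
    jaccard = len(shared) / len(union) if union else 0.0
    return shared, jaccard


# Invented toy data for the item "energy" (not from the study)
general_list = ["nervous", "renewable", "full", "solar"]
specialised_list = ["renewable", "solar", "wind", "nuclear"]

shared, jaccard = compare_collocates(general_list, specialised_list)
for word, rank_general, rank_specialised in shared:
    print(word, rank_general, rank_specialised)
print(round(jaccard, 3))
```

Reporting both ranks side by side makes it easy to see collocates that are prominent in one corpus but marginal in the other, which a bare overlap count would hide.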
Article outline
- 1. Introduction
- 2. Lexical combinations in terminology and lexicography
- 3. A comparative analysis
- 3.1 Corpora
- 3.2 Lexical items selected
- 3.3 Automated extraction of collocations
- 4. Observations on the lists of candidate collocations
- 4.1 Overlap of candidate collocates
- 4.2 Rank of candidates
- 4.3 How collocates reveal specific meanings of items
- 5. Concluding remarks: Summary and guidelines for terminologists and lexicographers
- Acknowledgements
- Notes
- Funding
- References
Cited by (1)
Giacomini, Laura. 2022. The contextual behaviour of specialised collocations: typology and lexicographic treatment. Yearbook of Phraseology 13:1, pp. 55 ff.
This list is based on CrossRef data as of 29 July 2024. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.