Article published in:
Exploring Newspaper Language: Using the web to create and investigate a large corpus of modern Norwegian
Edited by Gisle Andersen
[Studies in Corpus Linguistics 49] 2012
pp. 79–110
Collocations and statistical analysis of n-grams
Multiword expressions in newspaper text
Gunn Inger Lyse | University of Bergen
Gisle Andersen | NHH Norwegian School of Economics
Multiword expressions (MWEs) are words that co-occur so often that they are perceived as a linguistic unit. Since MWEs pervade natural language, their identification is pertinent for a range of tasks within lexicography, terminology and language technology. We apply various statistical association measures (AMs) to word sequences from the Norwegian Newspaper Corpus (NNC) in order to rank two- and three-word sequences (bigrams and trigrams) in terms of their tendency to co-occur. The results show that some statistical measures favour relatively frequent MWEs (e.g. i motsetning til ‘as opposed to’), whereas other measures favour relatively low-frequency units, which typically comprise loan words (de facto), technical terms (notarius publicus) and phrasal anglicisms (practical jokes; cf. G. Andersen, this volume). On this basis we evaluate the relevance of each of these measures for lexicography, terminology and language technology purposes.
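The contrast between measures that favour frequent units and measures that favour rare ones can be illustrated with a minimal sketch. The abstract does not name the association measures used in the chapter, so the two shown here, pointwise mutual information (PMI) and the t-score, are assumptions chosen because they are common textbook examples of the two tendencies. The toy token list and the function name `bigram_scores` are likewise hypothetical and are not drawn from the NNC.

```python
import math
from collections import Counter


def bigram_scores(tokens):
    """Return {bigram: (pmi, t_score)} for all adjacent word pairs.

    Assumption: PMI and t-score are illustrative choices only; they are
    not necessarily the measures evaluated in the chapter.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)
    scores = {}
    for (w1, w2), observed in bigrams.items():
        # Expected bigram count if the two words were independent.
        expected = unigrams[w1] * unigrams[w2] / n
        pmi = math.log2(observed / expected)
        t_score = (observed - expected) / math.sqrt(observed)
        scores[(w1, w2)] = (pmi, t_score)
    return scores


# Hypothetical toy corpus: one frequent sequence and one rare loan phrase.
toy = (
    ["i", "motsetning", "til", "a"] * 10
    + ["i", "b", "til", "c"] * 10
    + ["de", "facto"]
)
scores = bigram_scores(toy)

# Rank the same bigrams by each measure and show the top of each list.
by_pmi = sorted(scores, key=lambda bg: scores[bg][0], reverse=True)
by_t = sorted(scores, key=lambda bg: scores[bg][1], reverse=True)
print("Top by PMI:    ", by_pmi[:3])
print("Top by t-score:", by_t[:3])
```

In this toy data the bigram that occurs only once, de facto, receives the highest PMI because both of its words occur nowhere else, while the t-score ranks the frequent pair i motsetning above it. This mirrors, in a simplified way, the frequency bias described in the abstract.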
Published online: 23 March 2012
https://doi.org/10.1075/scl.49.05lys