A corpus-based study of the automatic extraction and validation of V-N Italian oral academic collocations
This study describes the outcomes of a POS-based method for the automatic extraction of V-N Italian oral academic collocations from an annotated corpus.
A frequency-based statistical measure is applied to extract the collocations automatically from the POS-tagged corpus.
The results reveal that frequency alone is not sufficient to measure the degree of association that connects the two elements of a word pair.
To identify the collocations that are genuinely attested in Italian, the data were further evaluated by 50 native speakers of Italian.
The results indicate that these combinations are tightly linked to their context of usage.
Thus, native speakers should be exposed to these phrasal contexts to activate their mechanisms of explicit reflection and assess the degree of collocativity of these combinations.
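As a rough illustration of the extraction step described above, the following minimal Python sketch counts verb-noun pairs in a POS-tagged corpus and ranks them by raw frequency. It is not the authors' implementation: the toy sentences, the tag names (`VERB`, `NOUN`), the three-token window, and the function names are all assumptions made for this example. Pointwise mutual information (PMI) is included only as one familiar association measure, to show how a frequency ranking and an association ranking can disagree; the source does not say which measure, if any, the study compared against.

```python
from collections import Counter
from math import log2

# Hypothetical toy input: each sentence is a list of (lemma, POS) pairs.
# The tag names and the data are assumptions, not taken from the ASIC corpus.
TAGGED = [
    [("fare", "VERB"), ("una", "DET"), ("domanda", "NOUN")],
    [("fare", "VERB"), ("un", "DET"), ("esempio", "NOUN")],
    [("porre", "VERB"), ("una", "DET"), ("domanda", "NOUN")],
    [("fare", "VERB"), ("una", "DET"), ("domanda", "NOUN")],
]


def extract_vn_pairs(sentences, window=3):
    """Yield (verb, noun) pairs where the noun follows the verb within `window` tokens."""
    for sent in sentences:
        for i, (tok, pos) in enumerate(sent):
            if pos != "VERB":
                continue
            for nxt_tok, nxt_pos in sent[i + 1 : i + 1 + window]:
                if nxt_pos == "NOUN":
                    yield (tok, nxt_tok)
                    break  # keep only the nearest noun after the verb


def rank_by_frequency(sentences, min_freq=1, window=3):
    """Return V-N pairs sorted by raw frequency, dropping pairs below `min_freq`."""
    counts = Counter(extract_vn_pairs(sentences, window))
    return [(pair, n) for pair, n in counts.most_common() if n >= min_freq]


def pmi(pair_counts):
    """Compute PMI for each pair, estimating probabilities from the pair counts alone."""
    total = sum(pair_counts.values())
    verb_counts, noun_counts = Counter(), Counter()
    for (verb, noun), c in pair_counts.items():
        verb_counts[verb] += c
        noun_counts[noun] += c
    return {
        (verb, noun): log2((c * total) / (verb_counts[verb] * noun_counts[noun]))
        for (verb, noun), c in pair_counts.items()
    }


if __name__ == "__main__":
    print(rank_by_frequency(TAGGED, min_freq=2))
    scores = pmi(Counter(extract_vn_pairs(TAGGED)))
    for pair, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        print(pair, round(score, 3))
```

In this toy data the most frequent pair receives the lowest PMI score, because both of its members also combine with other words. This is one way in which frequency alone can fail to capture how strongly two words are associated.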
Article outline
- Introduction
- 1. Towards a definition of “collocation”
- 1.1 Collocations in applied linguistics
- 2. Data and methodology
- 2.1 Collecting data for structuring the ASIC corpus
- 2.2 Extracting and filtering collocations from the ASIC corpus
- 2.3 Validation of the extracted academic Italian collocation list
- 2.3.1 Results of the crowd sourcing experiment
- 2.3.2 Double validation of the data
- Discussion and conclusions
- Acknowledgements
- Notes
- References