23. Combined statistical and grammatical criteria for the retrieval of phraseological units in an electronic corpus
The aim of this study is to refine and optimise the mainly statistical and distributional approach to the automatic extraction of phraseological units (PUs) from text corpora by introducing minimal linguistic elements (lemmatisation and grammatical tagging). These operations were first tested on the same corpora as in our previous research (Pamies & Pazos 2003, 2004). This provided us with a new set of results, which we compared with the previous ones. We found that the detection ability had improved substantially, especially for verb + noun and verb + adjective collocations. The methodology was then applied to a larger corpus. Again, the results were encouraging, with phraseological densities of up to 64.5% for the verb + noun category.
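The general idea of combining a statistical score with a grammatical filter can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the association measure, so pointwise mutual information (PMI) is used here as a stand-in, and the UD-style tags ("VERB", "NOUN"), the function names, the window size, the frequency threshold and the toy data are all assumptions made for illustration. Phraseological density is assumed here to mean the percentage of retrieved candidates that are confirmed as PUs.

```python
import math
from collections import Counter

# Toy corpus, assumed to be already lemmatised and tagged: (lemma, tag) pairs.
sentences = [
    [("she", "PRON"), ("take", "VERB"), ("a", "DET"), ("decision", "NOUN")],
    [("they", "PRON"), ("take", "VERB"), ("the", "DET"), ("decision", "NOUN")],
    [("he", "PRON"), ("read", "VERB"), ("a", "DET"), ("book", "NOUN")],
    [("we", "PRON"), ("read", "VERB"), ("the", "DET"), ("book", "NOUN")],
    [("I", "PRON"), ("take", "VERB"), ("the", "DET"), ("book", "NOUN")],
]


def verb_noun_candidates(sentences, window=3, min_freq=2):
    """Rank verb + noun lemma pairs by PMI (hypothetical helper).

    A pair is counted when a NOUN occurs within `window` tokens
    to the right of a VERB in the same sentence.
    """
    pair_counts = Counter()
    token_counts = Counter()
    total_tokens = 0
    for sent in sentences:
        for i, (lemma, tag) in enumerate(sent):
            token_counts[(lemma, tag)] += 1
            total_tokens += 1
            if tag != "VERB":
                continue
            # Grammatical filter: keep only nouns following the verb.
            for other, other_tag in sent[i + 1 : i + 1 + window]:
                if other_tag == "NOUN":
                    pair_counts[(lemma, other)] += 1

    scored = []
    for (verb, noun), freq in pair_counts.items():
        if freq < min_freq:
            continue  # statistical filter: drop rare pairs
        expected = token_counts[(verb, "VERB")] * token_counts[(noun, "NOUN")]
        pmi = math.log2(freq * total_tokens / expected)
        scored.append(((verb, noun), freq, pmi))
    return sorted(scored, key=lambda item: item[2], reverse=True)


def phraseological_density(candidates, confirmed):
    """Percentage of candidates found in a set of manually confirmed PUs."""
    if not candidates:
        return 0.0
    hits = sum(1 for pair, _, _ in candidates if pair in confirmed)
    return 100.0 * hits / len(candidates)


candidates = verb_noun_candidates(sentences)
# Assumed manual judgement for the toy data only.
confirmed = {("take", "decision")}
density = phraseological_density(candidates, confirmed)
```

Counting over lemmas rather than surface forms pools inflected variants of the same pair, and restricting pairs by tag removes combinations that are frequent but grammatically irrelevant. This is the kind of gain the abstract attributes to lemmatisation and tagging.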
Cited by one other publication
Buerki, Andreas. 2020. Formulaic Language and Linguistic Change.
This list is based on CrossRef data as of 23 July 2024. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.