This study aims to develop a new computing method for extracting contiguous phraseological sequences (PSs) of various lengths from academic text corpora by measuring the internal associations of n-grams. We construct a new normalizing algorithm, based on a probability-weighted average, that refines the mutual information (MI) measure and improves precision in extracting PSs from corpora. The method is applied to a medium-sized corpus of academic English. Results indicate that the new MI measure yields statistics that better reveal the internal associations within an n-gram, regardless of its length. Lexico-grammatical sequences extracted with this method are more complete and less arbitrary in terms of grammar and semantics. The method can be applied to a variety of linguistic phenomena, ranging from well-established phrases to likely phrasal entities, and thus has potential practical applications in corpus-based studies of phraseology and in natural language processing.
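To make the idea of a normalized MI score for n-grams of any length more concrete, the following is a minimal sketch of one possible reading, not the authors' actual algorithm, which the abstract does not spell out. It assumes that the observed probability of an n-gram is compared against a probability-weighted average of the expected probabilities over all binary split points. The function names (`ngram_counts`, `weighted_mi`), the toy token list, and the use of the token count as a common denominator for all n-gram lengths are illustrative assumptions.

```python
from collections import Counter
from math import log2


def ngram_counts(tokens, max_n):
    """Count every contiguous n-gram of length 1..max_n."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def weighted_mi(gram, counts, total):
    """Hypothetical normalized MI for an n-gram (n >= 2).

    Assumption: the expected probability is a probability-weighted
    average over all binary splits of the n-gram. For a bigram there
    is a single split, so the score equals ordinary pointwise MI.
    """
    if len(gram) < 2:
        raise ValueError("need an n-gram of at least two words")
    if counts[gram] == 0:
        raise ValueError("n-gram not observed in the corpus")
    p_whole = counts[gram] / total
    # Expected probability of the n-gram under each binary split,
    # treating the left and right parts as independent.
    expected = [
        (counts[gram[:i]] / total) * (counts[gram[i:]] / total)
        for i in range(1, len(gram))
    ]
    # Weight each split by its share of the summed expected probability.
    norm = sum(expected)
    weighted_avg = sum((e / norm) * e for e in expected)
    return log2(p_whole / weighted_avg)


# Toy corpus (an assumption, for illustration only).
tokens = ("in terms of the results in terms of the data "
          "on the other hand").split()
counts = ngram_counts(tokens, max_n=4)
total = len(tokens)  # simplification: one denominator for all lengths

for candidate in [("in", "terms", "of"), ("of", "the", "data")]:
    print(candidate, round(weighted_mi(candidate, counts, total), 3))
```

Because the score is defined through split points rather than individual words, sequences of different lengths can be scored with the same function. Whether this weighting matches the published measure would need to be checked against the full paper.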