Identifying lexical bundles in Chinese
Methodological issues and an exploratory data analysis
Recurrent word sequences, referred to as “lexical bundles”, may be structurally incomplete, but they serve important communicative functions. Although lexical bundles play essential roles in discourse, identifying them, a process that is generally frequency-based, raises many methodological issues. The present study identifies three-word and four-word bundles in Chinese conversation and news and responds to methodological challenges encountered in previous studies. We employ a more sensitive dispersion measure, DP, and an internal association measure, G, which help filter out high-frequency word sequences with no identifiable function and reduce the workload of subsequent manual intervention. An exploratory data analysis is then conducted to compare the distributional patterns of lexical bundles in Chinese conversation and news. In Chinese, both the number of bundle types and the density of lexical bundles are higher in conversation than in news. This appears to be a strong cross-linguistic tendency that reflects the real-time pressure speakers face in spontaneous speech. The exploratory data analysis also shows that the elements in Chinese bundles are closely associated with each other. This suggests that lexical bundles are useful phrasal units in Chinese discourse and invites further investigation of how lexical bundles are used in Chinese.
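To illustrate how frequency and dispersion can work together as a filter, here is a minimal sketch. It extracts contiguous n-word sequences from a corpus split into parts, then computes Gries's deviation of proportions (DP) for each candidate. DP is 0 when an item is spread across the parts exactly in proportion to their sizes, and it approaches 1 when the item is concentrated in a few parts. Several things here are assumptions and do not come from the study: the function names, the thresholds `min_freq` and `max_dp`, and the premise that the text is already segmented into words (Chinese running text needs word segmentation first). The sketch also does not reproduce the study's G association filter.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def ngrams(tokens: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-word sequences in a word-segmented text."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def dp(part_freqs: Sequence[int], part_sizes: Sequence[int]) -> float:
    """Deviation of proportions (DP) for one item.

    part_freqs: occurrences of the item in each corpus part.
    part_sizes: size of each corpus part in words.
    Returns 0.0 for a distribution proportional to part sizes; larger
    values mean the item is concentrated in fewer parts.
    """
    total_freq = sum(part_freqs)
    total_size = sum(part_sizes)
    if total_freq == 0 or total_size == 0:
        raise ValueError("item frequency and corpus size must be non-zero")
    # Half the sum of absolute differences between observed and expected proportions.
    return 0.5 * sum(
        abs(f / total_freq - s / total_size)
        for f, s in zip(part_freqs, part_sizes)
    )


def candidate_bundles(
    parts: Sequence[Sequence[str]], n: int, min_freq: int, max_dp: float
) -> Dict[Tuple[str, ...], Tuple[int, float]]:
    """Keep n-grams meeting a (hypothetical) frequency and DP threshold.

    Returns a mapping from n-gram to (total frequency, DP).
    """
    sizes = [len(p) for p in parts]
    per_part = [Counter(ngrams(p, n)) for p in parts]
    total: Counter = Counter()
    for counts in per_part:
        total.update(counts)

    kept = {}
    for gram, freq in total.items():
        if freq < min_freq:
            continue  # frequency cut-off
        score = dp([counts[gram] for counts in per_part], sizes)
        if score <= max_dp:  # drop items concentrated in few parts
            kept[gram] = (freq, score)
    return kept
```

For example, `dp([5, 5], [10, 10])` gives 0.0 (an even spread), while `dp([10, 0], [50, 50])` gives 0.5 because every occurrence falls in one of two equal-sized parts. Applying a dispersion cut-off after the frequency cut-off means that sequences frequent in only one or two texts are not counted as bundles.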
Keywords: lexical bundle, multi-word unit, frequency, dispersion measure DP, word association measure G
Available under the Creative Commons Attribution-NonCommercial (CC BY-NC) 4.0 license.
Published online: 10 October 2018