Identifying lexical bundles in Chinese
Methodological issues and an exploratory data analysis
Recurrent word sequences, referred to as “lexical bundles”, may be structurally incomplete, but they serve important communicative functions. Although lexical bundles play essential roles in discourse, identifying them, which is generally done on the basis of frequency, raises many methodological issues. The present study identifies three-word and four-word bundles in Chinese conversation and news, and attempts to address the methodological challenges encountered in previous studies. We employ a more sensitive dispersion measure, DP, and an internal association measure, G, which help filter out high-frequency word sequences with no identifiable function and reduce the need for further manual intervention. An exploratory data analysis is then conducted to compare the distributional patterns of lexical bundles in Chinese conversation and news. In Chinese, both the number of bundle types and the density of lexical bundles are higher in conversation than in news. This appears to be a strong cross-linguistic tendency that reflects the real-time pressure speakers face in spontaneous speech. The exploratory data analysis also shows that the elements in Chinese bundles are closely associated with each other. This suggests that lexical bundles are useful phrasal units in Chinese discourse and invites further investigation of how they are used in Chinese.
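To make the dispersion criterion concrete, the following is a minimal sketch of DP in its commonly cited form (deviation of proportions): half the sum of absolute differences between the share of a sequence's occurrences found in each corpus part and the share of the corpus that part makes up. The function name, the argument names, and the toy counts are illustrative assumptions; the abstract does not state which variant of DP or which threshold values the study uses.

```python
from typing import Sequence


def deviation_of_proportions(part_freqs: Sequence[int],
                             part_sizes: Sequence[int]) -> float:
    """Return DP for one word sequence.

    part_freqs[i] is how often the sequence occurs in corpus part i;
    part_sizes[i] is the size (e.g. in words) of corpus part i.
    Values near 0 indicate an even spread across parts; values near 1
    indicate that occurrences are concentrated in few parts.
    """
    if len(part_freqs) != len(part_sizes):
        raise ValueError("part_freqs and part_sizes must have the same length")
    total_freq = sum(part_freqs)
    total_size = sum(part_sizes)
    if total_freq == 0 or total_size == 0:
        raise ValueError("the sequence must occur and the corpus must be non-empty")
    return 0.5 * sum(
        abs(freq / total_freq - size / total_size)
        for freq, size in zip(part_freqs, part_sizes)
    )


# Hypothetical counts for two sequences over four equally sized corpus parts.
even_dp = deviation_of_proportions([10, 10, 10, 10], [1000, 1000, 1000, 1000])
skewed_dp = deviation_of_proportions([40, 0, 0, 0], [1000, 1000, 1000, 1000])
```

In a pipeline of this kind, a sequence whose DP exceeds a chosen cut-off would be discarded as too unevenly distributed to count as a bundle, however frequent it is overall.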
- 2. Methodological issues in identifying lexical bundles
- 2.1 Issues relating to the corpus
- 2.2 Issues relating to the length of lexical bundles
- 2.3 Issues relating to the quantitative criteria
- 2.4 Issues relating to manual interventions
- 2.5 An interim summary
- 3. Identifying lexical bundles in Chinese
- 3.1 Extracting high-frequency word sequences
- 3.2 Dispersion thresholds
- 3.3 Association threshold
- 3.4 Other methodological issues and practical solutions
- 4. Results and discussion