Vol. 49:1 (2023), pp. 96–139
Incorporating structural topic modeling into short text analysis
The past few decades have seen the rapid development of topic modeling. So far, research has been more concerned with determining the ideal number of topics or meaningful topic clustering words than with applying topic modeling techniques to evaluate linguistic theories. This study proposes a Structural Topic Model (STM)-led framework to facilitate the interpretation of topic modeling results and to standardize text analysis. Because STM offers a range of model training mechanisms, it requires a systematic design to be properly combined with language studies. "Structural" in STM refers to the inclusion of metadata structure. Unlike the corpus-based keyness approach, STM can capture contextual cues and meta-information that aid the interpretation of topical results. In addition, STM can make cross-corpus comparisons via topical contrast, a task that is challenging for related corpus-driven models such as the Biterm Topic Model (BTM). Stylistic variation in song lyrics serves as an illustration of how the proposed framework can be used to examine the linguistic theory proposed by Pennebaker (2013). The topical quality model and the iterable assessment model in the proposed paradigm can clarify how pronouns contribute to stylistic distinctions. We believe the proposed STM-led framework can shed light on text analysis by enabling reproducible cross-corpus comparisons of short texts.
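To make concrete what it means for metadata to enter the model, the sketch below illustrates the topic-prevalence part of STM's generative story as it is commonly described: a document's topic proportions are drawn from a logistic-normal distribution whose mean is a linear function of that document's covariates. This is a minimal, assumption-laden illustration in plain Python, not the article's code and not the `stm` package. The function names, the diagonal (per-topic) noise, and the toy covariates are hypothetical simplifications.

```python
import math
import random


def softmax(eta):
    """Map a real-valued vector onto the probability simplex (numerically stable)."""
    m = max(eta)
    exps = [math.exp(e - m) for e in eta]
    total = sum(exps)
    return [e / total for e in exps]


def topic_proportions(x_d, gamma, sigma, rng):
    """Draw topic proportions theta_d for one document (hypothetical helper).

    x_d   : covariate vector for document d, e.g. [intercept, genre_flag]
    gamma : K-1 coefficient vectors, one per non-reference topic
    sigma : noise standard deviation; a diagonal covariance is a simplification,
            since STM itself estimates a full covariance matrix
    """
    # The mean of eta_d is a linear function of the document's metadata.
    mu = [sum(g_j * x_j for g_j, x_j in zip(g, x_d)) for g in gamma]
    eta = [rng.gauss(m, sigma) for m in mu]
    # Fix the reference (last) topic at 0 for identifiability, then normalize.
    return softmax(eta + [0.0])


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy setup with K = 3 topics: the flag raises the prevalence of topic 0.
    gamma = [[0.0, 2.0], [0.0, 0.0]]
    print(topic_proportions([1, 0], gamma, 0.0, rng))  # flag off, no noise
    print(topic_proportions([1, 1], gamma, 0.0, rng))  # flag on, no noise
    print(topic_proportions([1, 1], gamma, 0.5, rng))  # flag on, with noise
```

With `sigma` set to 0, the first call returns equal proportions for all three topics. The second call moves most of the mass (about 0.79) to topic 0, which isolates the effect of the metadata. Adding noise recovers the per-document variability that the full model allows.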
Article outline
- 1. Introduction
- 2. Literature review
- 2.1 Lyrics and linguistics
- 2.2 Corpus-based approaches
- 2.3 Corpus-driven approaches
- 2.3.1 Topic modeling
- 2.3.2 Model evaluation
- 3. A proposed STM-led analytics framework
- 4. Lyrics analytics as a case study
- 4.1 Data pre-processing and exploration
- 4.2 Iterable assessment
- 4.2.1 Linguistic supervision of model selection
- 4.2.2 Topical quality model (TQ model)
- 4.2.3 Iterable assessment model (IA model)
- 4.3 Generalization
- 5. Conclusion
- Acknowledgements
- Notes
- References
https://doi.org/10.1075/consl.22026.wan