Part of Corpora and Rhetorically Informed Text Analysis: The diverse applications of DocuScope, edited by David West Brown and Danielle Zawodny Wetzel
Studies in Corpus Linguistics 109, 2023
pp. 167–189
This chapter proposes a novel method that uses non-negative matrix factorization (NMF) to extract topic models from texts. The resulting topics reveal how terms and DocuScope Language Action Types (LATs) align, indicating both what texts are about and how they are organized rhetorically. Moreover, because NMF keeps every value non-negative, each derived topic can be read as an additive combination of topical features, which greatly eases interpretation. To illustrate and benchmark the method, I apply it to the well-known 20 Newsgroups dataset and present sample results.
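To make the additive reading of topics concrete, here is a minimal sketch. It is not the chapter's implementation. It assumes a small document-by-feature count matrix V whose columns mix word terms with LAT counts. The feature names (`engine`, `car`, `god`, `faith`, `LAT_A`, `LAT_B`) and the counts are invented placeholders, not real DocuScope categories or 20 Newsgroups data. The sketch factors V ≈ WH with W, H ≥ 0 using the standard Lee–Seung multiplicative updates for squared error. Each row of H is then a topic: a non-negative weighting over both terms and LATs.

```python
import random

def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(r) for r in zip(*m)]

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def nmf(v, k, iters=500, seed=0, eps=1e-9):
    """Factor non-negative V (n x m) into non-negative W (n x k) and H (k x m)
    with Lee-Seung multiplicative updates minimizing ||V - WH||^2."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Random positive start; a constant start would keep topics identical.
    w = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        wt = transpose(w)
        num = matmul(wt, v)              # W^T V
        den = matmul(matmul(wt, w), h)   # W^T W H
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        ht = transpose(h)
        num = matmul(v, ht)              # V H^T
        den = matmul(w, matmul(h, ht))   # W H H^T
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return w, h

def squared_error(v, w, h):
    """Squared Frobenius norm of the residual V - WH."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rw in zip(v, wh) for x, y in zip(rv, rw))

def top_features(h, names, n_top=3):
    """List the highest-weighted feature names for each topic (row of H)."""
    return [[names[j] for j in sorted(range(len(row)), key=lambda j: -row[j])[:n_top]]
            for row in h]

# Hypothetical features: four word terms plus two placeholder LAT counts.
NAMES = ["engine", "car", "god", "faith", "LAT_A", "LAT_B"]
V = [
    [5, 4, 0, 0, 3, 0],
    [4, 5, 0, 0, 2, 0],
    [0, 0, 5, 4, 0, 3],
    [0, 0, 4, 5, 0, 2],
]

if __name__ == "__main__":
    W, H = nmf(V, k=2)
    for t, feats in enumerate(top_features(H, NAMES)):
        print(f"topic {t}: {feats}")
    print("squared error:", round(squared_error(V, W, H), 3))
```

Because no weight can go negative, a topic cannot cancel out features. Its highest-weighted terms and LATs therefore summarize what it adds to a document. In this toy matrix each topic pairs a set of words with one placeholder LAT. For real corpora, a library implementation such as scikit-learn's `NMF` would likely be preferable to this hand-rolled loop.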