Computationally Discriminating Literary from Non-Literary Texts

Louwerse, Max M.; Benesh, Nick; Zhang, Bin

doi:10.1075/lal.5.16lou

Part of: Directions in Empirical Literary Studies: In honor of Willie van Peer
Edited by Sonia Zyngier, Marisa Bortolussi, Anna Chesnokova and Jan Auracher
Linguistic Approaches to Literature 5 (2008), pp. 175–191

Three computational linguistic methods are presented for discriminating literary from non-literary texts. In the first study, hierarchical clustering of results obtained from Latent Semantic Analysis (LSA) showed that literary and non-literary texts cluster separately. The second study used the frequencies of bigrams shared across the texts and classified literary versus non-literary texts with 100% accuracy. The third study used unigrams, yielding 94% correct classification into literary versus non-literary texts. The final two studies, using a larger sample of texts, showed that the high classification performance cannot be attributed to specific texts. These findings provide evidence that literature can be distinguished from non-literature with high accuracy using relatively simple computational linguistic techniques.
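The abstract does not specify the tokenization, what counts as a "shared" bigram, or which classifier was used, so the following is only a hypothetical sketch of n-gram frequency classification, not the authors' procedure. It assumes whitespace tokenization, treats an n-gram as shared when it occurs in at least `min_docs` texts (a hypothetical parameter), represents each text by the relative frequencies of those shared n-grams, and labels each held-out text by cosine similarity to the class centroids of the remaining texts (leave-one-out nearest centroid, a swapped-in classifier). Setting `n=1` gives a unigram variant.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Vector = Dict[Tuple[str, ...], float]


def ngrams(text: str, n: int) -> Counter:
    """Count word n-grams (n=1: unigrams, n=2: bigrams); assumes whitespace tokenization."""
    words = text.lower().split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def feature_vectors(texts: List[str], n: int = 2, min_docs: int = 2) -> List[Vector]:
    """Relative frequencies of n-grams occurring in at least `min_docs` texts (assumed meaning of 'shared')."""
    counts = [ngrams(t, n) for t in texts]
    doc_freq = Counter(g for c in counts for g in c)  # each n-gram counted once per text
    shared = {g for g, df in doc_freq.items() if df >= min_docs}
    vectors = []
    for c in counts:
        total = sum(c.values()) or 1  # normalize by all n-grams in the text
        vectors.append({g: c[g] / total for g in shared if g in c})
    return vectors


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(val * b.get(k, 0.0) for k, val in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors: List[Vector]) -> Vector:
    """Component-wise mean of sparse vectors."""
    summed: Vector = {}
    for v in vectors:
        for k, val in v.items():
            summed[k] = summed.get(k, 0.0) + val
    return {k: val / len(vectors) for k, val in summed.items()}


def leave_one_out_accuracy(texts: List[str], labels: List[str], n: int = 2, min_docs: int = 2) -> float:
    """Hold out each text, assign it to the most similar class centroid, return the share classified correctly."""
    vectors = feature_vectors(texts, n, min_docs)
    correct = 0
    for i, (vec, gold) in enumerate(zip(vectors, labels)):
        rest = [(v, lab) for j, (v, lab) in enumerate(zip(vectors, labels)) if j != i]
        classes = sorted({lab for _, lab in rest})  # sorted for deterministic tie-breaking
        best = max(classes, key=lambda c: cosine(vec, centroid([v for v, lab in rest if lab == c])))
        correct += best == gold
    return correct / len(texts)


if __name__ == "__main__":
    # Hypothetical toy corpus, invented for illustration only.
    literary = [
        "the old moon wept over the silent sea and the night was long",
        "the silent sea sang to the old moon while the night was long",
    ]
    non_literary = [
        "the committee shall review the annual report before the deadline",
        "the annual report must reach the committee before the deadline",
    ]
    texts = literary + non_literary
    labels = ["literary"] * len(literary) + ["non-literary"] * len(non_literary)
    print("bigram accuracy:", leave_one_out_accuracy(texts, labels, n=2))
```

In this sketch the shared-n-gram vocabulary is selected from all texts, including the held-out one. Because that step ignores the labels it does not leak class information, but a stricter evaluation would recompute it inside each fold.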

Keywords: bigram analysis, classification techniques, computational linguistics, genre, latent semantic analysis, stylistics

Published online: 15 May 2008