Edited by Sonia Zyngier, Marisa Bortolussi, Anna Chesnokova and Jan Auracher
[Linguistic Approaches to Literature 5] 2008
pp. 175–191
Computationally Discriminating Literary from Non-Literary Texts
Three computational linguistic methods are presented for discriminating literary from non-literary texts. In the first study, hierarchical clustering of results obtained from Latent Semantic Analysis produced clusters of literary versus non-literary texts. The second study used the frequencies of shared bigrams across the text, resulting in a 100% correct classification of literary versus non-literary texts. The third study used unigrams, yielding a 94% correct classification into literary versus non-literary texts. The final two studies, which used a larger sample of texts, showed that the high classification performance cannot be attributed to specific texts. These findings provide evidence that literature can be distinguished from non-literature with high accuracy using relatively simple computational linguistic techniques.
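To give a concrete sense of what a shared-bigram approach can look like, the following is a minimal Python sketch. It is not the chapter's procedure: the abstract does not specify tokenization, the overlap measure, or the classifier, so the whitespace tokenizer, the min-count overlap, and the nearest-sample rule below are all assumptions, as are the function names `bigrams`, `shared_bigram_count`, and `classify` and the two toy sentences.

```python
from collections import Counter


def bigrams(text):
    """Count adjacent lowercase word pairs (assumed tokenizer: whitespace split)."""
    words = text.lower().split()
    return Counter(zip(words, words[1:]))


def shared_bigram_count(a, b):
    """Sum, over bigrams found in both texts, of the smaller of the two counts."""
    # Counter & Counter keeps each common key with the minimum of its counts.
    return sum((bigrams(a) & bigrams(b)).values())


def classify(text, labeled_texts):
    """Return the label of the sample sharing the most bigrams with text.

    Assumed decision rule (nearest sample); ties go to the earliest sample.
    """
    best_label, _ = max(
        ((label, shared_bigram_count(text, sample)) for label, sample in labeled_texts),
        key=lambda pair: pair[1],
    )
    return best_label


# Toy samples invented for illustration only; they are not from the study.
labeled = [
    ("literary", "the old man looked at the sea and the sea looked back"),
    ("non-literary", "the committee approved the budget and the budget was published"),
]
result = classify("the old man walked to the sea", labeled)
print(result)
```

A unigram variant of the same sketch would replace the word pairs in `bigrams` with single-word counts, leaving the overlap and decision steps unchanged.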