Article published in:
Exploring Newspaper Language: Using the web to create and investigate a large corpus of modern Norwegian
Edited by Gisle Andersen
[Studies in Corpus Linguistics 49] 2012
pp. 111–130

Automatic topic classification of a large newspaper corpus

Thomas M. Hagen | Uni Computing
This paper describes how machine learning methods, along with a limited amount of manual classification, were used in a fairly successful attempt to automatically classify, by topic, newspaper articles from The Norwegian Newspaper Corpus. The corpus is challenging to classify automatically, since not all of it has been cleaned of boilerplate. Our automatic topic classifier achieved 54% accuracy, while our human annotators achieved only slightly higher accuracy, at 59%. We used several machine learning methods built into the programming language CRM114. Six annotators manually classified 1,400 articles from the corpus. The machine learner was trained only on the errors it made when attempting to classify these articles.
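The train-on-error scheme mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' CRM114 setup: the classifier below is a small naive Bayes stand-in written in Python, and the names `TinyNaiveBayes` and `train_on_error`, the whitespace tokenisation, and the toy articles in the usage note are all assumptions made for illustration only.

```python
import math
from collections import Counter


class TinyNaiveBayes:
    """Incremental multinomial naive Bayes (a hypothetical stand-in, not CRM114)."""

    def __init__(self, topics):
        self.topics = list(topics)
        self.word_counts = {t: Counter() for t in self.topics}
        self.doc_counts = Counter()
        self.vocab = set()

    def learn(self, tokens, topic):
        # Add one labelled article to the counts for its topic.
        self.word_counts[topic].update(tokens)
        self.doc_counts[topic] += 1
        self.vocab.update(tokens)

    def classify(self, tokens):
        # Return the topic with the highest Laplace-smoothed log score.
        # Ties go to the topic listed first.
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab) + 1
        best_topic, best_score = self.topics[0], -math.inf
        for t in self.topics:
            score = math.log((self.doc_counts[t] + 1) / (total_docs + len(self.topics)))
            total_words = sum(self.word_counts[t].values())
            for w in tokens:
                score += math.log((self.word_counts[t][w] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_topic, best_score = t, score
        return best_topic


def train_on_error(clf, labelled_articles):
    """One pass over the data: learn an article only if it was misclassified."""
    errors = 0
    for text, topic in labelled_articles:
        tokens = text.lower().split()
        if clf.classify(tokens) != topic:
            clf.learn(tokens, topic)  # train only on the mistake
            errors += 1
    return errors
```

A possible usage, with invented toy data: build `clf = TinyNaiveBayes(["sport", "economy"])`, then call `train_on_error(clf, articles)` repeatedly on a list of `(text, topic)` pairs until a pass returns zero errors. Correctly classified articles never update the model, which keeps training cheap when manually labelled data is scarce.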
Published online: 23 March 2012
https://doi.org/10.1075/scl.49.06hag