Exploring Newspaper Language

Using the web to create and investigate a large corpus of modern Norwegian

Edited by Gisle Andersen | Norwegian School of Economics, Bergen
ISBN 9789027203540 | EUR 99.00 | USD 149.00
ISBN 9789027274991 | EUR 99.00 | USD 149.00
This book describes new methodological and technological approaches to corpus building and presents recent research based on the Norwegian Newspaper Corpus, a large monitor corpus of contemporary Norwegian compiled through daily harvesting of web newspapers. The book gives an overview of the corpus and its system architecture, and presents tools used for tasks such as text harvesting, annotation, topic classification, and the extraction and frequency profiling of new words and phrases. Among the innovative technologies is Corpuscle, a corpus query engine and management system flexible enough to handle very large corpora efficiently. The individual research contributions based on the corpus explore different aspects of Norwegian, including the occurrence of anglicisms, neologisms and terminology, and the use of metonymy and metaphor in newspaper language. The book also describes an innovative method of applying correspondence analysis and implicational analysis to investigate interdependencies between morphosyntactic variants.
[Studies in Corpus Linguistics, 49] 2012. vi, 356 pp.
Publishing status: Available
Table of Contents
Building a large corpus based on newspapers from the web
Gisle Andersen and Knut Hofland
Part I. Exploiting the web as a corpus – Methods and tools
Corpuscle – a new corpus management platform for annotated corpora
Paul Meurer
OBT+stat: A combined rule-based and statistical tagger
Janne Bondi Johannessen †, Kristin Hagen, André Lynum and Anders Nøklestad
Exploring corpora through syntactic annotation
Victoria Rosén
Collocations and statistical analysis of n-grams: Multiword expressions in newspaper text
Gunn Inger Lyse and Gisle Andersen
Automatic topic classification of a large newspaper corpus
Thomas M. Hagen
A data-driven approach to anglicism identification in Norwegian
Gyri Smørdal Losnegaard and Gunn Inger Lyse
Part II. Corpus-based case studies
A corpus-based study of the adaptation of English import words in Norwegian
Gisle Andersen
Norm clusters in written Norwegian
Helge Dyvik
Lexical neography in modern Norwegian
Ruth Vatvedt Fjeld and Lars Nygaard
Ash compound frenzy: A case study in the Norwegian Newspaper Corpus
Koenraad De Smedt
Financial jargon in a general newspaper corpus
Marita Kristiansen
Metonymic extension and vagueness: Schengen and Kyoto in Norwegian newspaper language
Sandra L. Halverson
Spatial metaphors in present-day Norwegian newspaper language
Leiv Egil Breivik and Toril Swan
Doing historical linguistics using contemporary data
Øivin Andersen
Subject index
Cited by

Cited by 6 other publications

Abdumanapovna, Sharipova Aziza
2018. In Proceedings of the 2nd International Conference on Digital Technology in Education - ICDTE 2018, pp. 82 ff.
Akundi, Aditya & Oscar Mondragon
2021. Model based systems engineering—A text mining based structured comprehensive overview. Systems Engineering
Andersen, Gisle
2021. Utilising heterogeneous language resources for term extraction in maritime domains. Terminology. International Journal of Theoretical and Applied Issues in Specialized Communication
Andersen, Gisle & Anne-Line Graedler
2020. Morphological borrowing from English to Norwegian: The enigmatic non-possessive -s. Nordic Journal of Linguistics 43:1 pp. 3 ff.
Andersen, Gisle & Daniel Hardt
2014. Introduction: Corpus linguistics and the Nordic languages. Nordic Journal of Linguistics 37:2 pp. 135 ff.
Andersen, Gisle
2020. Phraseology in a cross-linguistic perspective: A diachronic and corpus-based account. Corpus Linguistics and Linguistic Theory 0:0

This list is based on CrossRef data as of 16 November 2021. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.

Subjects & Metadata
BIC Subject: CFX – Computational linguistics
BISAC Subject: LAN009000 – LANGUAGE ARTS & DISCIPLINES / Linguistics / General
U.S. Library of Congress Control Number: 2011045662