Exploring Newspaper Language

Using the web to create and investigate a large corpus of modern Norwegian

Edited by Gisle Andersen
Norwegian School of Economics, Bergen
This book describes new methodological and technological approaches to corpus building and presents recent research based on the Norwegian Newspaper Corpus, a large monitor corpus of contemporary Norwegian compiled through daily harvesting of web newspapers. The book gives an overview of the corpus and its system architecture, and presents tools used for tasks such as text harvesting, annotation, topic classification, and the extraction and frequency profiling of new words and phrases. Among the innovative technologies is Corpuscle, a corpus query engine and management system flexible enough to handle very large corpora efficiently. The individual research contributions based on the corpus explore different aspects of Norwegian, including the occurrence of anglicisms, neologisms and terminology, and the use of metonymy and metaphor in newspaper language. The book also describes an innovative method of applying correspondence analysis and implicational analysis to investigate interdependencies between morphosyntactic variants.
[Studies in Corpus Linguistics, 49]  2012.  vi, 356 pp.
Publishing status: Available
Hardbound: Available
ISBN 9789027203540 | EUR 99.00 | USD 149.00
e-Book: Sold by e-book platforms
ISBN 9789027274991 | EUR 99.00 | USD 149.00

Table of Contents

Building a large corpus based on newspapers from the web
Gisle Andersen and Knut Hofland
1–28
Part I. Exploiting the web as a corpus – Methods and tools
Corpuscle – a new corpus management platform for annotated corpora
Paul Meurer
29–50
OBT+stat: A combined rule-based and statistical tagger
Janne Bondi Johannessen, Kristin Hagen, André Lynum and Anders Nøklestad
51–66
Exploring corpora through syntactic annotation
Victoria Rosén
67–78
Collocations and statistical analysis of n-grams: Multiword expressions in newspaper text
Gunn Inger Lyse and Gisle Andersen
79–110
Automatic topic classification of a large newspaper corpus
Thomas M. Hagen
111–130
A data-driven approach to anglicism identification in Norwegian
Gyri Smørdal Losnegaard and Gunn Inger Lyse
131–154
Part II. Corpus-based case studies
A corpus-based study of the adaptation of English import words in Norwegian
Gisle Andersen
155–192
Norm clusters in written Norwegian
Helge Dyvik
193–220
Lexical neography in modern Norwegian
Ruth Vatvedt Fjeld and Lars Nygaard
221–240
Ash compound frenzy: A case study in the Norwegian Newspaper Corpus
Koenraad De Smedt
241–256
Financial jargon in a general newspaper corpus
Marita Kristiansen
257–284
Metonymic extension and vagueness: Schengen and Kyoto in Norwegian newspaper language
Sandra L. Halverson
285–306
Spatial metaphors in present-day Norwegian newspaper language
Leiv Egil Breivik and Toril Swan
307–330
Doing historical linguistics using contemporary data
Øivin Andersen
331–350
351–352
Subject index
353–356

Subjects

BIC Subject

CFX: Computational linguistics

BISAC Subject

LAN009000: LANGUAGE ARTS & DISCIPLINES / Linguistics
U.S. Library of Congress Control Number:  2011045662