Challenges in Corpus Linguistics
Rethinking corpus compilation and analysis
This book contributes to the discussion of challenges faced in different areas of corpus linguistics, namely the compilation, annotation, and analysis of linguistic corpora. In a field of growing corpus sizes and expanding possibilities for gathering data, some old issues persist, while new problems have emerged. As the compilation and study of language corpora become increasingly sophisticated and complex, continuous attention must be paid to ways of handling the data and to the challenges of text selection and interpretation. The contributions to this volume address problems in a variety of areas of corpus linguistic study, including corpus annotation, data variability, learner language, social media texts, and database utilization. The authors provide critical overviews and research-based analyses, discuss the nature of some of the common pitfalls, and offer solutions to existing problems.
[Studies in Corpus Linguistics, 118] 2024. vii, 172 pp.
Publishing status: Available
Published online on 6 September 2024
© John Benjamins
Table of Contents
- Acknowledgements | pp. vii–viii
- From fallacies and pitfalls to solutions and future directions: Navigating the evolving terrain of corpus linguistics | Mark Kaunisto | pp. 1–8
- Engaging with bad (meta)data in historical corpus linguistics | Turo Vartiainen and Tanja Säily | pp. 9–34
- Named entities as potentially problematic items in corpora | Mark Kaunisto | pp. 35–54
- Challenges in the compilation, annotation, and analysis of learner corpus data | Marcus Callies | pp. 55–67
- Early newspapers as data for corpus linguistics (and Digital Humanities): Issues in using the British Library Newspapers database as a corpus | Turo Hiltunen | pp. 68–88
- Open Corpus Linguistics – or How to overcome common problems in dealing with corpus data by adopting open research practices | Stefan Hartmann | pp. 89–105
- Text length and short texts: An overview of the problem | Aatu Liimatta | pp. 106–125
- Corpus genre categories: Issues at the intersection of linguistics and literature | Daniel Ocic Ihrmark | pp. 126–141
- Modeling fine-grained sociolinguistic variation: The promises and pitfalls of Twitter corpora and neural word embeddings | Filip Miletić, Anne Przewozny-Desriaux and Ludovic Tanguy | pp. 142–170
- Subject index | pp. 171–172
Subjects
Main BIC Subject
CFX: Computational linguistics
Main BISAC Subject
LAN009000: LANGUAGE ARTS & DISCIPLINES / Linguistics / General