Investigating Wikipedia
Linguistic corpus building, exploration and analysis
The present volume is intended as a reference book on Wikipedia corpus studies, from corpus construction to exploration and analysis. Wikipedia is a complex object that is difficult for linguists and corpus researchers to manipulate. In addition to the encyclopedic articles consulted by millions of users, it contains vast spaces of written discussion, known as talk pages, where Wikipedia authors negotiate the collaborative editing of articles, make evaluations, or discuss related topics. The volume covers Wikipedia articles, their revision histories, and discussions, with a focus on discussions, which have not yet been studied extensively and have been neglected in previous corpus-building efforts. Wikipedia discussions are instances of computer-mediated communication (CMC) and thus constitute a completely different, interaction-oriented linguistic genre. Sophisticated tools and methods of linguistic annotation and corpus exploration are needed to exploit the huge and valuable corpus resources that can be constructed from these discussions. The volume aims to encourage and facilitate Wikipedia corpus studies by providing standards, recommendations, and innovative methods for building and exploring Wikipedia corpora, and by presenting corpus studies that make the most of the peculiarities of Wikipedia.
[Studies in Corpus Linguistics, 121] 2024. vi, 264 pp.
Publishing status: Available
Published online on 25 October 2024
© John Benjamins
Table of Contents
Introduction | Céline Poudat, Harald Lüngen, Laura Herzberg and Lydia-Mai Ho-Dac | pp. 1–10
Part I. Building Wikipedia corpora
Chapter 1. Building a comparable corpus of online discussions on Wikipedia: The EFG WikiCorpus | Lydia-Mai Ho-Dac | pp. 12–44
Chapter 2. Mining parallel corpora from Wikipedia | Olivier Kraif | pp. 45–74
Part II. Interactions on Wikipedia talk pages
Chapter 3. Exploring interactions in Wikipedia talk pages | Ludovic Tanguy, Céline Poudat and Lydia-Mai Ho-Dac | pp. 76–106
Chapter 4. Investigating reply relations on Wikipedia talk pages to reconstruct interactional strategies of Wikipedia authors | Laura Herzberg and Harald Lüngen | pp. 107–133
Chapter 5. Sockpuppets, Wikifants, and Honeypots: Metaphorical patterns in digital discourses on Wikipedia talk pages | Eva Gredel | pp. 134–154
Part III. Visualizing and exploring cooperation and conflicts in Wikipedia
Chapter 6. Exploring the evolution of Wikipedia articles through Contropedia | David Laniado, Michele Mauri and Erik Borra | pp. 156–177
Chapter 7. Live exploration of Wikipedia editing dynamics with visual analytics: WhoColor and Interactive Wikipedia Article Analysis Notebooks | Fabian Flöck, Roberto Ulloa and Maribel Acosta | pp. 178–204
Chapter 8. Disagreements and conflicts in Wikipedia talk pages | Céline Poudat and Marie Chandelier | pp. 205–234
Chapter 9. To each their own truth: Epistemic regimes on Wikipedia talk pages | Guillaume Carbou, Lydia-Mai Ho-Dac, Céline Poudat and Gilles Sahut | pp. 235–262
Index | pp. 263–264
Subjects
Main BIC Subject
CFX: Computational linguistics
Main BISAC Subject
LAN009000: LANGUAGE ARTS & DISCIPLINES / Linguistics / General