Publications

Publication details [#19631]

Publication type
Article in journal/book
Publication language
English

Abstract

Over the last 50 years, dictionary publishers and linguists have created a number of corpora, growing from 1-million-word corpora to billion-word corpora. The implicit claim of the corpus compilers is that the texts they have selected are in some sense representative: representative perhaps of a large number of language users, or representative in the sense of a standard. In this contribution, the composition of four major corpora is discussed, indicating that there is a measure of objectivity in the selection of many text samples. Given that the pioneers of corpus linguistics were interested in the teaching of English as a second language, one can discern in the compilation of the corpora an emphasis on informative texts (texts used in science and technology) at the expense of literary texts. However, the author points out that there are instances where texts were published by a small group of publishers, based mainly in metropolitan areas, or where there is a gender imbalance among the authors of the texts. Ultimately, he argues, there is a degree of choice exercised by the compilers, but in this respect the behaviour of corpus linguists is not that different from that of much of the scientific community.
Source: G. Anderman & M. Rogers