Corpus analysis

Jan Aarts

Nowadays, when linguists speak of a corpus, they usually mean a collection of computer-readable texts. The design of the collection and the nature of the texts may vary considerably from one corpus to another, but the texts, whether spoken or written, must have been produced in an actual context of language use. The utterances constituting the texts are never artificial linguistic objects produced under laboratory conditions for the sole purpose of linguistic research. The fact that corpora are computationally accessible and that they are repositories of language use largely determines the nature of the linguistic research they are used for. First, corpus analysis cannot be carried out without advanced computational tools; second, it is naturally oriented towards the study of language use and is therefore biased towards the study of specific languages, genres and language varieties.
