Computer corpora in English language research
A critical survey
This paper provides an overview of various English language corpora. It examines the relationships between the extant corpora and indicates some of the features of a corpus of written English being developed in Australia. It also considers some of the linguistic and theoretical constraints on corpus-based research.