Exploiting a Large Spoken Corpus
An End-user's Way to the BNC
The British National Corpus (BNC) contains a spoken component of about 10 million words, consisting of spoken language of various kinds produced by different speakers in a variety of situations. Starting from an end-user's perspective, this paper surveys the potential of this resource and some possible problems one might encounter if not fully versed in the details of the compilation and coding plans. Among the issues touched upon are questions relating to the composition of the component, the transcription principles employed, and points relating to the nature and coverage of the mark-up. By way of illustration, examples are drawn from a case study of the variant forms gonna and going to.
Keywords: British National Corpus, Spoken English, Mark-up, Gonna vs. Going to, Transcription
Published online: 13 August 1999
Cited by 3 other publications