Publications

Publication details [#60944]

Andersen, Gisle. 2016. Semi-lexical features in corpus transcription. Consistency, comparability, standardisation. International Journal of Corpus Linguistics 21(3): 323–347.
Publication type: Article in journal
Publication language: English
Place, Publisher: John Benjamins
Journal DOI: 10.1075/ijcl

Annotation

An aspect of corpus compilation that poses a particular challenge is the question of how to transcribe orthographically units that are not part of any standardised vocabulary. Among the problematic categories one finds voiced pauses, minimal response signals, interjections, certain discourse markers, phonologically reduced forms, colloquialisms and dialect forms. Such semi-lexical features are usually represented by regular phonemic-graphemic correspondences but are nevertheless often inconsistently handled. This paper reviews a number of existing transcription guidelines and assesses whether the recommendations they provide are sufficient and detailed enough to secure a consistent transcription of the categories mentioned. Further, the paper assesses to what extent transcription of semi-lexical features is consistent within and across two spoken corpora. On the basis of a cross-corpus comparison of the Bergen Corpus of London Teenage Language (COLT) and the London English Corpus (LEC), the paper provides specific recommendations for corpus transcription.