Article published in: Compilation, transcription, markup and annotation of spoken corpora
Edited by John M. Kirk and Gisle Andersen
[International Journal of Corpus Linguistics 21:3] 2016
pp. 323–347
Semi-lexical features in corpus transcription
Consistency, comparability, standardisation
An aspect of corpus compilation that poses a particular challenge is how to transcribe, orthographically, units that are not part of any standardised vocabulary. Problematic categories include voiced pauses, minimal response signals, interjections, certain discourse markers, phonologically reduced forms, colloquialisms and dialect forms. Such semi-lexical features are usually represented through regular phonemic-graphemic correspondences, but they are nevertheless often handled inconsistently. This paper reviews a number of existing transcription guidelines and assesses whether their recommendations are detailed enough to secure a consistent transcription of these categories. It further assesses the extent to which the transcription of semi-lexical features is consistent within and across two spoken corpora. On the basis of a cross-corpus comparison of the Bergen Corpus of London Teenage Language (COLT) and the London English Corpus (LEC), the paper provides specific recommendations for corpus transcription.
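The abstract does not specify how transcription consistency is measured. Purely as a hypothetical illustration, one simple operationalisation groups the spelling variants of a semi-lexical feature and reports the share of tokens that use the most frequent spelling, both within a corpus and for comparison across corpora. Everything in the sketch is invented for illustration and not drawn from COLT, LEC or any guideline: the variant table `VARIANTS`, the toy token lists, and the functions `variant_profile` and `consistency`. It also assumes whitespace tokenisation of transcripts that have already been stripped of markup.

```python
from collections import Counter

# HYPOTHETICAL variant table: canonical semi-lexical form -> spellings a
# transcriber might use for it. Illustrative only, not from any real guideline.
VARIANTS = {
    "er": {"er", "err", "uh"},
    "erm": {"erm", "em", "um", "uhm"},
    "yeah": {"yeah", "yeh", "yea"},
}


def variant_profile(tokens, canonical):
    """Count how often each spelling of `canonical` occurs in `tokens`."""
    spellings = VARIANTS[canonical]
    return Counter(t.lower() for t in tokens if t.lower() in spellings)


def consistency(profile):
    """Share of tokens using the single most frequent spelling.

    1.0 means every occurrence uses one spelling; None means the
    feature is not attested in the corpus at all.
    """
    total = sum(profile.values())
    if total == 0:
        return None
    return profile.most_common(1)[0][1] / total


# HYPOTHETICAL toy transcripts standing in for two corpora.
corpus_a = "yeah erm I dunno yeah yeh erm".split()
corpus_b = "yeah yeah um uhm er yea".split()

if __name__ == "__main__":
    for feature in VARIANTS:
        for name, corpus in (("A", corpus_a), ("B", corpus_b)):
            profile = variant_profile(corpus, feature)
            print(feature, name, dict(profile), consistency(profile))
```

In this toy data, corpus A spells the filled pause consistently as "erm" (score 1.0), while corpus B splits it between "um" and "uhm" (score 0.5). Comparing the preferred spellings across corpora in this way gives a rough, hypothetical indication of comparability; a real study would also need to decide which forms count as variants of the same unit.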
Keywords: interjections, minimal response signals, spoken corpus, colloquialisms, discourse markers
Published online: 29 September 2016