Article published in: Compilation, transcription, markup and annotation of spoken corpora
Edited by John M. Kirk and Gisle Andersen
[International Journal of Corpus Linguistics 21:3] 2016
pp. 348–371
Compiling computer-mediated spoken language corpora
Key issues and recommendations
This paper discusses key issues in the compilation of spoken language corpora in a computer-mediated communication (CMC) environment, using data from the Corpus of Academic Spoken English (CASE), a corpus of Skype conversations currently being compiled at Saarland University, Germany, in cooperation with European and US partners. Based on first findings, Skype is presented as a suitable tool for collecting informal spoken data. In addition, new recommendations concerning data compilation and transcription are put forward to supplement existing best practice as presented in Wynne (2005). We recommend the preservation of multimodal features during anonymisation, and the addition of annotation elements already at the transcription stage, particularly CMC-related discourse features, English as a Lingua Franca (ELF) features (e.g. non-standard language and code-switching), as well as the inclusion of prosodic, paralinguistic, and non-verbal annotation. Additionally, we propose a layered corpus design in order to allow researchers to focus on specific annotation features.
Keywords: Computer-mediated communication (CMC), data compilation and transcription, spoken language corpora, best practice
Published online: 29 September 2016
Adolphs, S., & Carter, R.
ECAMM – Call Recorder for Mac
(2013) [Computer software]. Retrieved from http://www.ecamm.com/mac/callrecorder/ (last accessed March 2016).
CASE – Corpus of Academic Spoken English
(Forthcoming) S. Diemer, M.-L. Brunner, C. Collet & S. Schmidt. Saarbrücken: Saarland University (Coordination) / Sofia: St Kliment Ohridski University / Forlì: University of Bologna-Forlì / Santiago: University of Santiago de Compostela / Helsinki: Helsinki University & Hanken School of Economics / Birmingham: Birmingham City University / Växjö: Linnaeus University / Louvain-la-Neuve: Université catholique de Louvain / Lyon: Université Lumière Lyon 2 / Boise: Boise State University. Retrieved from http://www.uni-saarland.de/campus/fakultaeten/fachrichtungen/philosophische-fakultaet-ii/fachrichtungen/fr43/staff/adjunct-faculty/engling2/case.html (last accessed March 2016).
CLAWS Part-of-Speech Tagger for English
(1994–2016) [Computer software]. Retrieved from http://www.comp.lancs.ac.uk/computing/research/ucrel/claws/ (last accessed March 2016).
Conrad, S., & Mauranen, A.
Dressler, R.A., & Kreuz, R.J.
ELFA – The Corpus of English as a Lingua Franca in Academic Settings
(2008) A. Mauranen (Director). Retrieved from http://www.helsinki.fi/elfa/elfacorpus (last accessed February 2015).
(2014) CASE XML Conversion Tool [Computer software]. Retrieved from http://rdues.bcu.ac.uk/case (last accessed November 2015).
Gibbon, D., Moore, R., & Winski, R.
ICE Corpus annotation guidelines
(2009) Retrieved from http://ice-corpora.net/ice/annotate.htm (last accessed March 2016).
IFA Dialog Video Corpus
(2008) Retrieved from http://www.fon.hum.uva.nl/IFA-SpokenLanguageCorpora/IFADVcorpus/ (last accessed March 2016).
Jefferson, G., Sacks, H., & Schegloff, E.A.
(2002) ICE mark-up manual for spoken texts. Retrieved from http://ice-corpora.net/ice/spoken.doc (last accessed 31 March 2016).
Sauer, S., & Lüdeling, A.
Schmidt, S., Brunner, M.-L., & Diemer, S.
(2014) CASE: Corpus of Academic Spoken English: Transcription Conventions. Retrieved from http://www.uni-saarland.de/index.php?id=48506 (last accessed March 2016).
Supertintin – Skype Video Call Recorder
(2013) [Computer software]. Retrieved from http://www.supertintin.com/index.html (last accessed March 2016).
VOICE – The Vienna-Oxford International Corpus of English
(Version 2.0 XML) (2013) B. Seidlhofer (Director). Vienna: University of Vienna. Retrieved from https://www.univie.ac.at/voice/ (last accessed March 2016).
Wynne, M. (Ed.) (2005) Developing Linguistic Corpora: A Guide to Good Practice. Oxford: Oxbow Books. Retrieved from http://users.ox.ac.uk/~martinw/dlc/index.htm (last accessed March 2016).