Article published in:
Compilation, transcription, markup and annotation of spoken corpora
Edited by John M. Kirk and Gisle Andersen
[International Journal of Corpus Linguistics 21:3] 2016
pp. 419–438


Anderson, A.H., Bader, M., Gurman Bard, E., Boyle, E., Doherty, G., Garrod, S., Isard, S., Kowtko, J., McAllister, J., Miller, J., Sotillo, C., Thompson, H.S., & Weinert, R.
(1991) The HCRC Map Task Corpus. Language and Speech, 34(4), 351–366.
Belz, M.
(2013) Disfluencies und Reparaturen bei Muttersprachlern und Lernern: Eine kontrastive Analyse. Humboldt-Universität zu Berlin. Retrieved from http://edoc.hu-berlin.de/docviews/abstract.php?id=40482 (last accessed March 2014).
(2014) BeMaTaC: A Deeply Annotated Multimodal Map-task Corpus of Spoken Learner and Native German. Retrieved from http://u.hu-berlin.de/bematac (last accessed March 2014).
Boersma, P.
(2010) Praat: A system for doing phonetics by computer. Glot International, 5(9/10), 341–345.
Brinckmann, C., Kleiner, S., Knöbl, R., & Berend, N.
(2008) German today: An areally extensive corpus of spoken Standard German. In N. Calzolari, Kh. Choukri, B. Maegaard, J. Mariani, J. Odijk, S. Piperidis & D. Tapias (Eds.), Proceedings of the Sixth International Conference on Language Resources and Evaluation (pp. 3185–3191). Paris: ELRA.
Buchholz, S., & Marsi, E.
(2006) CoNLL-X shared task on multilingual dependency parsing. In L. Màrquez & D. Klein (Eds.), Proceedings of the 10th Conference on Computational Natural Language Learning (pp. 149–164). Stroudsburg, PA: Association for Computational Linguistics.
Burnard, L. (Ed.)
(2007) Reference Guide for the British National Corpus (XML Edition). Oxford: Research Technologies Service. Retrieved from http://www.natcorp.ox.ac.uk/XMLedition/URG (last accessed March 2014).
Carletta, J., Evert, S., Heid, U., Kilgour, J., Robertson, J., & Voormann, H.
(2003) The NITE XML Toolkit: Flexible annotation for multi-modal language data. Behavior Research Methods, Instruments, & Computers, 35(3), 353–363.
Carletta, J., Evert, S., Heid, U., & Kilgour, J.
(2005) The NITE XML Toolkit: Data model and query. Language Resources and Evaluation, 39(4), 313–334.
Chiarcos, C., Dipper, S., Götze, M., Leser, U., Lüdeling, A., Ritz, J., & Stede, M.
(2009) A flexible framework for integrating annotations from different tools and tagsets. Traitement Automatique des Langues, 49(2), 271–291.
Creative Commons
(2014) About the Licenses - Creative Commons. Retrieved from http://creativecommons.org/licenses (last accessed March 2014).
Dipper, S.
(2005) XML-based stand-off representation and exploitation of multi-level linguistic annotation. In R. Eckstein & R. Tolksdorf (Eds.), Proceedings of Berliner XML Tage 2005 (pp. 39–50). Berlin: Humboldt-Universität zu Berlin.
Dipper, S., Lüdeling, A., & Reznicek, M.
(2013) NoSta-D: A corpus of German non-standard varieties. In M. Zampieri & S. Diwersy (Eds.), Non-Standard Data Sources in Corpus-Based Research (pp. 69–76). Aachen: Shaker.
Druskat, S., Bierkandt, L., Gast, V., Rzymski, C., & Zipser, F.
(2014) Atomic: An open-source software platform for multi-level corpus annotation. In J. Ruppenhofer & G. Faaß (Eds.), Proceedings of the 12th Konferenz zur Verarbeitung natürlicher Sprache (KONVENS 2014) (pp. 228–234). Retrieved from http://nbn-resolving.de/urn:nbn:de:gbv:hil2-opus-2866 (last accessed May 2015).
Gerdes, K.
(2014) Arborator [Computer software]. Retrieved from http://arborator.ilpga.fr (last accessed March 2014).
Giesel, L., Klapi, M., Krüger, D., Nunberger, I., Rasskazova, O., & Sauer, S.
(2013) Berlin Map Task Corpus: A deeply annotated multimodal map-task corpus of spoken learner and native German. Poster presented at the 35. Jahrestagung der Deutschen Gesellschaft für Sprachwissenschaft, Potsdam, Germany. Retrieved from http://korpling.german.hu-berlin.de/bematac/publications/Giesel-et-al_2013_DGfS-CL-2013.pdf (last accessed March 2014).
Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., & Witten, I.H.
(2009) The WEKA data mining software: An update. In O.R. Zaiane (Ed.), SIGKDD Explorations, 11(1), 10–18.
Hanke, T., & Storz, J.
(2008) iLex: A database tool for integrating sign language corpus linguistics and sign language lexicography. In O. Crasborn, E. Efthimiou, T. Hanke, E. Thoutenhoofd & I. Zwitserlood (Eds.), LREC 2008 Workshop, Proceedings, W 25: 3rd Workshop on the Representation and Processing of Sign Languages: Construction and Exploitation of Sign Language Corpora (pp. 64–67). Paris: ELRA.
Himmelmann, N.P.
(2012) Linguistic data types and the interface between language documentation and description. Language Documentation & Conservation, 6, 187–207.
Hinrichs, E.W., Hinrichs, M., & Zastrow, T.
(2010) WebLicht: Web-Based LRT services for German. In ACL 2010 System Demonstrations, Proceedings (pp. 25–29). Stroudsburg, PA: Association for Computational Linguistics.
Ide, N., & Suderman, K.
(2007) GrAF: A graph-based format for linguistic annotations. In B. Boguraev, N. Ide, A. Meyers, Sh. Nariyama, M. Stede, J. Wiebe & G. Wilcock (Eds.), ACL 2007 Workshop, Proceedings, Linguistic Annotation Workshop (pp. 25–29). Stroudsburg, PA: Association for Computational Linguistics.
Kirk, J.M.
(this volume). The pragmatic annotation scheme of the SPICE-Ireland corpus.
Krause, T., Lüdeling, A., Odebrecht, C., & Zeldes, A.
(2012) Multiple tokenization in a diachronic corpus. Paper presented at Exploring Ancient Languages through Corpora Conference 2012, Oslo. Retrieved from http://www.hf.uio.no/ifikk/english/research/projects/proiel/ealc/abstracts/Krause_et_al.pdf (last accessed March 2014).
Krause, T., & Zeldes, A.
(2014) ANNIS3: A new architecture for generic corpus query and visualization. Digital Scholarship in the Humanities. Retrieved from http://dsh.oxfordjournals.org/content/early/2014/12/02/llc.fqu057.full (last accessed May 2015).
Lüdeling, A.
(2011) Corpora in linguistics: Sampling and annotation. In K. Grandin (Ed.), Going Digital. Evolutionary and Revolutionary Aspects of Digitization (pp. 220–243). New York, NY: Science History Publications.
Max Planck Society
(2014) Max Planck Open Access: Berlin Declaration. Retrieved from http://openaccess.mpg.de/Berlin-Declaration (last accessed March 2014).
Müller, C., & Strube, M.
(2006) Multi-level annotation of linguistic data with MMAX2. In S. Braun, K. Kohn & J. Mukherjee (Eds.), Corpus Technology and Language Pedagogy (pp. 197–214). Frankfurt am Main: Peter Lang.
Nivre, J.
(2008) Treebanks. In A. Lüdeling & M. Kytö (Eds.), Corpus Linguistics. An International Handbook (pp. 225–241). Berlin: Mouton de Gruyter.
Pajas, P., & Stepanek, J.
(2008) Recent advances in a feature-rich framework for treebank annotation. In Proceedings of the 22nd International Conference on Computational Linguistics (pp. 673–680). Stroudsburg, PA: Association for Computational Linguistics.
R Core Team
(2013) R: A Language and Environment for Statistical Computing [Computer software]. Retrieved from http://www.R-project.org (last accessed March 2014).
Sauer, S., & Rasskazova, O.
(2014) BeMaTaC: Eine digitale multimodale Ressource für Sprach- und Dialogforschung. Poster presented at the workshop Grenzen überschreiten – Digitale Geisteswissenschaft heute und morgen, Berlin, Germany. Retrieved from http://korpling.german.hu-berlin.de/bematac/publications/Sauer-Rasskazova_2014_3WS-DHB.pdf (last accessed March 2014).
Schiel, F., Draxler, C., & Harrington, J.
(2011) Phonemic segmentation and labelling using the MAUS technique. Workshop New Tools and Methods for Very-Large-Scale Phonetics Research. Retrieved from http://www.phonetik.uni-muenchen.de/forschung/publikationen/Schiel-VLSP2011.pdf (last accessed April 2016).
Schiller, A., Teufel, S., Stöckert, C., & Thielen, C.
(1999) Guidelines für das Tagging deutscher Textcorpora mit STTS (Kleines und großes Tagset). Retrieved from http://www.sfs.uni-tuebingen.de/resources/stts-1999.pdf (last accessed March 2014).
Schmid, H.
(1994) Probabilistic part-of-speech tagging using decision trees. In Proceedings of International Conference on New Methods in Language Processing. Retrieved from ftp://ftp.ims.uni-stuttgart.de/pub/corpora/tree-tagger1.pdf (last accessed November 2014).
(2008) Tokenizing and part-of-speech tagging. In A. Lüdeling & M. Kytö (Eds.), Corpus Linguistics. An International Handbook (pp. 527–551). Berlin: Mouton de Gruyter.
Schmidt, T.
(2004) Transcribing and annotating spoken language with EXMARaLDA. In A. Witt, U. Heid, H.S. Thompson, J. Carletta & P. Wittenburg (Eds.), LREC 2004 Workshop, Proceedings, XML-based Richly Annotated Corpora (pp. 69–74). Paris: ELRA.
Schmidt, T., & Wörner, K.
(2009) EXMARaLDA: Creating, analysing and sharing spoken language corpora for pragmatic research. Pragmatics, 19(4), 565–582.
Schmidt, T., Hedeland, H., Lehmberg, T., & Wörner, K.
(2010) HAMATAC: The Hamburg MapTask Corpus. Retrieved from http://www.exmaralda.org/files/HAMATAC.pdf (last accessed March 2014).
Sloetjes, H., & Wittenburg, P.
(2008) Annotation by category: ELAN and ISO DCR. In N. Calzolari, Kh. Choukri, B. Maegaard, J. Mariani, J. Odijk, S. Piperidis & D. Tapias (Eds.), Proceedings of the Sixth International Conference on Language Resources and Evaluation (pp. 816–820). Paris: ELRA.
Stede, M.
(2011) Discourse Processing. San Rafael, CA: Morgan & Claypool.
Stenetorp, P., Pyysalo, S., Topić, G., Ohta, T., Ananiadou, S., & Tsujii, J.
(2012) Brat: A web-based tool for NLP-assisted text annotation. In F. Segond (Ed.), Proceedings of the Demonstrations at the 13th Conference of the European Chapter of the Association for Computational Linguistics (pp. 102–107). Stroudsburg, PA: Association for Computational Linguistics.
Stührenberg, M.
(2012) The TEI and current standards for structuring linguistic data. In P. Bański, E. Litta Modignani Picozzi & A. Witt (Eds.), Journal of the Text Encoding Initiative, 3. Retrieved from http://jtei.revues.org/523 (last accessed March 2014).
TEI Consortium
(2014) TEI: Text Encoding Initiative. Retrieved from http://www.tei-c.org (last accessed March 2014).
Thompson, P.
(2005) Spoken language corpora. In M. Wynne (Ed.), Developing Linguistic Corpora: A Guide to Good Practice (pp. 59–70). Oxford: Oxbow Books. Retrieved from http://ahds.ac.uk/linguistic-corpora (last accessed March 2014).
Wichmann, A.
(2008) Speech corpora and spoken corpora. In A. Lüdeling & M. Kytö (Eds.), Corpus Linguistics. An International Handbook (pp. 187–207). Berlin: Mouton de Gruyter.
Wörner, K.
(2009) Werkzeuge zur flachen Annotation von Transkriptionen gesprochener Sprache. Bielefeld: Bielefeld University. Retrieved from https://pub.uni-bielefeld.de/download/2301935/2301938 (last accessed April 2016).
Wynne, M.
(2008) Searching and concordancing. In A. Lüdeling & M. Kytö (Eds.), Corpus Linguistics. An International Handbook (pp. 706–737). Berlin: Mouton de Gruyter.
Yimam, S.M., Gurevych, I., Eckart de Castilho, R., & Biemann, C.
(2013) WebAnno: A flexible, web-based and visually supported system for distributed annotations. In M. Butt & S. Hussain (Eds.), 51st Annual Meeting of the Association for Computational Linguistics: Proceedings of the Conference System Demonstration (pp. 1–6). Stroudsburg, PA: Association for Computational Linguistics.
Zeldes, A., Ritz, J., Lüdeling, A., & Chiarcos, C.
(2009) ANNIS: A search tool for multi-layer annotated corpora. In M. Mahlberg, V. González-Díaz & C. Smith (Eds.), Proceedings of Corpus Linguistics 2009. Retrieved from http://edoc.hu-berlin.de/docviews/abstract.php?id=36996 (last accessed March 2014).
Zipser, F., & Romary, L.
(2010) A model oriented approach to the mapping of annotation formats using standards. In G. Budin, L. Romary, T. Declerck & P. Wittenburg (Eds.), LREC 2010 Workshop, Proceedings, W4: Language Resource and Language Technology Standards. Paris: ELRA. Retrieved from http://hal.inria.fr/inria-00527799 (last accessed November 2014).

Cited by 6 other publications

Belz, Malte, Simon Sauer, Anke Lüdeling & Christine Mooshammer
2017. Fluently disfluent? International Journal of Learner Corpus Research 3:2, pp. 118 ff.
Diemer, Stefan, Marie-Louise Brunner & Selina Schmidt
2016. Compiling computer-mediated spoken language corpora. International Journal of Corpus Linguistics 21:3, pp. 348 ff.
2017. Developments in the spoken component of ICE corpora. World Englishes 36:3, pp. 371 ff.
Põldvere, Nele, Johan Frid, Victoria Johansson & Carita Paradis
2021. Challenges of releasing audio material for spoken data: The case of the London-Lund Corpus 2. Research in Corpus Linguistics 9:1, pp. 35 ff.
Weise, Andreas, Vered Silber-Varod, Anat Lerner, Julia Hirschberg & Rivka Levitan
2020. Entrainment in spoken Hebrew dialogues. Journal of Phonetics 83, pp. 101005 ff.
Zeldes, Amir
2020. In A Practical Handbook of Corpus Linguistics, pp. 49 ff.

This list is based on CrossRef data as of 27 October 2021. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.