Chapter published in:
Parallel Corpora for Contrastive and Translation Studies: New resources and applications. Edited by Irene Doval and M. Teresa Sánchez Nieto
[Studies in Corpus Linguistics 90] 2019
pp. 159–182
Discourse annotation in the MULTINOT corpus
Issues and challenges
Julia Lavid López | Complutense University of Madrid
This chapter summarises and discusses recent work, carried out within the framework of the MULTINOT project, on the development of a bilingual (English-Spanish) corpus of original comparable and parallel texts from a variety of genres. The corpus is annotated for complex linguistic features such as modality and evidentiality, metadiscourse markers, and thematization. Annotating these features in bilingual parallel texts poses important challenges for the researcher at every stage of corpus development, from preprocessing to manual annotation. At the same time, it makes it possible to investigate complex linguistic research questions that could not be addressed on the basis of raw corpora, or even with the help of an automatic part-of-speech tagger.
Keywords: discourse, corpus, annotation, English, Spanish
Article outline
- 1. Introduction
- 2. The MULTINOT corpus
- 3. Annotation procedure
- 3.1 Selecting the “training” corpus
- 3.2 Instantiating the theory
- 3.3 Designing annotation schemes and guidelines
- 3.4 Performing annotation experiments
- 3.5 Evaluating the annotations
- 3.6 Large-scale annotation of the whole corpus
- 4. Annotating thematization in English and Spanish
- 5. Annotating modality in English and Spanish
- 6. Annotating metadiscourse markers in English and Spanish
- 7. Summary and concluding remarks
- Acknowledgement
- Notes
- References
Published online: 20 March 2019
https://doi.org/10.1075/scl.90.10lop