Contrastive corpus annotation in the CONTRANOT project
Issues and problems
In this paper we outline a number of issues and problems which arise during
the process of contrastive human-coded corpus annotation of certain semantic
and discourse categories within the framework of the CONTRANOT project,
aimed at the creation and validation of contrastive functional descriptions
through corpus analysis and annotation. Human-coded corpus annotation is a
preliminary step in training computer algorithms that automate the annotation
of large corpora, but it can also serve as a mechanism for empirically testing
aspects of linguistic theories, such as theory formation and theory
redefinition, as well as for enriching theories with quantitative information.
The work reported in this paper focuses on the annotation of two categories,
Thematisation and Modality, to illustrate the challenges researchers face
when developing well-designed and reliable annotation procedures for complex
linguistic phenomena in a contrastive manner. We describe the annotation tasks and
procedures developed so far, which include the design of annotation schemas
on the basis of available linguistic theories and the testing of their reliability
through agreement studies. We also evaluate and discuss the results of the annotations
on the basis of their relevance for the theoretical characterisation of the
investigated phenomena. We expect that our work will have an impact in the
area of contrastive textual analysis, and that it will pave the way for the development
of automated annotation systems for computational applications.
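
As a hedged illustration of what an agreement study can involve, the sketch
below computes Cohen's kappa for two annotators. The abstract does not say which
agreement coefficient the project used, so the choice of kappa is an assumption,
and the function name, label set and sample annotations are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators (Cohen's kappa)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement: chance overlap given each annotator's label distribution.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout; kappa is undefined,
        # so this sketch reports full agreement by convention.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical modality labels assigned to six clauses (illustrative data only).
annotator_1 = ["epistemic", "deontic", "epistemic", "dynamic", "epistemic", "deontic"]
annotator_2 = ["epistemic", "deontic", "deontic", "dynamic", "epistemic", "deontic"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which
matters when one label dominates the data. Other coefficients, such as
Krippendorff's alpha, may suit a given schema better.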