The Linguistic Annotation of Corpora: The TOSCA Analysis System

Aarts, Jan; van Halteren, Hans; Oostdijk, Nelleke

doi:10.1075/ijcl.3.2.02aar

Article published in:

International Journal of Corpus Linguistics
Vol. 3:2 (1998), pp. 189–210

Jan Aarts | Department of Language and Speech, University of Nijmegen

Hans van Halteren | Department of Language and Speech, University of Nijmegen

Nelleke Oostdijk | Department of Language and Speech, University of Nijmegen

The article discusses the role of linguistic annotation in corpus linguistics as opposed to annotation in natural language processing. In corpus linguistics, annotation is an integral part of the process of linguistic interpretation and description of the data. Tagging and parsing are discussed as the automatic counterparts of, respectively, the paradigmatic and the syntagmatic description of corpus data. The requirements for a corpus linguistic annotation system are considered. An account is given of the TOSCA analysis system as representative of such an annotation system. Performance results of the system are given, and an evaluation is made.

Keywords: Corpus Linguistics, Annotation, Tagging, Methodology, Parsing

Published online: 1 January 1998
