INTEX pour l’annotation semi-automatique d’un corpus d’anaphores

Tutin, Agnès

doi:10.1075/li.22.1-2.11tut

Article published in:

Analyse Lexicale et Syntaxique: Le système INTEX
Edited by Cédrick Fairon
[Lingvisticæ Investigationes 22:1/2] 1999
pp. 173–189

Agnès Tutin | Équipe CRISTAL-GRESEC, Université Stendhal - Grenoble 3

Anaphors are a well-known problem in automatic text generation and natural language understanding. Using corpora to study such phenomena could help develop robust processing techniques. Building these resources is tedious and time-consuming, however, and partial automation would make the task easier.

In this paper, we show how the INTEX system can be used for this task. In a newspaper corpus (here, Le Monde diplomatique), discourse-level grammatical anaphors can easily be located through their associated linguistic features. A series of transducers that generate category and function tags can thus be built; it constitutes an efficient pre-processing stage, although manual checking remains necessary. The heuristics, which are quick and easy to develop, are specific to the task. The study also shows, however, that filtering out non-anaphoric pronouns is not straightforward for non-referential personal pronouns and indefinite pronouns, and that grammatical-function tagging is limited in the absence of real syntactic processing.
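To make the idea of a tag-generating pre-processing pass concrete, here is a minimal sketch in Python. It is not INTEX and does not reproduce the article's transducers: the mini-lexicon, the tag names (cat, func), the <ana> markup, and the list of impersonal cues are all assumptions chosen for illustration. It marks a few French pronouns with a category and a guessed function, and leaves "il" untagged when an impersonal verb follows, which is one simple way to approximate the filtering of non-referential pronouns mentioned above.

```python
import re

# Hypothetical mini-lexicon: surface form -> (category, guessed function).
# The tag names are illustrative assumptions, not the article's tagset.
PRONOUNS = {
    "il": ("PRO:pers", "subject"),
    "elle": ("PRO:pers", "subject"),
    "ils": ("PRO:pers", "subject"),
    "elles": ("PRO:pers", "subject"),
    "celui-ci": ("PRO:dem", "unknown"),
    "celle-ci": ("PRO:dem", "unknown"),
}

# Match any listed form as a whole word, longest forms first.
PRONOUN_RE = re.compile(
    r"\b("
    + "|".join(re.escape(f) for f in sorted(PRONOUNS, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)

# A few impersonal cues after "il" (assumed, far from exhaustive).
IMPERSONAL_AFTER_IL = re.compile(
    r"\s+(?:faut|pleut|neige|s['’]agit|y\s+a)\b", re.IGNORECASE
)


def annotate(text):
    """Wrap candidate anaphoric pronouns in <ana> tags; skip impersonal 'il'."""

    def tag(match):
        form = match.group(1)
        key = form.lower()
        # Heuristic filter: 'il' directly followed by an impersonal verb
        # is treated as non-referential and left untouched.
        if key == "il" and IMPERSONAL_AFTER_IL.match(text, match.end()):
            return form
        cat, func = PRONOUNS[key]
        return f'<ana cat="{cat}" func="{func}">{form}</ana>'

    return PRONOUN_RE.sub(tag, text)
```

Calling annotate on a sentence returns the same text with the retained pronouns wrapped in tags, ready for manual checking. The function tag here is only a lexical guess, which reflects the limitation noted above: without real syntactic processing, a surface heuristic of this kind cannot reliably assign grammatical functions.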

Article language: French

Published online: 3 October 2000