Article published in:Analyse Lexicale et Syntaxique: Le système INTEX
Edited by Cédrick Fairon
[Lingvisticæ Investigationes 22:1/2] 1999
► pp. 173–189
INTEX pour l’annotation semi-automatique d’un corpus d’anaphores
Anaphors constitute a well-known problem in automatic text generation and natural language understanding. Using corpora to deal with such phenomena could help to develop robust processing techniques. Building such resources is, though, a tedious and time-consuming task and could more easily be accomplished by partial automation. In this paper, we show how the intex system can be used for this task. We show that in a newspaper corpus (in this case, le Monde Diplomatique), discursive grammatical anaphors can easily be located via associated linguistic features. A series of transducers generating tags for categories and functions can thus be built, and constitutes an efficient pre-processing stage (though manual checking remains necessary). The heuristics, quickly and easily developed, are specific to the task. The study goes on to show, however, that discarding non-anaphoric pronouns is not straightforward in the case of non-referential personal pronouns or indefinite pronouns, and that the tagging of the grammatical function seems limited in the absence of real syntactic processing.
Published online: 03 October 2000