Article published in: Analyse Lexicale et Syntaxique: Le système INTEX
Edited by Cédrick Fairon
[Lingvisticæ Investigationes 22:1/2] 1999
pp. 291–307
Normalisation des textes anglais
The present study deals with the pre-processing of English texts. Pre-processing is performed in three steps: segmenting the texts into textual units (sentences), rewriting contracted forms into a standard form, and tagging unambiguous compounds. We describe two of these three steps here: text segmentation and the rewriting of contracted forms. Texts are segmented into textual units by applying the transducer Sentence, and contracted forms are rewritten into their standard forms by applying the transducer Normalisation. We describe in detail the various steps involved in developing both transducers.
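The abstract does not reproduce the transducers themselves. As a rough illustration only, the Python sketch below approximates the two described steps (segmentation, then rewriting of contracted forms) with regular expressions instead of INTEX finite-state transducers. The names `segment_sentences`, `normalise_contractions`, `normalise`, `ABBREVIATIONS` and `CONTRACTIONS`, as well as the rule lists, are hypothetical and are not taken from the Sentence or Normalisation transducers. Ambiguous forms such as 's (is / has / possessive) and 'd (had / would) are deliberately left unchanged, because resolving them requires context.

```python
import re

# Hypothetical abbreviation list: a period after one of these does not end a sentence.
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "ms.", "e.g.", "i.e.", "etc.", "vs."}

# Candidate boundary: sentence-final punctuation followed by whitespace and a capital.
BOUNDARY = re.compile(r"[.!?]+(?=\s+[A-Z])")


def segment_sentences(text):
    """Split text into sentence-like units (a crude stand-in for a segmentation transducer)."""
    sentences, start = [], 0
    for m in BOUNDARY.finditer(text):
        chunk = text[start:m.end()]
        if chunk.split()[-1].lower() in ABBREVIATIONS:
            continue  # e.g. "Dr. Smith": the period is not a sentence boundary
        sentences.append(chunk.strip())
        start = m.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


# Hypothetical rewrite rules (pattern, replacement). Only unambiguous contractions
# are listed; "'s" and "'d" are left untouched. Capturing the original first
# letter preserves its case, so "Won't" becomes "Will not".
APOS = "['’]"  # straight or typographic apostrophe
AUX = "do|does|did|is|are|was|were|has|have|had|could|would|should|must|need|might"
CONTRACTIONS = [
    (re.compile(rf"\b(w)on{APOS}t\b", re.I), r"\1ill not"),
    (re.compile(rf"\b(c)an{APOS}t\b", re.I), r"\1annot"),
    (re.compile(rf"\b(s)han{APOS}t\b", re.I), r"\1hall not"),
    (re.compile(rf"\b({AUX})n{APOS}t\b", re.I), r"\1 not"),
    (re.compile(rf"\b(\w+){APOS}re\b", re.I), r"\1 are"),
    (re.compile(rf"\b(\w+){APOS}ve\b", re.I), r"\1 have"),
    (re.compile(rf"\b(\w+){APOS}ll\b", re.I), r"\1 will"),
    (re.compile(rf"\b(I){APOS}m\b", re.I), r"\1 am"),
]


def normalise_contractions(sentence):
    """Rewrite each listed contracted form into its standard (expanded) form."""
    for pattern, replacement in CONTRACTIONS:
        sentence = pattern.sub(replacement, sentence)
    return sentence


def normalise(text):
    """Segment first, then rewrite contracted forms sentence by sentence."""
    return [normalise_contractions(s) for s in segment_sentences(text)]


if __name__ == "__main__":
    print(normalise("Dr. Smith didn't come. They're late!"))
    # prints ['Dr. Smith did not come.', 'They are late!']
```

Restricting the n't rule to a closed list of auxiliaries avoids producing forms like "ai not" from "ain't"; a finite-state transducer would typically encode the same closed list as explicit paths in a graph.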
Published online: 03 October 2000