Chapter 7
Annotation tools for syntax
Kim Gerdes | LPP, Sorbonne Nouvelle University & CNRS, Paris, France
Julie Belião | Modyco, Paris Nanterre University & CNRS, France
Ilaine Wang | Modyco, Paris Nanterre University & CNRS, France
This chapter presents the tools and methods used for the successive steps of the semi-automatic syntactic annotation: automatic preprocessing, microsyntactic parsing with the FRMG tool, correction of the parses with the Arborator tool, agreement analysis, post-validation correction, and development of the final format of the Rhapsodie syntactic treebank. As FRMG is a parser for written French that was not configured to analyze disfluencies and reformulations, we used our manual pile marking to unfold the piles and produce a series of simplified “sentences” with only government relations. Despite having two annotators plus a validator for the corrections, we still found a substantial number of errors during the post-validation procedure, which applies a set of rules to check the well-formedness of the trees.
Article outline
- 1. Introduction
- 2. Parsers for written and spoken French
- 2.1 Parsers for French
- 2.2 The difficulty of parsing spoken language
- 3. Segmentation and choice of a formalism
- 4. Manual annotation with Pilepilot
- 5. Unfolding-Refolding
- 6. Parsing with FRMG
- 7. Integration of FRMG into Rhapsodie’s annotation process
- 8. Correction with Arborator
- 9. Agreement analysis
- 10. Post-validation correction
- 11. The distributed treebank format
- 12. Conclusion
- Notes