Chapter published in:
Crossroads Semantics: Computation, experiment and grammar
Edited by Hilke Reckman, Lisa Lai-Shen Cheng, Maarten Hijzelendoorn and Rint Sybesma
[Not in series 210] 2017
pp. 57–76
Chapter 4. How to compare speed and accuracy of syntactic parsers
Gertjan van Noord | University of Groningen
The paper introduces both a methodological and a practical innovation. First, two scenarios are introduced for comparing accurate but slow parsers with faster but less accurate ones. Second, a corpus-based technique is described to improve the efficiency of wide-coverage, high-accuracy parsers. By keeping track of the derivation steps that lead to the best parse for a very large collection of sentences, the parser learns which parse steps can be filtered out without significant loss in parsing accuracy, but with a substantial gain in parsing efficiency. Experimental results with the Alpino parser for Dutch indicate that the technique yields much faster parsers that perform at almost the same level of accuracy. An interesting characteristic of the approach is that it is self-learning, in the sense that it uses unannotated corpora.
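The filtering idea in the abstract can be illustrated with a minimal sketch. It assumes that each derivation step can be recorded as a hashable (context, step) pair and that a step is kept only if it occurred in enough best parses. The function names, the `min_evidence` threshold, and the toy data are hypothetical and are not taken from the chapter or from Alpino.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Set, Tuple

# A parse step is modelled as a (context, step) pair. Both parts are
# hypothetical stand-ins for whatever a real parser would record.
Step = Tuple[Hashable, Hashable]


def learn_allowed_steps(best_parse_steps: Iterable[List[Step]],
                        min_evidence: int = 2) -> Set[Step]:
    """Count the steps used in best parses and keep those seen often enough."""
    counts: Counter = Counter()
    for steps in best_parse_steps:
        counts.update(steps)
    return {step for step, n in counts.items() if n >= min_evidence}


def filter_candidates(candidates: Iterable[Step],
                      allowed: Set[Step]) -> List[Step]:
    """Discard candidate steps that were not seen often enough in best parses."""
    return [c for c in candidates if c in allowed]


# Toy data: steps observed in the best parses of three sentences.
observed = [
    [("np", "det-n"), ("s", "np-vp")],
    [("np", "det-n"), ("s", "np-vp"), ("vp", "v-np")],
    [("np", "det-adj-n"), ("s", "np-vp")],
]
allowed = learn_allowed_steps(observed, min_evidence=2)
kept = filter_candidates(
    [("np", "det-n"), ("np", "det-adj-n"), ("vp", "v-np")], allowed
)
```

In this sketch, raising `min_evidence` prunes more steps, which would be expected to trade some accuracy for speed; the best parses themselves come from running the unfiltered parser, so no manual annotation is assumed.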
Article outline
- 1. Introduction
- 2. Background: The Alpino parser for Dutch
- 3. Methodology: Balancing efficiency and accuracy
  - 3.1 On-line and off-line parsing scenarios
    - 3.1.1 On-line scenario
    - 3.1.2 Off-line scenario
  - 3.2 Accuracy: Comparing sets of dependencies
- 4. Learning efficient parsing
  - 4.1 Left-corner parsing
  - 4.2 Left-corner splines
  - 4.3 Filtering left-corner splines
    - 4.3.1 Context size
    - 4.3.2 Required evidence
  - 4.4 Comparison with link table
  - 4.5 Implementation detail
- 5. Experimental results
  - 5.1 Results on Alpino Treebank
  - 5.2 Effect of the amount of training data
  - 5.3 Experiment with D-Coi data
- 6. Specializing lexical categories
- 7. Discussion
- Acknowledgements
- Note
- References
Published online: 12 April 2017
https://doi.org/10.1075/z.210.04van