Creating a dependency syntactic treebank
Towards intuitive language modeling
We present a user-centered approach to defining the dependency-syntactic specification for a treebank. We show that by collecting syntactic interpretations from the treebank’s future users, we can model structures that have not yet been defined in dependency-syntactic terms in a way that matches the users’ intuition. By consulting users during the grammar definition phase, we aim to improve future use of the treebank. We focus on two complex syntactic phenomena: elliptical comparative clauses, and participial NPs or NPs headed by a verb-derived noun. We show that each phenomenon can be interpreted in several ways and ask the users which way of modeling it they find intuitive. The results inform the syntactic specification for the treebank.