The Copenhagen Dependency Treebank (CDT)
Extending syntactic annotation to other linguistic levels
This paper provides an overview of the CDT annotation design, with special emphasis on how the interface between the syntactic level and two other linguistic levels, morphology and discourse, is modelled. In connection with the description of NP annotation, we present the fundamentals of how CDT is marked up with semantic relations in accordance with the dependency principles that govern annotation at the other levels of CDT. Specifically, we focus on how Generative Lexicon (GL) theory has been incorporated into the unitary theoretical dependency framework of CDT. An annotation scheme for lexical semantics has been designed to account for the lexico-semantic structure of complex NPs, and the four GL qualia also appear in some of the CDT discourse relation labels, where they describe parallel semantic relations at the discourse level.