Assessing the impact of automatic dependency annotation on the measurement of phraseological complexity in L2 Dutch
The extraction of phraseological units operationalized in phraseological complexity measures (Paquot, 2019) relies on automatic dependency annotation, yet the suitability of annotation tools for learner language is often overlooked. In the present article, two Dutch dependency parsers, Alpino (van Noord, 2006) and Frog (van den Bosch et al., 2007), are evaluated for their performance in automatically annotating three types of dependency relations (verb + direct object, adjectival modifier, and adverbial modifier relations) across three proficiency levels of L2 Dutch. These observations then serve as the basis for an investigation into the impact of automatic dependency annotation on phraseological sophistication measures. Results indicate that both learner proficiency and the type of dependency relation function as moderating factors in parser performance. Phraseological complexity measures computed on the basis of both automatic and manual dependency annotations demonstrate moderate to high correlations, reflecting a moderate to low impact of automatic annotation on subsequent analyses.
Article outline
- 1. Introduction
- 2. Dependency parsing learner language
- 3. Research setup
- 3.1 Research questions and objectives
- 3.2 Dutch learner data
- 3.3 Sampling
- 4. Dutch dependency parsers
- 4.1 Alpino
- 4.2 Frog
- 4.3 Compatibility of parser output
- 5. Gold standard
- 5.1 Double coding the gold standard
- 6. Study 1: Parser accuracy for L2 Dutch
- 6.1 Accuracy across proficiency
- 6.2 Accuracy across dependency relations
- 6.3 A qualitative look at parsing error
- 7. Study 2: Impact of automatic annotation on measures of phraseological sophistication
- 7.1 Computing phraseological sophistication
- 7.2 Global impact of automatic annotation on the measurement of phraseological complexity
- 7.3 Effect of dependency relations
- 7.4 A qualitative look at the impact of parsing error
- 8. Discussion and conclusions
- 8.1 Implications for automatic annotation in LCR
- 8.2 Limitations and future directions
- Notes
- References
References
Banerjee, S., & Pedersen, T. (2003). The design, implementation, and use of the Ngram Statistics Package. In Proceedings of the Fourth International Conference on Intelligent Text Processing and Computational Linguistics.
Bestgen, Y. (2017). Beyond single-word measures: L2 writing assessment, lexical richness and formulaic competence. System, 69, 65–78.
Bouma, G., & Kloosterman, G. (2007). Mining syntactically annotated corpora with XQuery. In Proceedings of the linguistic annotation workshop (pp. 17–24). Stroudsburg: Association for Computational Linguistics.
Boyd, A., & Meurers, D. (2008). Revisiting the impact of different annotation schemes on PCFG parsing: a grammatical dependency evaluation. In Proceedings of the workshop on parsing German (pp. 24–32). Stroudsburg: Association for Computational Linguistics.
Carlsen, C. (2012). Proficiency level–a fuzzy variable in computer learner corpora. Applied Linguistics, 33(2), 161–183.
Council of Europe. (2001). Common European framework of reference for languages: Learning, teaching, assessment. Cambridge, UK: Cambridge University Press.
Daelemans, W., van den Bosch, A., & Weijters, T. (1997). IGTree: using trees for compression and classification in lazy learning algorithms. Artificial Intelligence Review, 11(1), 407–423.
de Marneffe, M.-C., Dozat, T., Silveira, N., Haverinen, K., Ginter, F., Nivre, J., & Manning, C. D. (2014). Universal Stanford Dependencies: A cross-linguistic typology. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14) (pp. 4585–4592). European Language Resources Association (ELRA).
de Marneffe, M.-C., & Nivre, J. (2019). Dependency grammar. Annual Review of Linguistics, 5, 197–218.
Díaz-Negrillo, A., Meurers, D., Valera, S., & Wunsch, H. (2010). Towards interlanguage POS annotation for effective learner corpora in SLA and FLT. Language Forum, 36(1–2), 139–154.
Dickinson, M., & Ragheb, M. (2009). Dependency annotation for learner corpora. In M. Passarotti, A. Przepiórkowski, S. Raynaud, & F. Van Eynde (Eds.), Proceedings of the eighth international workshop on treebanks and linguistic theories (pp. 59–70). Milan: EDUCatt.
Durrant, P., & Schmitt, N. (2009). To what extent do native and non-native writers make use of collocations? International Review of Applied Linguistics in Language Teaching, 47(2), 157–177.
Granger, S., & Bestgen, Y. (2014). The use of collocations by intermediate vs. advanced non-native writers: A bigram-based study. International Review of Applied Linguistics in Language Teaching, 52(3), 229–252.
Granger, S., & Paquot, M. (2008). Disentangling the phraseological web. In S. Granger & F. Meunier (Eds.), Phraseology: An interdisciplinary perspective (pp. 27–49). Amsterdam, Philadelphia: John Benjamins.
Heid, U. (2008). Computational phraseology: an overview. In S. Granger & F. Meunier (Eds.), Phraseology: An interdisciplinary perspective (pp. 337–360). Amsterdam, Philadelphia: John Benjamins.
Housen, A., & Kuiken, F. (2009). Complexity, accuracy, and fluency in second language acquisition. Applied linguistics, 30(4), 461–473.
Krivanek, J., & Meurers, D. (2013). Comparing rule-based and data-driven dependency parsing of learner language. In K. Gerdes, E. Hajičová, & L. Wanner (Eds.), Computational dependency theory (pp. 207–225). Amsterdam: IOS Press.
Lüdeling, A., Walter, M., Kroymann, E., & Adolphs, P. (2005). Multi-level error annotation in learner corpora. In Proceedings of corpus linguistics 2005.
Meurers, D. (2009). On the automatic analysis of learner language: Introduction to the special issue. CALICO Journal, 26(3), 469–473.
Meurers, D., & Dickinson, M. (2017). Evidence and interpretation in language learning research: Opportunities for collaboration with computational linguistics. Language Learning, 67(S1), 66–95.
Meurers, D., & Wunsch, H. (2010). Linguistically annotated learner corpora: Aspects of a layered linguistic encoding and standardized representation. In Proceedings of Linguistic Evidence.
Norris, J. M., & Ortega, L. (2009). Towards an organic approach to investigating CAF in instructed SLA: The case of complexity. Applied Linguistics, 30(4), 555–578.
Ordelman, R. J. F., De Jong, F. M. G., Van Hessen, A. J., & Hondorp, G. H. W. (2007). TwNC: a Multifaceted Dutch News Corpus. ELRA Newsletter, 12(3–4).
Ortega, L. (2003). Syntactic complexity measures and their relationship to L2 proficiency: A research synthesis of college-level L2 writing. Applied Linguistics, 24(4), 492–518.
Ott, N., & Ziai, R. (2010). Evaluating dependency parsing performance on German learner language. In M. Dickinson, K. Müürisep, & M. Passarotti (Eds.), Proceedings of the ninth international workshop on treebanks and linguistic theories Vol. 9 (pp. 175–186). Northern European Association for Language Technology (NEALT).
Paquot, M. (2018). Phraseological competence: A missing component in university entrance language tests? Insights from a study of EFL learners’ use of statistical collocations. Language Assessment Quarterly, 15(1), 29–43.
Paquot, M. (2019). The phraseological dimension in interlanguage complexity research. Second Language Research, 35(1), 121–145.
R Core Team. (2017). R: A language and environment for statistical computing [Computer software manual]. Vienna, Austria. Retrieved from [URL]
Ragheb, M., & Dickinson, M. (2012). Defining syntax for learner language annotation. In M. Kay & C. Boitet (Eds.), Proceedings of COLING 2012 (pp. 965–974).
Rubin, R., Housen, A., & Paquot, M. (in press). Phraseological complexity as an index of L2 Dutch writing proficiency: A partial replication study. In S. Granger (Ed.), Perspectives on the second language phrasicon: The view from learner corpora. Bristol: Multilingual Matters.
Sharwood Smith, M., & Truscott, J. (2005). Stages or continua in second language acquisition: A MOGUL solution. Applied Linguistics, 26(2), 219–240.
Tsarfaty, R., Nivre, J., & Andersson, E. (2011). Evaluating dependency parsing: Robust and heuristics-free cross-annotation evaluation. In Proceedings of the 2011 conference on empirical methods in natural language processing (pp. 385–396). Stroudsburg: Association for Computational Linguistics.
van den Bosch, A., Busser, B., Canisius, S., & Daelemans, W. (2007). An efficient memory-based morphosyntactic tagger and parser for Dutch. In P. Dirix, I. Schuurman, V. Vandeghinste, & F. Van Eynde (Eds.), Proceedings of the 17th meeting of Computational Linguistics in the Netherlands (pp. 191–206).
van der Beek, L., Bouma, G., Malouf, R., & van Noord, G. (2002). The Alpino dependency treebank. In Computational linguistics in the Netherlands 2001 (pp. 8–22).
van Noord, G. (2006). At last parsing is now operational. In TALN 2006 (pp. 20–42).
van Noord, G., Schuurman, I., & Bouma, G. (2011). Lassy Syntactische Annotatie, Revision 19455. Retrieved from [URL]
van Noord, G., Schuurman, I., & Vandeghinste, V. (2006). Syntactic annotation of large corpora in STEVIN. In Proceedings of the fifth international conference on language resources and evaluation (LREC’06). European Language Resources Association (ELRA).
Weiss, Z., & Meurers, D. (this issue). Analyzing the linguistic complexity of German learner language in a reading comprehension task: Using proficiency classification to investigate short answer data, the impact of linguistic analysis quality, and cross-data generalizability. International Journal of Learner Corpus Research, Special Issue on NLP.
Cited by (1)
Kyle, K., & Eguchi, M. (2024). Evaluating NLP models with written and spoken L2 samples. Research Methods in Applied Linguistics, 3(2), 100120.
This list is based on CrossRef data as of 17 October 2024. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.