fsca: French syntactic complexity analyzer

Vandeweerd, Nathan

doi:10.1075/ijlcr.20018.van

Article published in:

International Journal of Learner Corpus Research
Vol. 7:2 (2021), pp. 259–274

Materials & Methods Report

Nathan Vandeweerd | Université catholique de Louvain | Vrije Universiteit Brussel

This article reports on an open-source R package for extracting syntactic units from dependency-parsed French texts. To evaluate the reliability of the package, syntactic units were extracted automatically from a corpus of L2 French and compared with units extracted manually from the same corpus. The F-score of the automatically extracted units ranged from 0.53 to 0.97. Although the units were not always identical across the two methods, the manually and automatically derived syntactic complexity measures were strongly and significantly correlated (ρ = 0.62–0.97, p < 0.001). This suggests that the package may be a suitable replacement where manual annotation is not possible, but that measures based on these units should be interpreted with care.
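As a rough illustration of the kind of comparison the abstract describes, the sketch below computes precision, recall and F-score for a list of automatically extracted units against a manually extracted list. It uses the standard definition of the F-score and is not the package's own code: the function name `precision_recall_f1`, the exact-string matching criterion and the toy data are all assumptions made for this example.

```python
from collections import Counter


def precision_recall_f1(manual_units, automatic_units):
    """Compare two lists of units, counting exact string matches as multisets.

    Exact matching is an assumption of this sketch; the article may use a
    different criterion for deciding that two units are the same.
    """
    manual = Counter(manual_units)
    automatic = Counter(automatic_units)
    # Multiset intersection: a unit counts as matched at most as often
    # as it occurs in both lists.
    matched = sum((manual & automatic).values())
    precision = matched / sum(automatic.values()) if automatic else 0.0
    recall = matched / sum(manual.values()) if manual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy data, not taken from the article's corpus.
manual = ["le chat dort", "il pleut", "qui chante"]
automatic = ["le chat dort", "il pleut", "chante", "parce que"]

p, r, f = precision_recall_f1(manual, automatic)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

In this toy case, two of the four automatic units match the manual list, so precision is 0.50, recall is 0.67 and the F-score is 0.57.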

Keywords: L2 French, dependency grammar, syntactic complexity, automatic annotation, open-source R package

Article outline

1. Introduction
2. Methodology
- 2.1 Manual annotation
- 2.2 Automatic extraction of syntactic units
3. Results
- 3.1 Precision and recall of automatically identified units
- 3.2 Correlation between manual and automatic methods
- 3.3 Sources of error
4. Discussion and conclusion
Disclosures
Acknowledgements
Notes
References

Published online: 11 October 2021



