Materials & Methods Report
fsca
French syntactic complexity analyzer
This article reports on an open-source R package for the extraction of syntactic units from dependency-parsed
French texts. To evaluate the reliability of the package, syntactic units were extracted from a corpus of L2 French and
compared to units extracted manually from the same corpus. The F-score of the extracted units ranged from 0.53 to 0.97. Although
the units were not always identical between the two methods, manually and automatically derived syntactic complexity measures were
strongly and significantly correlated (ρ = 0.62–0.97, p < 0.001). This suggests that the
package may be a suitable replacement for manual annotation in some cases where manual annotation is not possible, but that care
should be taken in interpreting measures based on these units.
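
As a rough illustration of the reliability metric mentioned above, the following Python sketch computes precision, recall, and F-score for automatically extracted units against manually extracted ones. It is not the package's code (fsca is an R package), and the abstract does not say how units were matched. The sketch assumes exact matching of units represented as hypothetical `(sentence_id, first_token, last_token)` spans. The function name and the sample data are invented for illustration.

```python
def precision_recall_f1(manual_units, automatic_units):
    """Compare two collections of units and return (precision, recall, F1).

    Assumption: a unit counts as correct only if an identical unit
    appears in the manual annotation (exact span match).
    """
    manual = set(manual_units)
    automatic = set(automatic_units)
    true_positives = len(manual & automatic)

    # Precision: share of automatically extracted units that are correct.
    precision = true_positives / len(automatic) if automatic else 0.0
    # Recall: share of manually annotated units that were recovered.
    recall = true_positives / len(manual) if manual else 0.0

    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical clause spans: (sentence_id, first_token, last_token)
manual_clauses = [(1, 1, 5), (1, 6, 9), (2, 1, 7), (3, 1, 4)]
automatic_clauses = [(1, 1, 5), (1, 6, 10), (2, 1, 7), (3, 1, 4), (3, 5, 8)]

precision, recall, f1 = precision_recall_f1(manual_clauses, automatic_clauses)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Under exact matching, a unit whose boundary is off by a single token counts as both a false positive and a false negative. A more lenient matching criterion would give higher scores.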
Article outline
- 1. Introduction
- 2. Methodology
- 2.1 Manual annotation
- 2.2 Automatic extraction of syntactic units
- 3. Results
- 3.1 Precision and recall of automatically identified units
- 3.2 Correlation between manual and automatic methods
- 3.3 Sources of error
- 4. Discussion and conclusion
- Disclosures
- Acknowledgements
- Notes
- References