BasiScript
A corpus of contemporary Dutch texts written by primary school children
This short paper introduces BasiScript, a 9-million-word corpus of contemporary Dutch texts written by primary school children. The data were collected over three years, during which 17,216 children contributed texts. Each word token in the corpus is annotated with its correct orthographic form, its lemma, and its part of speech. The most frequent polysemous words have also been annotated for word meaning, and every word in the lexicon derived from the BasiScript corpus has been annotated for corpus and subcorpus frequency, dispersion, length, family size, family frequency, orthographic neighborhood size, and orthographic neighborhood frequency. Images of the texts are available to researchers. The present article describes the corpus and compares BasiScript with BasiLex (a Dutch corpus, completed in 2015, of texts that primary school children are likely to read) by means of frequency profiling.
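The abstract names frequency profiling but does not state the statistic. Assuming it follows Rayson and Garside (2000), listed in the references, each word's observed frequency in two corpora is compared with the frequency expected if both corpora shared one distribution, using the log-likelihood measure G2. The sketch below is illustrative only and is not the authors' code; the function names `log_likelihood` and `frequency_profile` and the dict-of-counts input format are hypothetical.

```python
import math


def log_likelihood(freq_a: int, freq_b: int, size_a: int, size_b: int) -> float:
    """G2 for one word seen freq_a times in corpus A and freq_b times in corpus B.

    Sketch assuming the Rayson & Garside (2000) formulation; names are hypothetical.
    """
    total_freq = freq_a + freq_b
    total_size = size_a + size_b
    # Expected counts if the word were equally common (per token) in both corpora.
    expected_a = size_a * total_freq / total_size
    expected_b = size_b * total_freq / total_size
    ll = 0.0
    # A zero observed count contributes nothing (0 * ln 0 is taken as 0).
    if freq_a > 0:
        ll += freq_a * math.log(freq_a / expected_a)
    if freq_b > 0:
        ll += freq_b * math.log(freq_b / expected_b)
    return 2.0 * ll


def frequency_profile(
    counts_a: dict[str, int], counts_b: dict[str, int]
) -> list[tuple[str, float, str]]:
    """Rank every word by G2 (hypothetical helper).

    Assumes each dict holds counts for all tokens of its corpus, so the
    corpus size is the sum of its counts.
    """
    size_a = sum(counts_a.values())
    size_b = sum(counts_b.values())
    rows = []
    for word in set(counts_a) | set(counts_b):
        fa = counts_a.get(word, 0)
        fb = counts_b.get(word, 0)
        ll = log_likelihood(fa, fb, size_a, size_b)
        # Compare relative frequencies to see which corpus uses the word more.
        rel_a, rel_b = fa / size_a, fb / size_b
        direction = "A" if rel_a > rel_b else "B" if rel_a < rel_b else "="
        rows.append((word, ll, direction))
    # Largest G2 first: these words differ most between the two corpora.
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows
```

Because G2 approximately follows a chi-square distribution with one degree of freedom, a value of about 3.84 or higher corresponds to p < 0.05; the direction column is needed because G2 itself says only that the corpora differ, not which one overuses the word.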
Article outline
- 1. Introduction
- 2. BasiScript
- 2.1 Data collection
- 2.2 Format
- 2.3 Annotation
- 3. BasiScript versus BasiLex: Frequency profiling
- 3.1 Method
- 3.2 Differences in the frequencies of function words
- 4. Conclusion
- Acknowledgements
- References
References
Balota, D., Yap, M., & Cortese, M. J. (2006). Visual word recognition. In M. J. Traxler & M. A. Gernsbacher (Eds.), Handbook of Psycholinguistics (pp. 285–376). Amsterdam: Elsevier Academic Press.
Bracken, S., & Fischel, J. E. (2008). Family reading behaviour and literacy skills in preschool children from low-income backgrounds. Early Education and Development, 19(1), 45–67.
Chiu, S. I., Hong, F. Y., & Hu, H. Y. (2015). The effects of family cultural capital and reading motivation on reading behaviour in elementary school students. School Psychology International, 36(1), 3–17.
Clark, C., & Teravainen, A. (2017). Book Ownership and Reading Outcomes. London: National Literacy Trust.
Drijbooms, E., Groen, M., & Verhoeven, L. (2017). How executive functions predict development in syntactic complexity of narrative writing in the upper elementary grades. Reading & Writing, 30(1), 209–231.
Evers-Vermeul, J., & Sanders, T. (2009). The emergence of Dutch connectives; how cumulative cognitive complexity explains the order of acquisition. Journal of Child Language, 36(4), 829–854.
Johannes, K., Wilson, C., & Landau, B. (2016). The importance of lexical verbs in the acquisition of spatial prepositions: The case of in and on. Cognition, 157, 174–189.
Kent, S., & Wanzek, J. (2016). The relationship between component skills and writing quality and production across developmental levels: A meta-analysis of the last 25 years. Review of Educational Research, 86(2), 570–601.
Meints, K., Plunkett, K., Harris, P. L., & Dimmock, D. (2002). What is ‘on’ and ‘under’ for 15-, 18-, and 24-month-olds? Typicality effects in early comprehension of spatial prepositions. British Journal of Developmental Psychology, 20(1), 113–130.
Penning de Vries, B., & Tellings, A. (forthcoming). Development of connective frequency in Dutch child-directed texts: A corpus analysis.
Perfetti, C. A., & Hart, L. (2001). The lexical quality hypothesis. In L. Verhoeven, C. Elbro, & P. Reitsma (Eds.), Precursors of Functional Literacy (pp. 189–214). Amsterdam/Philadelphia, PA: John Benjamins.
Peterson, C., & McCabe, A. (1987). The connective “and”: Do older children use it less as they learn other connectives? Journal of Child Language, 14(2), 375–381.
Rayson, P., & Garside, R. (2000). Comparing corpora using frequency profiling. In Proceedings of the Workshop on Comparing Corpora, 38th Annual Meeting of the Association for Computational Linguistics (ACL 2000) (pp. 1–6). Hong Kong.
Tellings, A., Hulsbosch, M., Vermeer, A., & van den Bosch, A. (2014). BasiLex: An 11.5 million word corpus of Dutch texts written for children. Computational Linguistics in the Netherlands Journal, 4, 191–208.
Van den Bosch, A., Busser, G. J., Daelemans, W., & Canisius, S. (2007). An efficient memory-based morphosyntactic tagger and parser for Dutch. In F. van Eynde, P. Dirix, I. Schuurman, & V. Vandeghinste (Eds.), Selected Papers of the 17th Computational Linguistics in the Netherlands Meeting (CLIN-17, Leuven) (pp. 99–114). Utrecht: LOT. Retrieved from [URL] (last accessed September 2018).
Van Gompel, M. (2014). FoLiA: Format for linguistic annotation. Documentation (Language and Speech Technology Technical Report Series LST-14-01). Radboud University Nijmegen.