Vol. 75:2 (2022), pp. 223–256
The nature of graphs and graphemes in Middle Dutch writing and the problem of parsing
This article focuses on the practical problems involved in segmenting older, non-standardised spelling forms into consonant and vowel graphemes. Segmental parsing is an important issue in research on the development of Dutch spelling, which draws mostly on thirteenth- and fourteenth-century charter spellings. The research aims are introduced in part 1, and the problems of defining graphs and syllables are discussed in part 2. These problems originate at the theoretical graphemic level but have a direct impact on the practical level, because they hinder the automatic parsing of tokens in the corpus, as discussed in part 3. The goal of this article is to provide partial solutions for parsing non-standardised language data in graphemic research.
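To make the parsing problem concrete, the following is a minimal sketch of a naive letter-based segmentation, not the article's own procedure. It assumes a fixed set of vowel letters (`VOWEL_LETTERS`) and a hypothetical helper `split_into_letter_runs`; the example spellings are illustrative and are not drawn from the article's corpus.

```python
import re

# Assumption: a fixed vowel-letter inventory chosen for illustration only.
VOWEL_LETTERS = "aeiouy"

def split_into_letter_runs(token: str) -> list[tuple[str, str]]:
    """Split a spelling into maximal runs of vowel (V) and consonant (C) letters.

    Every character outside VOWEL_LETTERS, including any non-letter,
    is treated as a consonant letter in this sketch.
    """
    pattern = rf"[{VOWEL_LETTERS}]+|[^{VOWEL_LETTERS}]+"
    runs = []
    for match in re.finditer(pattern, token.lower()):
        run = match.group()
        kind = "V" if run[0] in VOWEL_LETTERS else "C"
        runs.append((kind, run))
    return runs

print(split_into_letter_runs("coninc"))
print(split_into_letter_runs("vte"))
```

A fixed letter classification of this kind cannot handle letters whose value depends on context. In a form such as `vte`, where the initial letter may well stand for a vowel, the sketch groups `vt` into a single consonant run, which is the sort of mis-segmentation that motivates looking beyond purely letter-based parsing.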
Article outline
- 1. Introduction
- 1.1 Development of Dutch Orthography
- 1.2 Scope of the project
- 1.3 Problems
- 2. Theoretical outset
- 2.1 Terminology
- 2.1.1 Grapheme vs. graph
- 2.1.2 Graph base vs. graph extension
- 2.1.3 Syllable opposition
- 2.2 Segmentation of graphs/graphemes
- 3. Finding graphs in the Middle Dutch text corpus
- 3.1 Letter-based graph parsing
- 3.2 Syllable-based graph parsing
- 3.3 Lemma-based graph parsing
- 4. Summary and outlook
- Acknowledgements
- Notes
- References
https://doi.org/10.1075/nowele.00069.wul