Article published in:
International Journal of Corpus Linguistics
Vol. 22:1 (2017), pp. 107–140


An automatic part-of-speech tagger for Middle Low German


Baron, A., & Rayson, P.
(2008, August). VARD2: A tool for dealing with spelling variation in historical corpora. Paper presented at Postgraduate Conference in Corpus Linguistics, Aston University, Birmingham, UK.
Barteld, F., Schröder, I., & Zinsmeister, H.
(2015) Unsupervised regularisation of historical texts for POS tagging. In F. Mambrini, M. Passarotti & C. Sporleder (Eds.), Proceedings of the Workshop on Corpus-Based Research in the Humanities (CRH) (pp. 3–12). Polish Academy of Sciences: Institute of Computer Science.
Bennett, P., Durrell, M., Scheible, S., & Whitt, R. J.
(2010) Annotating a historical corpus of German: A case study. In Proceedings of the LREC 2010 workshop on Language Resources and Language Technology Standards (pp. 64–68). European Language Resources Association.
Biber, D., Conrad, S., & Reppen, R.
(1998) Corpus Linguistics: Investigating Language Structure and Use. Cambridge: Cambridge University Press.
Bieberstedt, A.
(2015) Variablenlinguistische Beobachtungen zu den mittelniederdeutschen Schreibsprachen des südlichen Ostseeraumes: Wismar und Stralsund als Beispiele. In H. U. Schmid & A. Ziegler (Eds.), 2015: Jahrbuch für Germanistische Sprachgeschichte. Bd. 6: Deutsch im Norden (pp. 88–115). Berlin/New York: De Gruyter.
Bollmann, M., Petran, F., Dipper, S., & Krasselt, J.
(2014) CorA: A web-based annotation tool for historical and other non-standard language data. In Proceedings of the 8th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH) (pp. 86–90).
Braunmüller, K.
(1996) Forms of language contact in the area of the Hanseatic League: Dialect contact phenomena and semicommunication. Nordic Journal of Linguistics, 19(2), 141–154.
(2002) Language contact during the Old Nordic period I: With the British Isles, Frisia and the Hanseatic League. In O. Bandle, K. Braunmüller, E. H. Jahr, A. Karker, H.-P. Naumann & U. Teleman (Eds.), The Nordic Languages: An International Handbook of the History of the Nordic Germanic Languages, Volume 1 (pp. 1028–1039). Berlin/New York: De Gruyter.
Breitbarth, A., Walkden, G., & Watts, S.
(2011, April). A Corpus for Middle Low German. Paper presented at New Methods in Historical Corpora, Manchester, UK.
(2012, April). Building a corpus for Middle Low German: Notes and queries. Paper presented at the Forum for Germanic Language Studies (FGLS10), Sheffield, UK.
Britto, H., Finger, M., & Galves, C.
(2002) Computational and linguistic aspects of the construction of The Tycho Brahe Parsed Corpus of Historical Portuguese. Romanistische Korpuslinguistik, Korpora und gesprochene Sprache, Romance Corpus Linguistics, Corpora and Spoken Language, ScriptOralia, 126.
Daelemans, W., Van den Bosch, A., & Zavrel, J.
(1999) Forgetting examples is harmful in language learning. Machine Learning, 34(1–3), 11–43.
Daelemans, W., & Van den Bosch, A.
(2005) Memory-based Language Processing. Cambridge: Cambridge University Press.
De Clercq, O.
(2015) Tipping the scales: exploring the added value of deep semantic processing on readability prediction and sentiment analysis (Unpublished doctoral dissertation). Ghent University, Ghent, Belgium.
Desmet, B., Hoste, V., Verstraeten, D., & Verhasselt, J.
(2013) Gallop Documentation, (LT3 Technical Report - LT3 13.03).
Desmet, B.
(2014) Finding the online cry for help: Automatic text classification for suicide prevention (Unpublished doctoral dissertation). Ghent University, Ghent, Belgium.
Diel, M., Fisseni, B., Lenders, W., & Schmitz, H.-C.
(2002) XML-Kodierung des Bonner Frühneuhochdeutschkorpus. Bonn: IKP-Arbeitsbericht NF 02.
Dipper, S.
(2015) Annotierte Korpora für die Historische Syntaxforschung: Anwendungsbeispiele anhand des Referenzkorpus Mittelhochdeutsch. Zeitschrift für Germanistische Linguistik, 43(3), 516–563.
Dipper, S., Donhauser, K., Klein, T., Linde, S., Müller, S., & Wegera, K. P.
(2013) HiTS: ein Tagset für historische Sprachstufen des Deutschen. Journal for Language Technology and Computational Linguistics, 28(1), 85–137.
Fisseni, B., Schmitz, H.-C., & Schröder, B.
(2007) FnhdC/HTML und FnhdC/S. Sprache und Datenverarbeitung, 1–2/2007, 67–69.
Geyken, A., Haaf, S., Jurish, B., Schulz, M., Steinmann, J., Thomas, C., & Wiegand, F.
(2011) Das Deutsche Textarchiv: Vom historischen Korpus zum aktiven Archiv. In Digitale Wissenschaft. Stand und Entwicklung digital vernetzter Forschung in Deutschland, 20/21, September 2010, Beiträge der Tagung, 2., ergänzte Fassung (pp. 157–161).
Kroch, A., Taylor, A., & Ringe, D.
(2000) The Middle English verb-second constraint: A case study in language contact and language change. In S. Herring, P. van Reenen & L. Schøsler (Eds.), Textual Parameters in Older Languages (pp. 353–392). Amsterdam/Philadelphia: Benjamins.
Lafferty, J., McCallum, A., & Pereira, F.
(2001) Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of the 18th International Conference on Machine Learning (pp. 282–289). San Francisco, CA: Morgan Kaufmann.
Linde, S., & Mittmann, R.
(2013) Old German reference corpus: Digitizing the knowledge of the 19th century. In P. Bennett, M. Durrell, S. Scheible, R. J. Whitt (Eds.), New Methods in Historical Corpora (pp. 235–246). Tübingen: Narr Verlag.
Marcus, M. P., Santorini, B., & Marcinkiewicz, M. A.
(1993) Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2), 313–330.
Martineau, F.
(2005) Modéliser le changement: Les voies du français/Modelling change: The paths of French. Ottawa: University of Ottawa. Retrieved from www.voies.uottawa.ca/corpus_pg_en.html (last accessed March 2017).
Moon, T., & Baldridge, J.
(2007) Part-of-speech tagging for Middle English through alignment and projection of parallel diachronic texts. In Proceedings of EMNLP/CONLL-2007 (pp. 390–399).
Peters, R.
(1973) Mittelniederdeutsche Sprache. In J. Goossens (Ed.), Niederdeutsch – Sprache und Literatur. Bd. 1: Sprache (pp. 66–115). Neumünster: Wachholtz.
(2003) Variation und Ausgleich in den mittelniederdeutschen Schreibsprachen. In M. Goyens & W. Verbeke (Eds.), The Dawn of the Written Vernacular in Western Europe (pp. 427–440). Leuven: Leuven University Press.
Peters, R., & Fischer, C.
(2007) Der ‘Atlas spätmittelalterlicher Schreibsprachen des niederdeutschen Altlandes und angrenzender Gebiete’. In L. Czajkowski, C. Hoffmann, H. U. Schmid (Eds.), Ostmitteldeutsche Schreibsprachen im Spätmittelalter (pp. 23–33). Berlin: De Gruyter.
Peters, R., & Nagel, N.
(2014) Das digitale ‘Referenzkorpus Mittelniederdeutsch/Niederrheinisch (ReN)’. Jahrbuch für Germanistische Sprachgeschichte, 5(1), 165–175. Berlin/Boston: de Gruyter.
Pettersson, E., Megyesi, B., & Nivre, J.
(2013) Normalisation of historical text using context-sensitive weighted Levenshtein distance and compound splitting. In Proceedings of the 19th Nordic Conference on Computational Linguistics (NoDaLiDa 2013) (pp. 163–179). Linköping: Linköping Electronic Conference Proceedings 85.
(2014) A multilingual evaluation of three spelling normalization methods for historical text. In Proceedings of the 8th Workshop on Language Technology for Cultural Heritage, Social Sciences and Humanities (LaTeCH 2014) (pp. 32–41). Gothenburg: Association for Computational Linguistics.
Rayson, P., Archer, D., Baron, A., Culpeper, J., & Smith, N.
(2007) Tagging the bard: Evaluating the accuracy of a modern POS tagger on early modern English corpora. In Proceedings of Corpus Linguistics 2007. Birmingham: University of Birmingham, UK.
Rögnvaldsson, E., & Helgadóttir, S.
(2011) Morphosyntactic tagging of Old Icelandic texts and its use in studying syntactic variation and change. In C. Sporleder, A. van den Bosch, K. Zervanou (Eds.), Language Technology for Cultural Heritage: Selected Papers from the LaTeCH Workshop Series (pp. 63–76). Berlin: Springer.
Sanders, W.
(1982) Sprachgeschichtliche Grundzüge des Niederdeutschen. Göttingen: Vandenhoeck & Ruprecht.
Scheible, S., Whitt, R. J., Durrell, M., & Bennett, P.
(2011a) A gold standard corpus of Early Modern German. In Proceedings of the 5th Linguistic Annotation Workshop (LAW V 2011) (pp. 124–128). Association for Computational Linguistics.
(2011b) Evaluating an ‘off-the-shelf’ POS-tagger on early modern German text. In Proceedings of the 5th ACL-HLT Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH 2011) (pp. 19–23). Portland, OR: Association for Computational Linguistics.
Schiller, A., Teufel, S., & Thielen, C.
(1995) Guidelines für das Tagging deutscher Textkorpora mit STTS. Technical report, Universities of Stuttgart and Tübingen, 66. Retrieved from www.sfs.uni-tuebingen.de/resources/stts-1999.pdf (last accessed March 2017).
Schmid, H., & Laws, F.
(2008) Estimation of conditional probabilities with decision trees and an application to fine-grained POS tagging. Proceedings of the 22nd International Conference on Computational Linguistics (COLING 2008) - Volume 1 (pp. 777–784). Manchester: Association for Computational Linguistics.
Schneider, G., Lehmann, H. M., & Schneider, P.
(2015) Parsing early and late modern English corpora. Literary and Linguistic Computing, 30(3), 423–439.
Schröder, I.
(2014) Neue Perspektiven für die mittelniederdeutsche Grammatikographie. Jahrbuch für germanistische Sprachgeschichte, 5(1), 150–164.
Schulz, S., De Pauw, G., De Clercq, O., Desmet, B., Hoste, V., Daelemans, W., & Macken, L.
(2016) Multimodular Text Normalization of Dutch User-Generated Content. ACM Transactions on Intelligent Systems and Technology (TIST), 7(4), 1–22.
Silfverberg, M., Ruokolainen, T., Lindén, K., & Kurimo, M.
(2014) Part-of-speech tagging using conditional random fields: Exploiting sub-label dependencies for improved accuracy. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (pp. 259–264). Baltimore, MD.
Sukhareva, M., & Chiarcos, C.
(2016) Combining ontologies and neural networks for analyzing historical language varieties: A case study in Middle Low German. In N. Calzolari, K. Choukri, T. Declerck, M. Grobelnik, B. Maegaard, J. Mariani, A. Moreno, J. Odijk & S. Piperidis (Eds.), Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016). Paris: European Language Resources Association (ELRA). Retrieved from www.lrec-conf.org/proceedings/lrec2016/summaries/822.html (last accessed March 2017).
Tophinke, D.
(2009) Vom Vorlesetext zum Lesetext: Zur Syntax mittelniederdeutscher Rechtsverordnungen im Spätmittelalter. In A. Linke, & H. Feilke (Eds.), Oberfläche und Performanz. Untersuchungen zur Sprache als dynamische Gestalt (pp. 161–186). Tübingen: Niemeyer.
(2012) Syntaktischer Ausbau im Mittelniederdeutschen. Theoretisch-methodische Überlegungen und kursorische Analysen. Niederdeutsches Wort, 52, 19–46.
Tophinke, D., & Wallmeier, N.
(2011) Textverdichtungsprozesse im Spätmittelalter: Syntaktischer Wandel in mittelniederdeutschen Rechtstexten des 13.–16. Jahrhunderts. In S. Elspaß & M. Negele (Eds.), Sprachvariation und Sprachwandel in der Stadt der Frühen Neuzeit (pp. 97–116). Heidelberg: Winter.
Van de Kauter, M., Coorman, G., Lefever, E., Desmet, B., Macken, L., & Hoste, V.
(2013) LeTs Preprocess: The multilingual LT3 linguistic preprocessing toolkit. Computational Linguistics in the Netherlands Journal, 3, 103–120.
Walkden, G.
(2016) The HeliPaD: A parsed corpus of Old Saxon. International Journal of Corpus Linguistics, 21(4), 559–571.
Wallenberg, J. C., Ingason, A. K., Sigurðsson, E. F., & Rögnvaldsson, E.
(2011) Icelandic parsed historical corpus (IcePaHC) (Version 0.9). Available at www.linguist.is/icelandic_treebank/Icelandic_Parsed_Historical_Corpus_%28IcePaHC%29 (last accessed March 2017).
Yang, Y., & Eisenstein, J.
(2016) Part-of-speech tagging for historical English. In Proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL), San Diego.
Cited by 2 other publications

Barteld, Fabian, Chris Biemann & Heike Zinsmeister
2019. Token-based spelling variant detection in Middle Low German texts. Language Resources and Evaluation 53:4, pp. 677 ff.
Farasyn, Melissa, George Walkden, Sheila Watts & Anne Breitbarth
2018. In Diachronic Corpora, Genre, and Language Change [Studies in Corpus Linguistics, 85], pp. 281 ff.

This list is based on CrossRef data as of 28 August 2021. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.