Strategies in tracing linguistic variation in a corpus of Old Irish texts (CorPH)
This article introduces Corpus PalaeoHibernicum (CorPH), a corpus currently consisting of 78 texts in Early Irish
(c. 7th–10th cent.) created by the ERC-funded Chronologicon Hibernicum (ChronHib) project by
bringing together pre-existing lexical and syntactic databases and adding further crucial texts from the period. In addition to
annotation for POS, morphological and syntactic information, a further layer of annotation has been developed for CorPH:
‘Variation Tagging’, i.e. a tagset that numerically encodes synchronic language variation during the Early Irish period, thus
allowing for much improved research on chronological variation within the material. A second new pillar of the study of linguistic
variation is Bayesian Language Variation Analysis (BLaVA), which addresses the challenge that “not-so-big data” poses to
statistical corpus methods. Instead of reporting feature frequencies, BLaVA models language variation as probabilities of
variation.
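The contrast between feature frequencies and probabilities can be illustrated with a minimal sketch. It assumes a simple Beta-Binomial model with a uniform prior, chosen here only for illustration; it is not the BLaVA model itself, and the names `VariantCounts`, `raw_frequency`, `posterior_mean` and `posterior_sd`, as well as the counts, are hypothetical.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class VariantCounts:
    """Hypothetical counts of two competing variants of one feature in one text."""
    innovative: int
    conservative: int


def raw_frequency(c: VariantCounts) -> float:
    """Relative frequency of the innovative variant (no uncertainty attached)."""
    return c.innovative / (c.innovative + c.conservative)


def _posterior_params(c: VariantCounts, prior_a: float, prior_b: float):
    # Beta(prior_a, prior_b) prior updated with binomial counts (assumed model).
    return prior_a + c.innovative, prior_b + c.conservative


def posterior_mean(c: VariantCounts, prior_a: float = 1.0, prior_b: float = 1.0) -> float:
    """Mean of the Beta posterior over the probability of the innovative variant."""
    a, b = _posterior_params(c, prior_a, prior_b)
    return a / (a + b)


def posterior_sd(c: VariantCounts, prior_a: float = 1.0, prior_b: float = 1.0) -> float:
    """Standard deviation of the same Beta posterior."""
    a, b = _posterior_params(c, prior_a, prior_b)
    return sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))


# Two invented texts: both show only the innovative variant,
# but one attests the feature twice and the other two hundred times.
short_text = VariantCounts(innovative=2, conservative=0)
long_text = VariantCounts(innovative=200, conservative=0)

for name, counts in (("short", short_text), ("long", long_text)):
    print(name, raw_frequency(counts), posterior_mean(counts), posterior_sd(counts))
```

Both invented texts have the same raw frequency, yet the posterior for the short text is pulled towards the prior and remains wide, while the posterior for the long text is narrow. This is one way a probabilistic model can keep sparse attestation visible instead of hiding it behind a single frequency value.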
Article outline
- 1. Introduction
- 2. Characteristics of Old Irish
- 3. The corpus
- 4. Corphusator
- 5. Variation tagging
- 6. Bayesian language variation analysis
- 7. Advantages and benefits of the methods
- 8. Challenges and desiderata
- Acknowledgements
- References