Strategies in tracing linguistic variation in a corpus of Old Irish texts (CorPH)
This article introduces Corpus PalaeoHibernicum (CorPH), a corpus currently consisting of 78 texts in Early Irish
(c. 7th–10th cent.). It was created by the ERC-funded Chronologicon Hibernicum (ChronHib) project by
bringing together pre-existing lexical and syntactic databases and adding further crucial texts from the period. The corpus is
annotated for POS, morphological and syntactic information, and a further layer of annotation has been developed for it:
‘Variation Tagging’, i.e. a tagset that numerically encodes synchronic language variation during the Early Irish period and thus
allows for much improved research on chronological variation in the material. A second new pillar in the study of linguistic
variation is Bayesian Language Variation Analysis (BLaVA), which addresses the challenge that “not-so-big data” poses to
statistical corpus methods. Rather than reflecting feature frequencies, BLaVA models language variation as probabilities of
variation.
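The difference between a feature frequency and a probability of variation can be illustrated with a minimal sketch. This is not the BLaVA model itself, whose details are not given in this abstract. It assumes a simple Beta-Binomial setup with a uniform prior, and the function names and counts are hypothetical.

```python
from math import sqrt


def raw_frequency(innovative: int, total: int) -> float:
    """Observed share of the innovative variant (a plain frequency)."""
    return innovative / total


def beta_posterior(innovative: int, total: int, a: float = 1.0, b: float = 1.0):
    """Posterior mean and standard deviation of the variant probability.

    Assumes a Beta(a, b) prior (uniform by default) and binomially
    distributed counts; this is an illustrative choice, not the
    published BLaVA specification.
    """
    post_a = a + innovative
    post_b = b + (total - innovative)
    n = post_a + post_b
    mean = post_a / n
    sd = sqrt(post_a * post_b / (n * n * (n + 1)))
    return mean, sd


# Hypothetical counts: two texts with the same raw frequency but different sizes.
for label, k, n in [("short text", 2, 4), ("long text", 50, 100)]:
    mean, sd = beta_posterior(k, n)
    print(f"{label}: frequency={raw_frequency(k, n):.2f}, "
          f"posterior mean={mean:.2f}, sd={sd:.3f}")
```

Both hypothetical texts have the same raw frequency, but the posterior for the short text is much wider. A bare frequency hides this uncertainty, which is the kind of problem that small ("not-so-big") data sets raise.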
Article outline
- 1. Introduction
- 2. Characteristics of Old Irish
- 3. The corpus
- 4. Corphusator
- 5. Variation tagging
- 6. Bayesian language variation analysis
- 7. Advantages and benefits of the methods
- 8. Challenges and desiderata
- Acknowledgements
- References