Bryan Jurish | Berlin-Brandenburgische Akademie der Wissenschaften
This chapter presents the formal basis for diachronic collocation profiling as implemented in
the open-source software tool “DiaCollo” and sketches some potential applications to multi-genre diachronic corpora.
Explicitly developed for the efficient extraction, comparison, and interactive visualization of collocations from a
diachronic text corpus, DiaCollo is suitable for processing collocation pairs whose association strength depends on
extralinguistic features such as the date of occurrence or text genre. By tracking changes in a word’s typical
collocates over time, DiaCollo can help to provide a clearer picture of diachronic changes in the word’s usage,
especially those related to semantic shift or discourse environment. Because DiaCollo uses the flexible DDC search engine as its back-end, user queries can refer explicitly to genre and other
document-level metadata, enabling, for example, independent genre-local profiles or cross-genre comparisons. In addition
to traditional static tabular display formats, a web-service plugin offers several interactive
online visualizations of diachronic profile data for immediate inspection.
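
As a rough illustration of the general idea of diachronic collocation profiling (not DiaCollo's actual implementation or API), the following Python sketch groups a toy corpus into date slices and ranks a target word's collocates within each slice. The function name `diachronic_profile`, its parameters, the fixed token window, and the use of pointwise mutual information as a stand-in association score are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def diachronic_profile(docs, target, slice_size=10, window=2, top_k=3):
    """Rank collocates of `target` separately for each date slice.

    docs: iterable of (year, tokens) pairs (hypothetical input format).
    Returns {slice_start_year: [(collocate, score), ...]}.
    """
    pair_counts = defaultdict(Counter)  # slice -> collocate -> co-occurrences with target
    word_counts = defaultdict(Counter)  # slice -> word -> frequency

    for year, tokens in docs:
        epoch = year - year % slice_size  # e.g. 1905 -> 1900 for slice_size=10
        word_counts[epoch].update(tokens)
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            # Collect up to `window` tokens on each side of the target.
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            pair_counts[epoch].update(left + right)

    profile = {}
    for epoch, pairs in pair_counts.items():
        n = sum(word_counts[epoch].values())
        f_target = word_counts[epoch][target]
        scored = []
        for collocate, f_pair in pairs.items():
            f_coll = word_counts[epoch][collocate]
            # Pointwise mutual information, used here as a simple stand-in score.
            pmi = math.log2(f_pair * n / (f_target * f_coll))
            scored.append((collocate, pmi))
        # Highest score first; ties broken alphabetically for stable output.
        scored.sort(key=lambda item: (-item[1], item[0]))
        profile[epoch] = scored[:top_k]
    return profile


# Toy data, invented purely for illustration.
toy_docs = [
    (1905, "the old mouse ate cheese".split()),
    (1995, "click the mouse button twice".split()),
]
toy_profile = diachronic_profile(toy_docs, "mouse", top_k=10)
```

Comparing the per-slice lists in `toy_profile` shows how a word's typical collocates can differ between date slices. A genre-local variant could be sketched the same way by grouping on a genre label instead of, or in addition to, the year.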