References

Part of

Frequency, Dispersion, Association, and Keyness: Revising and tupleizing corpus-linguistic measures
Stefan Th. Gries
[Studies in Corpus Linguistics 115] 2024
pp. 308–318

Ackermann, Kirsten & Yu-Hua Chen. 2013. Developing the Academic Collocation List: A corpus-driven and expert-judged approach. Journal of English for Academic Purposes 12(4). 235–247.

Adelman, James S., Gordon D. A. Brown, & José F. Quesada. 2006. Contextual Diversity, not word frequency, determines word-naming and lexical decision times. Psychological Science 17(9). 814–823.

Adèr, Herman J. 2008. Modelling. In Herman J. Adèr & Gideon J. Mellenbergh (eds.), Advising on research methods: A consultant’s companion, 271–304. Huizen: Johannes van Kessel Publishing.

Ambridge, Ben, Anna L. Theakston, Elena V. M. Lieven, & Michael Tomasello. 2006. The distributed learning effect for children’s acquisition of an abstract syntactic construction. Cognitive Development 21(2). 174–193.

Archer, Dawn (ed.). 2009. What’s in a word-list? Investigating word frequency and keyword extraction. London: Routledge.

Arppe, Antti. 2008. Univariate, bivariate and multivariate methods in corpus-based lexicography – A study of synonymy. Ph.D. dissertation, University of Alberta.

Aslin, Richard N. & Elissa L. Newport. 2012. Statistical learning: From acquiring specific items to forming general rules. Current Directions in Psychological Science 21(3). 170–176.

Baayen, R. Harald. 2010. Demythologizing the word frequency effect: A discriminative learning perspective. The Mental Lexicon 5(3). 436–461.

Baayen, R. Harald. 2011. Corpus linguistics and naive discriminative learning. Brazilian Journal of Applied Linguistics 11(2). 295–328.

Baayen, R. Harald, Petar Milin, Dusica Filipović-Đurđević, Peter Hendrix, & Marco Marelli. 2011. An amorphous model for morphological processing in visual comprehension based on naive discriminative learning. Psychological Review 118(3). 438–481.

Babych, Bogdan & Anthony Hartley. 2011. Meta-evaluation of comparability metric using parallel corpora. International Journal of Computational Linguistics and Applications 2(1–2). 209–222.

Baguley, Thom. 2012. Serious stats: A guide to advanced statistics for the behavioral sciences. Houndmills: Palgrave Macmillan.

Baker, Paul. 2004. Querying keywords: Questions in difference, frequency, and sense in keyword analysis. Journal of English Linguistics 32(4). 346–359.

Baker, Paul & Tony McEnery. 2005. A corpus-based approach to discourses of refugees and asylum seekers in UN and newspaper texts. Journal of Language and Politics 4(2). 197–226.

Baron, Alistair, Paul Rayson, & Dawn Archer. 2009. Word frequency and keyword statistics in historical corpus linguistics. Anglistik: International Journal of English Studies 20(1). 41–67.

Bavaud, François. 2009. Information theory, relative entropy and statistics. In G. Sommaruga (ed.), Formal theories of information: Lecture notes in computer science, 54–78. Berlin: Springer.

Belov, Dmitry I. & Ronald D. Armstrong. 2011. Distributions of the Kullback-Leibler divergence with applications. British Journal of Mathematical and Statistical Psychology 64(2). 291–309.

Berger, Cynthia, Scott Crossley, & Stephen Skalicky. 2019. Using lexical features to investigate second language lexical decision performance. Studies in Second Language Acquisition 41(5). 911–935.

Berry-Rogghe, Godelieve L. M. 1974. Automatic identification of phrasal verbs. In John L. Mitchell (ed.), Computers in the humanities, 16–26. Edinburgh: Edinburgh University Press.

Bestgen, Yves & Sylviane Granger. 2014. Quantifying the development of phraseological competence in L2 English writing: An automated approach. Journal of Second Language Writing 26. 28–41.

Biber, Douglas, Randi Reppen, Erin Schnur, & Romy Ghanem. 2016. On the (non)utility of Juilland’s D to measure lexical dispersion in large corpora. International Journal of Corpus Linguistics 21(4). 439–464.

Bondi, Marina & Mike Scott (eds.). 2010. Keyness in texts. Amsterdam: John Benjamins.

Bortz, Jürgen, Gustav A. Lienert, & Klaus Boehnke. 2008. Verteilungsfreie Methoden in der Biostatistik. 3rd corr. ed. Heidelberg: Springer Medizin Verlag.

Bouma, Gerlof. 2009. Normalized (Pointwise) Mutual Information in collocation extraction. Proceedings of the Biennial GSCL Conference 30. 31–40.

Bresnan, Joan, Anna Cueni, Tatiana Nikitina, & R. Harald Baayen. 2007. Predicting the dative alternation. In Gerlof Bouma, Irene Kraemer, & Joost Zwarts (eds.), Cognitive foundations of interpretation, 69–94. Amsterdam: Royal Netherlands Academy of Arts and Sciences.

Brezina, Vaclav & Miriam Meyerhoff. 2014. Significant or random? A critical review of sociolinguistic generalisations based on large corpora. International Journal of Corpus Linguistics 19(1). 1–28.

Brysbaert, Marc & Boris New. 2009. Moving beyond Kučera and Francis: A critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English. Behavior Research Methods 41(4). 977–990.

Burch, Brent, Jesse Egbert, & Douglas Biber. 2017. Measuring and interpreting lexical dispersion in corpus linguistics. Journal of Research Design and Statistics in Linguistics and Communication Science 3(2). 189–216.

Burnham, Kenneth P. & David R. Anderson. 2001. Kullback-Leibler information as a basis for strong inference in ecological studies. Wildlife Research 28(2). 111–119.

Bybee, Joan & Sandra A. Thompson. 1997. Three frequency effects in syntax. Berkeley Linguistics Society 23. 65–85.

Carroll, John B. 1970. An alternative to Juilland’s usage coefficient for lexical frequencies and a proposal for a standard frequency index. Computer Studies in the Humanities and Verbal Behaviour 3(2). 61–65.

Charles, Walter G. & George A. Miller. 1989. Contexts of antonymous adjectives. Applied Psycholinguistics 10(3). 357–375.

Chen, Stanley F. & Joshua Goodman. 1999. An empirical study of smoothing techniques for language modeling. Computer Speech and Language 13(4). 359–394.

Church, Kenneth W. 2000. Empirical estimates of adaptation: The chance of two Noriegas is closer to p/2 than p². In Proceedings of the COLING 2000 (The 18th international conference on computational linguistics). np.

Church, Kenneth W., William Gale, Patrick Hanks, & Donald Hindle. 1991. Using statistics in lexical analysis. In Uri Zernik (ed.), Lexical acquisition: Exploiting on-line resources to build a lexicon, 115–164. Hillsdale, NJ: Lawrence Erlbaum Associates.

Church, Kenneth W. & Patrick Hanks. 1990. Word association norms, mutual information, and lexicography. Computational Linguistics 16(1). 22–29.

Collins, Peter. 2021. Cultural keywords in World Englishes: A GloWbE-based study. ICAME Journal 45. 5–35.

Cover, Thomas H. & Joy A. Thomas. 2006. Elements of information theory. 2nd ed. Hoboken, NJ: John Wiley.

Culpeper, Jonathan. 2002. Computers, language and characterisation: An analysis of six characters in Romeo and Juliet. In Ulla Melander-Marttala, Carin Östman, & Merja Kytö (eds.), Conversation in life and in literature, 11–30. Uppsala: Association Suédoise de Linguistique Appliquée.

Culpeper, Jonathan. 2009. Words, parts-of-speech and semantic categories in the character-talk of Shakespeare’s Romeo and Juliet. International Journal of Corpus Linguistics 14(1). 29–59.

Cvrček, Václav & Masako Fidler. 2019. More than keywords: Discourse prominence analysis of the Russian web portal Sputnik Czech Republic. In M. Berrocal & A. Salamurović (eds.), Political discourse in Central, Eastern and Balkan Europe, 93–117. Amsterdam: John Benjamins.

Cvrček, Václav & Masako Fidler. 2022. No keyword is an island: In search of covert associations. Corpora 17(2). 259–290.

Damerau, Frederick J. 1990. Evaluating computer-generated domain-oriented vocabularies. Information Processing and Management 26(6). 791–801.

Damerau, Frederick J. 1993. Generating and evaluating domain-oriented multi-word terms from texts. Information Processing and Management 29(4). 433–447.

Daudaravičius, Vidas & Ruta Marcinkevičienė. 2004. Gravity counts for the boundaries of collocations. International Journal of Corpus Linguistics 9(2). 321–348.

Davies, Mark & Dee Gardner. 2010. A frequency dictionary of contemporary American English: Word sketches, collocates and thematic lists. London: Routledge.

Degaetano-Ortlieb, Stefania & Elke Teich. 2016. Information-based modeling of diachronic linguistic change: From typicality to productivity. In Proceedings of the 10th SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH), 165–173. Berlin.

Degaetano-Ortlieb, Stefania & Elke Teich. 2022. Toward an optimal code for communication: The case of scientific English. Corpus Linguistics and Linguistic Theory 18(1). 175–207.

Do, Youngah & Ryan Ka Yau Lai. 2019. Large-sample confidence intervals of information-theoretic measures in linguistics. Journal of Research Design and Statistics in Linguistics and Communication Science 6(1). 19–54.

Dunning, Ted. 1993. Accurate methods for the statistics of surprise and coincidence. Computational Linguistics 19(1). 61–74.

Durlak, Joseph A. 2009. How to select, calculate, and interpret effect sizes. Journal of Pediatric Psychology 34(9). 917–928.

Durrant, Phil & Norbert Schmitt. 2009. To what extent do native and non-native writers make use of collocations? International Review of Applied Linguistics 47. 157–177.

Edmundson, Harold P. & W. Wyllys. 1961. Automatic abstracting and indexing – Survey and recommendations. Communications of the ACM 4. 226–234.

Egbert, Jesse & Douglas Biber. 2019. Incorporating text dispersion into keyword analyses. Corpora 14(1). 77–104.

Ellis, Nick C. 2006. Language acquisition as rational contingency learning. Applied Linguistics 27(1). 1–24.

Ellis, Nick C., Ute Römer, & Matthew Brook O’Donnell. 2016. Usage-based approaches to language acquisition and processing. New York, NY: Wiley-Blackwell.

Ellis, Nick C. & Rita Simpson-Vlach. 2005. An academic formulas list (AFL): Extraction, validation, prioritization. Paper presented at Phraseology 2005, Université Catholique Louvain-la-Neuve.

Ellis, Nick C., Rita Simpson-Vlach, & Carson Maynard. 2007. The processing of formulas in native and L2 speakers: psycholinguistic and corpus determinants. Paper presented at the Symposium on Formulaic Language, University of Wisconsin-Milwaukee.

Eskridge, William N., Brian G. Slocum, & Stefan Th. Gries. 2021. The meaning of sex: Dynamic words, novel applications, and original public meaning. Michigan Law Review 119(7). 1503–1580.

Evert, Stefan. 2009. Corpora and collocations. In Anke Lüdeling & Merja Kytö (eds.), Corpus linguistics: An international handbook, Vol. 2, 1212–1248. Berlin: Mouton de Gruyter.

Evert, Stefan & Brigitte Krenn. 2001. Methods for the qualitative evaluation of lexical association measures. In Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics, 188–195. Toulouse.

Fankhauser, Peter, Jörg Knappen, & Elke Teich. 2014. Exploring and visualizing variation in language resources. In Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC’14), 4125–4128.

Fidelholtz, James L. 1975. Word frequency and vowel reduction in English. Chicago Linguistic Society 11. 200–213.

Fidler, Masako & Václav Cvrček. 2015. A data-driven analysis of reader viewpoints: reconstructing the historical reader using keyword analysis. Journal of Slavic Linguistics 23(2). 197–239.

Firth, John R. 1957. Studies in linguistic analysis. Oxford: Basil Blackwell.

Forster, Kenneth I. & Susan M. Chambers. 1973. Lexical access and naming time. Journal of Verbal Learning and Verbal Behavior 12(6). 627–635.

Francis, W. Nelson & Henry Kučera. 1982. Frequency analysis of English usage: Lexicon and grammar. Boston, MA: Houghton Mifflin.

Gabrielatos, Costas. 2018. Keyness analysis: Nature, metrics and techniques. In Charlotte Taylor & Anna Marchi (eds.), Corpus approaches to discourse: A critical review, 225–258. London: Routledge.

Gale, William A. & Geoffrey Sampson. 1995. Good-Turing frequency estimation without tears. Journal of Quantitative Linguistics 2(3). 217–237.

Gardner, Dee & Mark Davies. 2014. A new Academic Vocabulary List. Applied Linguistics 35(3). 305–327.

Garson, G. David. 1975. Handbook of political science methods. 2nd ed. Boston, MA: Holbrook Press.

Glenberg, Arthur M. 1976. Monotonic and nonmonotonic lag effects in paired-associate and recognition memory paradigms. Journal of Verbal Learning and Verbal Behavior 15(1). 1–15.

Glenberg, Arthur M. 1979. Component-levels theory of the effects of spacing of repetitions on recall and recognition. Memory and Cognition 7(2). 95–112.

Goldberg, Adele E. 1995. Constructions: A Construction Grammar approach to argument structure. Chicago, IL: The University of Chicago Press.

Goldberg, Adele E., Devin M. Casenhiser, & Nitya Sethuraman. 2004. Learning argument structure generalizations. Cognitive Linguistics 15(3). 289–316.

Gómez, Rebecca L. 2002. Variability and detection of invariant structure. Psychological Science 13(5). 431–436.

Gries, Stefan Th. 2003. Towards a corpus-based identification of prototypical instances of constructions. Annual Review of Cognitive Linguistics 1. 1–27.

Gries, Stefan Th. 2005. Null-hypothesis significance testing of word frequencies: A follow-up on Kilgarriff. Corpus Linguistics and Linguistic Theory 1(2). 277–294.

Gries, Stefan Th. 2006. Exploring variability within and between corpora: Some methodological considerations. Corpora 1(2). 109–151.

Gries, Stefan Th. 2008. Dispersions and adjusted frequencies in corpora. International Journal of Corpus Linguistics 13(4). 403–437.

Gries, Stefan Th. 2010. Dispersions and adjusted frequencies in corpora: Further explorations. In Stefan Th. Gries, Stefanie Wulff, & Mark Davies (eds.), Corpus linguistic applications: Current studies, new directions, 197–212. Amsterdam: Rodopi.

Gries, Stefan Th. 2013. 50-something years of work on collocations: What is or should be next … International Journal of Corpus Linguistics 18(1). 137–165.

Gries, Stefan Th. 2016. Quantitative corpus linguistics with R. 2nd rev. & ext. ed. New York & London: Routledge.

Gries, Stefan Th. 2018. The discriminatory power of lexical context for alternations: An information-theoretic exploration. Journal of Research Design and Statistics in Linguistics and Communication Science 5(1–2). 78–106.

Gries, Stefan Th. 2019a. 15 years of collostructions: Some long overdue additions/corrections (to/of actually all sorts of corpus-linguistics measures). International Journal of Corpus Linguistics 24(3). 385–412.

Gries, Stefan Th. 2019b. Ten lectures on corpus-linguistic approaches: Applications for usage-based and psycholinguistic research. Leiden: Brill.

Gries, Stefan Th. 2020. Analyzing dispersion. In Magali Paquot & Stefan Th. Gries (eds.), A practical handbook of corpus linguistics, 99–118. Berlin: Springer.

Gries, Stefan Th. 2021a. Statistics for linguistics with R. 3rd rev. & ext. ed. Berlin: De Gruyter.

Gries, Stefan Th. 2021b. A new approach to (key) keywords analysis: Using frequency, and now also dispersion. Research in Corpus Linguistics 9(2). 1–33.

Gries, Stefan Th. 2022a. What do (some of) our association measures measure (most)? Association? Journal of Second Language Studies 5(1). 1–33.

Gries, Stefan Th. 2022b. What do (most of) our dispersion measures measure (most)? Dispersion? Journal of Second Language Studies 5(2). 171–205.

Gries, Stefan Th. 2022c. Towards more careful corpus statistics: Uncertainty estimates for frequencies, dispersions, association measures, and more. Research Methods in Applied Linguistics 1(1).

Gries, Stefan Th. 2022d. Multi-word units (and tokenization more generally): A multi-dimensional and largely information-theoretic approach. Lexis 19.

Gries, Stefan Th. 2024. Corrections to Nelson (2023): DP_norm and D_KLnorm are not wrong on pi at all. Journal of Quantitative Linguistics.

Gries, Stefan Th. To appear. Cultural keywords in varieties research: Some suggestions to extend existing work. World Englishes.

Gries, Stefan Th., Beate Hampe, & Doris Schönefeld. 2005. Converging evidence: Bringing together experimental and corpus data on the association of verbs and constructions. Cognitive Linguistics 16(4). 635–676.

Gries, Stefan Th. & Joybrato Mukherjee. 2010. Lexical gravity across varieties of English: An ICE-based study of n-grams in Asian Englishes. International Journal of Corpus Linguistics 15(4). 520–548.

Groom, Nicholas. 2009. Effects of second language immersion on second language collocational development. In Andy Barfield & Henrik Gyllstad (eds.), Researching collocations in another language, 21–33. Houndmills: Palgrave Macmillan.

Hackstein, Olav & Ryan Sandell. 2023. The rise of colligations: English can’t stand and German nicht ausstehen können. International Journal of Corpus Linguistics 28(1). 60–90.

Harris, Zellig S. 1970. Papers in structural and transformational linguistics. Dordrecht: Reidel.

Hilpert, Martin & Stefan Th. Gries. 2009. Assessing frequency changes in multi-stage diachronic corpora: Applications for historical corpus linguistics and the study of language acquisition. Literary and Linguistic Computing 24(4). 385–401.

Hoffman, Elaine B., Pranab K. Sen, & Clarice R. Weinberg. 2001. Within-cluster resampling. Biometrika 88(4). 1121–1134.

Howes, Davis H. & Richard L. Solomon. 1951. Visual duration threshold as a function of word probability. Journal of Experimental Psychology 41(6). 401–410.

Hunston, Susan. 2002. Corpora in applied linguistics. Cambridge: Cambridge University Press.

James, Gareth, Daniela Witten, Trevor Hastie, & Robert Tibshirani. 2021. An introduction to statistical learning with applications in R. 2nd ed. Berlin: Springer.

Juilland, Alphonse G., Dorothy R. Brodin, & Catherine Davidovitch. 1970. Frequency dictionary of French words. The Hague: Mouton de Gruyter.

Juilland, Alphonse & E. Chang-Rodriguez. 1964. Frequency dictionary of Spanish words. The Hague: Mouton de Gruyter.

Justeson, John S. & Slava M. Katz. 1991. Co-occurrences of antonymous adjectives and their contexts. Computational Linguistics 17(1). 1–20.

Karlsson, Fred. 1985. Paradigms and word forms. Studia Gramatyczne 7. 135–154.

Karlsson, Fred. 1986. Frequency considerations in morphology. Zeitschrift für Phonetik, Sprachwissenschaft und Kommunikationsforschung 39(1). 19–28.

Koplenig, Alexander. 2017. A data-driven method to identify (correlated) changes in chronological corpora. Journal of Quantitative Linguistics 24(4). 289–318.

Kullback, Solomon & Richard A. Leibler. 1951. On information and sufficiency. Annals of Mathematical Statistics 22(1). 79–86.

Kuperman, Victor, Hans Stadthagen-Gonzalez, & Marc Brysbaert. 2012. Age-of-acquisition ratings for 30,000 English words. Behavior Research Methods 44. 978–990.

Kyle, Kristopher & Scott A. Crossley. 2015. Automatically assessing lexical sophistication: Indices, tools, findings, and application. TESOL Quarterly 49(4). 757–786.

Lachman, Roy. 1973. Uncertainty effects on time to access the internal lexicon. Journal of Experimental Psychology 99(2). 199–208.

Langacker, Ronald W. 1987. Foundations of Cognitive Grammar I: Theoretical prerequisites. Stanford, CA: Stanford University Press.

Langenhorst, Jan, Yannick Frommherz, & Simon Meier-Vieracker. 2023. Keyness in song lyrics: Challenges of highly clumpy data. Journal for Language Technology and Computational Linguistics 36(1). 21–38.

Leech, Geoffrey, Paul Rayson, & Andrew Wilson. 2001. Word frequencies in written and spoken English: Based on the British National Corpus. London: Longman.

Leech, Geoffrey & Roger Fallon. 1992. Computer corpora – What do they tell us about culture? ICAME Journal 16. 29–50.

Lester, Nicholas A. 2017. The syntactic bits of nouns: How prior syntactic distributions affect comprehension, production, and acquisition. Ph.D. dissertation, University of California, Santa Barbara.

Lester, Nicholas A., Daniel Baum, & Tirza Biron. 2018. Phonetic duration of nouns depends on de-lexicalized syntactic distributions: Evidence from naturally occurring conversation. In Chuck Kalish, Martina Rau, Jerry Zhu, & Timothy Rogers (eds.), Proceedings of the 40th Annual Conference of the Cognitive Science Society, 2035–2040. Madison, WI.

Lester, Nicholas A., Laurie B. Feldman, & Fermín Moscoso del Prado Martín. 2017. You can take a noun out of syntax…: Syntactic similarity effects in lexical priming. In Glenn Gunzelmann, Andrew Howes, Thora Tenbrink, & Eddy Davelaar (eds.), Proceedings of the 39th Annual Conference of the Cognitive Science Society, 2537–2542. London, UK.

Lester, Nicholas A. & Fermín Moscoso del Prado Martín. 2017. Syntactic flexibility in the noun: evidence from picture naming. In Anna Papafragou, Daniel Grodner, Daniel Mirman, & John C. Trueswell (eds.), Proceedings of the 38th Annual Conference of the Cognitive Science Society, 2585–2590. Philadelphia, PA.

Liebetrau, Albert M. 1983. Measures of association. Beverly Hills, CA: Sage.

Lijffijt, Jefrey & Stefan Th. Gries. 2012. Correction to “Dispersions and adjusted frequencies in corpora”. International Journal of Corpus Linguistics 17(1). 147–149.

Lim, Zheng Wei, Harry Stuart, Simon De Deyne, Terry Regier, Ekaterina Vylomova, Trevor Cohn, & Charles Kemp. 2022. A computational approach to discovering cultural keywords across languages. PsyArXiv, last edited 22 Nov 2022.

Linzen, Tal & T. Florian Jaeger. 2015. Uncertainty and expectation in sentence processing: Evidence from subcategorization distributions. Cognitive Science 40(6). 1382–1411.

Linzen, Tal, Alec Marantz, & Liina Pylkkänen. 2013. Syntactic context effects in visual word recognition: An MEG study. The Mental Lexicon 8(2). 117–139.

Mahlberg, Michaela. 2007. Clusters, key clusters and local textual functions in Dickens. Corpora 2(1). 1–31.

McConnell, Kyla & Alice Blumenthal-Dramé. 2022. Effects of task and corpus-derived association scores on the online processing of collocations. Corpus Linguistics and Linguistic Theory 18(1). 33–76.

McDonald, Scott A. & Richard C. Shillcock. 2001. Rethinking the word frequency effect: The neglected role of distributional information in lexical processing. Language and Speech 44(3). 295–323.

McEnery, Anthony, Richard Xiao, & Yukio Tono. 2006. Corpus-based language studies: An advanced resource book. London & New York: Routledge.

Mehl, Seth. 2021. What we talk about when we talk about corpus frequency: The example of polysemous verbs with light and concrete senses. Corpus Linguistics and Linguistic Theory 17(1). 223–247.

Michelbacher, Lukas, Stefan Evert, & Hinrich Schütze. 2011. Asymmetry in corpus-derived and human word associations. Corpus Linguistics and Linguistic Theory 7(2). 245–276.

Mildenberger, Thoralf. 2023. Assessing keyness using permutation tests. arXiv: 2308.13383v1, last accessed 25 Aug 2023.

Milin, Petar, Dusica Filipović-Đurđević, & Fermín Moscoso del Prado Martín. 2009. The simultaneous effects of inflectional paradigms and classes on lexical recognition: Evidence from Serbian. Journal of Memory and Language 60(1). 50–64.

Milin, Petar, Victor Kuperman, Aleksandar Kostić, & R. Harald Baayen. 2009. Words and paradigms bit by bit: An information-theoretic approach to the processing of inflection and derivation. In James P. Blevins & Juliette Blevins (eds.), Analogy in grammar: Form and acquisition, 214–252. Oxford: Oxford University Press.

Millar, Neil & Brian S. Budgell. 2008. The language of public health – A corpus-based analysis. Journal of Public Health 16(5). 369–374.

Mollin, Sandra. 2009. Combining corpus linguistic and psychological data on word co-occurrences: Corpus collocates versus word associations. Corpus Linguistics and Linguistic Theory 5(2). 175–200.

Monroe, Burt L., Michael P. Colaresi, & Kevin M. Quinn. 2008. Fightin’ words: Lexical feature selection and evaluation for identifying the content of political conflict. Political Analysis 16(4). 372–403.

Monsell, Stephen. 1991. The nature and locus of word frequency effects in reading. In Derek Besner & Glyn W. Humphreys (eds.), Basic processes in reading: Visual word recognition, 148–197. Hillsdale, NJ: Lawrence Erlbaum Associates.

Moran, Matthew D. 2003. Arguments for rejecting sequential Bonferroni in ecological studies. OIKOS 100. 403–405.

Morrison, Catriona M., Andrew W. Ellis, & Philip T. Quinlan. 1992. Age of acquisition, not word frequency, affects object naming, not object recognition. Memory and Cognition 20. 705–714.

Mukherjee, Joybrato & Tobias Bernaisch. 2015. Cultural keywords in context: A pilot study of linguistic acculturation in South Asian Englishes. In Peter Collins (ed.), Grammatical change in English world-wide, 411–435. Amsterdam: John Benjamins.

Nakagawa, Shinichi. 2004. A farewell to Bonferroni: The problems of low statistical power and publication bias. Behavioral Ecology 15(6). 1044–1045.

Nelson, Robert. 2023. Too noisy at the bottom: Why Gries’ (2008, 2020) dispersion measures cannot identify unbiased distributions of words. Journal of Quantitative Linguistics 30(2). 153–166.

Nenadić, Filip, Petar Milin, & Benjamin V. Tucker. 2021. Relative entropy effects on the processing of spoken Romanian verbs. The Mental Lexicon 16(1). 23–48.

Oakes, Michael & Malcolm Farrow. 2007. Use of the chi-squared test to examine vocabulary differences in English language corpora representing seven different countries. Literary and Linguistic Computing 22(1). 85–99.

Oldfield, R. & A. Wingfield. 1965. Response latencies in naming objects. Quarterly Journal of Experimental Psychology 17(4). 273–281.

Onnis, Luca, Padraic Monaghan, Morten H. Christiansen, & Nick Chater. 2004. Variability is the spice of learning, and a crucial ingredient for detecting and generalizing in nonadjacent dependencies. In Proceedings of the 26th Annual Meeting of the Cognitive Science Society, 1678–1683.

Paquot, Magali. 2010. Academic vocabulary in learner writing: From extraction to analysis. London & New York: Continuum.

Paquot, Magali. 2013. Lexical bundles and transfer effects. International Journal of Corpus Linguistics 18(3). 391–417.

Paquot, Magali. 2014. Cross-linguistic influence and formulaic language: Recurrent word sequences in French learner writing. In Leah Roberts, Ineke Vedder, & Jan H. Hulstijn (eds.), Eurosla Yearbook, Vol. 14, 216–237. Amsterdam: John Benjamins.

Paquot, Magali. 2017. L1 frequency in foreign language acquisition: Recurrent word combinations in French and Spanish EFL learner writing. Second Language Research 33(1). 13–32.

Paquot, Magali & Yves Bestgen. 2009. Distinctive words in academic writing: A comparison of three statistical tests for keyword extraction. In Andreas Jucker, Daniel Schreier, & Marianne Hundt (eds.), Corpora: Pragmatics and discourse, 247–269. Amsterdam: Rodopi.

Paulsen, Mikkel Ekeland. To appear. Assessing word commonness: Adding dispersion to frequency. International Journal of Corpus Linguistics.

Pecina, Pavel. 2010. Lexical association measures and collocation extraction. Language Resources and Evaluation 44(1–2). 137–158.

Pedersen, Ted. 1996. Fishing for exactness. In Proceedings of the South-Central SAS Users Group Conference (SCSUG-96), 27-29.10.1996, Austin, TX.

Pojanapunya, Punjaporn & Richard Watson Todd. 2018. Log-likelihood and odds ratio: Keyness statistics for different purposes of keyword analysis. Corpus Linguistics and Linguistic Theory 14(1). 133–167.

Rayner, Keith & Susan A. Duffy. 1986. Lexical complexity and fixation times in reading: Effects of word frequency, verb complexity, and lexical ambiguity. Memory and Cognition 14(3). 191–201.

Rayson, Paul, Damon Berridge, & Brian J. Francis. 2004. Extending the Cochran rule for the comparison of word frequencies between corpora. In Gérald Purnelle, Cédrick Fairon, & Anne Dister (eds.), Le poids des mots: Proceedings of the 7th International Conference on Statistical analysis of textual data, Vol. II, 926–936. Louvain-la-Neuve: Presses Universitaires de Louvain.

Rayson, Paul & Amanda Potts. 2020. Analysing keyword lists. In Magali Paquot & Stefan Th. Gries (eds.), A practical handbook of corpus linguistics, 119–139. Berlin: Springer.

Resnik, Philip. 1996. Selectional constraints: An information-theoretic model and its computational realization. Cognition 61(1–2). 127–159.

Rogers, Phillip G. & Stefan Th. Gries. 2022. Grammatical gender disambiguates syntactically similar nouns. Entropy 24(4), 520.

Römer, Ute & Stefanie Wulff. 2008. Applying corpus methods to written academic texts: Explorations of MICUSP. Journal of Writing Research 2(2). 99–127.

Rosengren, Inger. 1971. The quantitative concept of language and its relation to the structure of frequency dictionaries. Études de Linguistique Appliquée (Nouvelle Série) 1. 103–127.

Savický, Petr & Jaroslava Hlaváčová. 2002. Measures of word commonness. Journal of Quantitative Linguistics 9(3). 215–231.

Schmid, Hans Joerg. 2010. Entrenchment, salience, and basic levels. In Dirk Geeraerts & Hubert Cuyckens (eds.), The Oxford handbook of cognitive linguistics, 117–138. Oxford: Oxford University Press.

Schneider, Ulrike. 2020. ΔP as a measure of collocation strength: Considerations based on analyses of hesitation placement. Corpus Linguistics and Linguistic Theory 16(2). 249–274.

Schooler, Lael J. & John R. Anderson. 1997. The role of process in the rational analysis of memory. Cognitive Psychology 32(3). 219–250.

Schuchardt, Hugo. 1885. Über die Lautgesetze: Gegen die Junggrammatiker. Berlin.

Scott, Mike. 1997. PC analysis of key words – And key words. System 25(2). 233–245.

Scott, Mike & Christopher Tribble. 2006. Textual patterns: Key words and corpus analysis in language education. Amsterdam: John Benjamins.

Seidenberg, Mark S. & Maryellen C. MacDonald. 1999. A probabilistic constraints approach to language acquisition and processing. Cognitive Science 23(4). 569–588.

Sheskin, David. 2011. Handbook of parametric and non-parametric statistical procedures. 5th ed. Boca Raton, FL: Taylor & Francis.

Shlens, Jonathon. 2014. Notes on Kullback-Leibler Divergence and Likelihood Theory. arXiv preprint, 1404.2000v1, 8 April 2014.

Sinclair, John M. 1996. The search for units of meaning. Textus 9(1). 75–106.

Siyanova-Chanturia, Anna. 2015. Collocation in beginner learner writing: A longitudinal study. System 53. 148–160.

Sönning, Lukas. 2024. Evaluation of keyness metrics: Performance and reliability. Corpus Linguistics and Linguistic Theory 20(2). 263–288.

Spärck Jones, Karen. 1972. A statistical interpretation of term specificity and its application in information retrieval. Journal of Documentation 28(1). 11–21.

Stefanowitsch, Anatol & Stefan Th. Gries. 2003. Collostructions: Investigating the interaction between words and constructions. International Journal of Corpus Linguistics 8(2). 209–243.

Stubbs, Michael. 1995. Collocations and semantic profiles: On the cause of the trouble with quantitative methods. Functions of Language 2(1). 23–55.

Stubbs, Michael. 1996. Text and corpus analysis: Computer-assisted studies of language and culture. Oxford: Blackwell.

Suethanapornkul, Sakol & Sarut Supasiraprapa. To appear. Usage events and constructional knowledge: A study of two variants of the introductory-it construction. Studies in Second Language Acquisition.

Sun, Hao & Jean-Pierre Koenig. 2017. There are more valence alternations than the ditransitive. In Julia Nee, Margaret Cychosz, Dmetri Hayes, Tyler Lau, & Emily Remirez (eds.), Proceedings of the 43rd Meeting of the Berkeley Linguistics Society, 291–308. Berkeley, CA: Berkeley Linguistics Society.

Tomokiyo, Takashi & Matthew Hurst. 2003. A language model approach to keyphrase extraction. In Proceedings of the ACL 2003 Workshop on Multiword Expressions: Analysis, Acquisition and Treatment, 33–40. Stroudsburg, PA.

Tribble, Christopher. 2002. Small corpora and teaching writing: Towards a corpus-informed pedagogy of writing. In Mohsen Ghadessy, Alex Henry, & Robert L. Roseberry (eds.), Small corpus studies and ELT: Theory and practice, 381–408. Amsterdam: John Benjamins.

Tucker, Benjamin V., Daniel Brenner, D. Kyle Danielson, Matthew C. Kelley, Filip Nenadić, & Michelle Sims. 2019. The Massive Auditory Lexical Decision (MALD) database. Behavior Research Methods 51. 1187–1204.

van Heuven, Walter J. B., Pawel Mandera, Emmanuel Keuleers, & Marc Brysbaert. 2014. SUBTLEX-UK: A new and improved word frequency database for British English. The Quarterly Journal of Experimental Psychology 67(6). 1176–1190.

VanPatten, Bill, Jessica Williams, Gregory D. Keating, & Stefanie Wulff. 2020. Introduction: The nature of theories. In Bill VanPatten, Gregory D. Keating, & Stefanie Wulff (eds.), Theories in second language acquisition: An introduction, 1–17. New York, NY: Routledge.

Weisberg, Herbert F. 1974. Models of statistical relationship. The American Political Science Review 68(4). 1638–1655.

Wettler, Manfred, Reinhard Rapp, & Peter Sedlmeier. 2005. Free word associations correspond to contiguities between words in texts. Journal of Quantitative Linguistics 12(2–3). 111–122.

Wilcox, Allen R. 1973. Indices of qualitative variation and political measurement. The Western Political Quarterly 26(2). 325–343.

Zhai, Chengxiang & John Lafferty. 2004. A study of smoothing methods for language models. ACM Transactions on Information Systems 22(2). 179–214.

Zipf, George K. 1935. The psycho-biology of language. Boston, MA: Houghton Mifflin Harcourt.