Investigating effects of criterial consistency, the diversity dimension, and threshold variation in formulaic language research: Extending the methodological considerations of O’Donnell et al. (2013)

Lu, Xiaofei; Kisselev, Olesya; Yoon, Jungwan; Amory, Michael D.

doi:10.1075/ijcl.16086.lu

Article published in:

International Journal of Corpus Linguistics
Vol. 23:2 (2018), pp. 158–182

Xiaofei Lu | The Pennsylvania State University

Olesya Kisselev | The Pennsylvania State University

Jungwan Yoon | The Pennsylvania State University

Michael D. Amory | The Pennsylvania State University

O’Donnell et al. (2013) considered four measures of formulaicity and reported that they produced different results concerning the effects of expertise and first/second language status on formulaic sequence usage in academic writing. The current study explores several additional methodological issues using the same dataset as O’Donnell et al. (2013). We first motivate the need for criterial consistency and investigate whether frequency- and association-based measures yield different results when both are obtained using corpus-internal criteria. The informativeness of the diversity dimension of formulaic sequence use is then gauged by comparing the results of phrase-frame type-token ratio against those of other measures. Finally, we profile formulaic sequence distribution across quartiles of different measures to assess the effect of variable measure thresholds. Our findings highlight the importance of criterial consistency, formulaic sequence diversity, and threshold variation in formulaic language research.

Keywords: formulaic language, n-gram frequency, mutual information, phrase frames, phrase-frame type-token ratio
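
The measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the pointwise MI formulation (log2 of the n-gram probability over the product of the word probabilities), the restriction of phrase frames to one internal variable slot, and the `min_freq` parameter are all assumptions made for illustration, and the thresholds used in the study may differ.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams in a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_mi(tokens, n):
    """Pointwise MI per n-gram (assumed formulation, corpus-internal counts):
    log2(P(n-gram) / product of P(word) over the words in the n-gram)."""
    gram_counts = Counter(ngrams(tokens, n))
    word_counts = Counter(tokens)
    total_grams = sum(gram_counts.values())
    total_words = len(tokens)
    scores = {}
    for gram, count in gram_counts.items():
        p_gram = count / total_grams
        p_indep = math.prod(word_counts[w] / total_words for w in gram)
        scores[gram] = math.log2(p_gram / p_indep)
    return scores


def pframe_counts(tokens, n):
    """Count phrase frames: n-grams with one internal slot replaced by '*'."""
    frames = Counter()
    for gram in ngrams(tokens, n):
        for slot in range(1, n - 1):  # internal positions only (assumption)
            frames[gram[:slot] + ("*",) + gram[slot + 1:]] += 1
    return frames


def pframe_ttr(tokens, n, min_freq=1):
    """Phrase-frame type-token ratio: distinct frames / frame occurrences,
    over frames that occur at least min_freq times (hypothetical cutoff)."""
    kept = {f: c for f, c in pframe_counts(tokens, n).items() if c >= min_freq}
    total = sum(kept.values())
    return len(kept) / total if total else 0.0


def top_quartile(scores):
    """Return the items ranked in the top quarter of a score dictionary."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:max(1, len(ranked) // 4)]


if __name__ == "__main__":
    sample = "the end of the day the top of the hill".split()
    print(pframe_counts(sample, 4).most_common(1))
    print(pframe_ttr(sample, 4, min_freq=2))
    print(top_quartile(ngram_mi(sample, 2)))
```

Because every count here comes from the same token list, the frequency and MI values are both corpus-internal, which is the kind of criterial consistency the abstract argues for. Ranking scores and slicing by quartile, as in `top_quartile`, is one simple way to profile how results shift as a threshold varies.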

Article outline

1. Introduction
2. Methodological issues in formulaic sequence identification and extraction
3. Motivation for the current study
4. Method
- 4.1 Data
- 4.2 Measures
  - 4.2.1 N-gram frequency
  - 4.2.2 N-gram MI
  - 4.2.3 P-frame frequency
  - 4.2.4 P-frame TTR
- 4.3 Procedure
5. Results
- 5.1 Research question 1: Corpus-internal vs. corpus-external MI thresholds
- 5.2 Research questions 2 and 3: Effects of expertise
  - 5.2.1 Frequency-based n-grams
  - 5.2.2 MI-defined formulas
  - 5.2.3 P-frames
  - 5.2.4 P-frame TTR
- 5.3 Research questions 2 and 3: Effects of L1/L2 status
  - 5.3.1 Frequency-based n-grams
  - 5.3.2 MI-defined formulas
  - 5.3.3 P-frames
  - 5.3.4 P-frame TTR
6. Discussion
7. Conclusions
Acknowledgements
Notes
References

Published online: 5 October 2018

https://doi.org/10.1075/ijcl.16086.lu

References

Bannard, C., & Lieven, E.

(2012) Formulaic language in L1 acquisition. Annual Review of Applied Linguistics, 32, 3–16.

Biber, D.

(2006) University Language: A Corpus-Based Study of Spoken and Written Registers. Amsterdam/Philadelphia: John Benjamins.

(2009) A corpus-driven approach to formulaic language in English: Multi-word patterns in speech and writing. International Journal of Corpus Linguistics, 14(3), 275–311.

Biber, D., Conrad, S., & Cortes, V.

(2004) If you look at …: Lexical bundles in university teaching and textbooks. Applied Linguistics, 25(3), 371–405.

Biber, D., Johansson, S., Leech, G., Conrad, S., & Finegan, E.

(1999) The Longman Grammar of Spoken and Written English. New York/London: Longman.

Conklin, K., & Schmitt, N.

(2012) The processing of formulaic language. Annual Review of Applied Linguistics, 32, 45–61.

Cortes, V.

(2004) Lexical bundles in published and student disciplinary writing: Examples from history and biology. English for Specific Purposes, 23(4), 397–423.

Coxhead, A.

(2000) A new academic word list. TESOL Quarterly, 34(2), 213–238.

Durrant, P., & Doherty, A.

(2010) Are high-frequency collocations psychologically real? Investigating the thesis of collocational priming. Corpus Linguistics and Linguistic Theory, 6(2), 125–155.

Ellis, N. C.

(2012) Formulaic language and Second Language Acquisition: Zipf and the phrasal Teddy Bear. Annual Review of Applied Linguistics, 32, 17–44.

Eskildsen, S. W.

(2009) Constructing another language – Usage-based linguistics in second language acquisition. Applied Linguistics, 30(3), 335–357.

Eskildsen, S. W., & Cadierno, T.

(2007) Are recurring multi-word expressions really syntactic freezes? Second language acquisition from the perspective of usage-based linguistics. In M. Nenonen & S. Niemi (Eds.), Collocations and Idioms 1: Papers from the First Nordic Conference on Syntactic Freezes (pp. 86–99). Joensuu: Joensuu University Press.

Evert, S.

(2008) Corpora and collocations. In A. Lüdeling & M. Kytö (Eds.), Corpus Linguistics. An International Handbook (pp. 1212–1248). Berlin: Mouton de Gruyter.

Fletcher, W. H.

(2007) KfNgram [Computer software]. Annapolis, MD: USNA.

Granger, S.

(1996) From CA to CIA and back: An integrated approach to computerized bilingual and learner corpora. In K. Aijmer, B. Altenberg & M. Johansson (Eds.), Languages in Contrast: Papers from a Symposium on Text-based Cross-linguistic Studies (pp. 37–51). Lund: Lund University Press.

(2003) The International Corpus of Learner English: A new resource for foreign language learning and teaching and second language acquisition research. TESOL Quarterly, 37(3), 538–546.

Granger, S., & Meunier, F.

(2008) Phraseology: An Interdisciplinary Perspective. Amsterdam/Philadelphia: John Benjamins.

Gries, S., & Wulff, S.

(2005) Do foreign language learners also have constructions? Evidence from priming, sorting, and corpora. Annual Review of Cognitive Linguistics, 3, 182–200.

Herbst, T.

(2011) Choosing sandy beaches – collocations, probabemes and the idiom principle. In T. Herbst, S. Faulhaber & P. Uhrig (Eds.), The Phraseological View of Language (pp. 27–57). Berlin: Walter de Gruyter.

Hyland, K.

(1998) Hedging in Scientific Research Articles. Amsterdam/Philadelphia: John Benjamins.

(2012) Bundles in academic discourse. Annual Review of Applied Linguistics, 32, 150–169.

Laufer, B., & Nation, P.

(1995) Vocabulary size and use: Lexical richness in L2 written production. Applied Linguistics, 16(3), 307–322.

Lieven, E., & Tomasello, M.

(2008) Children’s first language acquisition from a usage-based perspective. In P. Robinson & N. C. Ellis (Eds.), Handbook on Cognitive Linguistics and Second Language Acquisition (pp. 168–196). New York, NY: Routledge.

McEnery, T., & Hardie, A.

(2014) Corpus Linguistics: Method, Theory and Practice. Cambridge: Cambridge University Press.

Manning, C., & Schütze, H.

(1999) Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press.

McEnery, T., & Wilson, A.

(2004) Corpus Linguistics: An Introduction. Edinburgh: Edinburgh University Press.

Mel’čuk, I.

(1998) Collocations and lexical functions. In A. P. Cowie (Ed.), Phraseology: Theory, Analysis, and Applications (pp. 23–53). Oxford: Clarendon Press.

Nesselhauf, N.

(2005) Collocations in a Learner Corpus. Amsterdam/Philadelphia: John Benjamins.

O’Donnell, M., Römer, U., & Ellis, N. C.

(2013) The development of formulaic sequences in first and second language writing: Investigating effects of frequency, association, and native norm. International Journal of Corpus Linguistics, 18(1), 83–108.

Paquot, M. B., & Granger, S.

(2012) Formulaic language in learner corpora. Annual Review of Applied Linguistics, 32, 130–149.

Pawley, A., & Syder, F. H.

(1983) Two puzzles for linguistic theory: Nativelike selection and nativelike fluency. In J. Richards & R. Schmidt (Eds.), Language and Communication (pp. 191–225). New York/London: Longman.

Pivovarova, L., Kormacheva, D., & Kopotev, M.

(2017) Evaluation of collocation extraction methods for the Russian language. In M. Kopotev, O. Lyashevskaja & A. Mustajoki (Eds.), Quantitative Approaches to the Russian Language. New York, NY: Routledge.

Römer, U.

(2010) Establishing the phraseological profile of a text type: The construction of meaning in academic book reviews. English Text Construction, 3(1), 95–119.

Römer, U., & O’Donnell, M. B.

(2011) From student hard drive to web corpus (part 1): The design, compilation and genre classification of the Michigan Corpus of Upper-level Student Papers (MICUSP). Corpora, 6(2), 159–177.

Schmitt, N., & Carter, R.

(2004) Formulaic sequences in action. An introduction. In N. Schmitt (Ed.), Formulaic Sequences (pp. 2–22). Amsterdam/Philadelphia: John Benjamins.

Scott, M.

(2014) WordSmith Tools 6.0 [Computer software]. Liverpool: Lexical Analysis Software.

Simpson-Vlach, R., & Ellis, N. C.

(2010) An academic formulas list: New methods in phraseology research. Applied Linguistics, 31(4), 487–512.

Sinclair, J.

(1991) Corpus, Concordance, Collocation. Oxford: Oxford University Press.

Tomasello, M.

(2003) Constructing a Language: A Usage-Based Theory of Language Acquisition. Cambridge, MA: Harvard University Press.

Upton, G., & Cook, I.

(1996) Understanding Statistics. Oxford: Oxford University Press.

Wood, D.

(2015) Fundamentals of Formulaic Language: An Introduction. London: Bloomsbury Publishing.

Wray, A.

(2002) Formulaic Language and the Lexicon. Cambridge: Cambridge University Press.

(2008) Formulaic Language: Pushing the Boundaries. Oxford: Oxford University Press.

Cited by

Cited by 4 other publications

Lang, Juanjuan

2024. Research on Language Evolution and Language Diversity Based on Chinese Speech Pitch Deviation Features. Applied Mathematics and Nonlinear Sciences 9:1

Lu, Xiaofei & Renfen Hu

2021. Sense-aware lexical sophistication indices and their relationship to second language writing quality. Behavior Research Methods 54:3, pp. 1444 ff.

Pan, Fan, Randi Reppen & Douglas Biber

2020. Methodological issues in contrastive lexical bundle research. International Journal of Corpus Linguistics 25:2, pp. 216 ff.

Szudarski, Paweł

2023. Collocations, Corpora and Language Learning.

This list is based on CrossRef data as of 1 June 2024. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.