The New Statistics for applied linguistics
The New Statistics is an approach to scholarly research that offers an alternative to the problematic overreliance on significance testing currently plaguing the research literature. This paper describes the problems associated with significance testing and introduces the key concepts of the data analysis that best fits the goals of the New Statistics: the estimation of effect sizes and confidence intervals. These concepts are applied in a reanalysis of the summary data from an article recently published in this journal. This reanalysis makes it possible to compare the estimation approach advocated by the New Statistics with the standard significance tests and to discuss potential drawbacks of this approach as a means of gathering quantitative evidence in support of our substantive hypotheses.
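To give a concrete sense of what estimation-based reporting looks like in practice, the sketch below (an illustration only, not code or data from the article; the group labels and values are hypothetical) computes a standardized mean difference (Cohen's d) with an approximate 95% confidence interval, so that the size of an effect and its uncertainty are reported rather than only a p value.

```python
# Minimal sketch (illustration only; hypothetical data): report an effect size
# and its confidence interval instead of only a significance test.
import numpy as np

rng = np.random.default_rng(1)            # hypothetical ratings for two groups
native = rng.normal(5.0, 1.2, 40)
non_native = rng.normal(4.4, 1.2, 40)

n1, n2 = len(native), len(non_native)
pooled_sd = np.sqrt(((n1 - 1) * native.var(ddof=1) +
                     (n2 - 1) * non_native.var(ddof=1)) / (n1 + n2 - 2))
d = (native.mean() - non_native.mean()) / pooled_sd   # point estimate (Cohen's d)

# Large-sample standard error of d and an approximate 95% confidence interval
se_d = np.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
lower, upper = d - 1.96 * se_d, d + 1.96 * se_d

print(f"d = {d:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

Exact confidence intervals for d can be obtained from the noncentral t distribution (e.g., with the MBESS package described in Kelley, 2007); the normal approximation above is used only to keep the sketch short.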
Article outline
- 1. Introduction
- 2. Does it feel non-native?
- 3. The New Statistics
- 3.1 An estimate of the population effect size and the confidence interval
- 4. Interpreting confidence intervals
- 5. Conclusion
- Acknowledgements
- Notes
- References
References
American Psychological Association. (2010). Publication manual of the American Psychological Association (6th ed.). Washington, DC: American Psychological Association.
Amrhein, V., Greenland, S., & McShane, B. (2019). Retire statistical significance. Nature, 567, 305–307.
Bakan, D. (1966). The test of significance in psychological research. Psychological Bulletin, 66, 423–437.
Berger, J. O., & Sellke, T. (1987). Testing a point null hypothesis: The irreconcilability of P values and evidence. Journal of the American Statistical Association, 82, 112–122.
Berkson, J. (1942). Tests of significance considered as evidence. Journal of the American Statistical Association, 37, 325–335.
Calin-Jageman, R. J., & Cumming, G. (2019a). The New Statistics for better science: Ask how much, how uncertain, and what else is known. The American Statistician, 73, 271–280.
Calin-Jageman, R. J., & Cumming, G. (2019b). Estimation for better inference in neuroscience. eNeuro, 6, 1–11.
Carver, R. P. (1978). The case against significance testing. Harvard Educational Review, 48, 378–399.
Chambers, C. (2018). The seven deadly sins of psychology. A manifesto for reforming the culture of scientific practice. Princeton/Oxford: Princeton University Press.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). New York, NY: Academic Press.
Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49, 997–1003.
Cumming, G. (2012). Understanding the New Statistics. Effect sizes, confidence intervals, and meta-analysis. New York/London: Routledge.
Cumming, G. (2014). The New Statistics: Why and how. Psychological Science, 25, 7–29.
Cumming, G., & Calin-Jageman, R. J. (2017). Introduction to the New Statistics. Estimation, open science, & beyond. New York/London: Routledge.
Falk, R., & Greenbaum, C. W. (1995). Significance tests die hard: The amazing persistence of a probabilistic misconception. Theory & Psychology, 5, 75–98.
Field, A. (2015). Discovering statistics using IBM SPSS Statistics (4th ed.). London: Sage.
Gigerenzer, G. (2004). Mindless statistics. Journal of Socio-Economics, 33, 587–606.
Gigerenzer, G., & Marewski, J. N. (2015). Surrogate science: The idol of a universal method for scientific inference. Journal of Management, 41, 421–440.
Gigerenzer, G., Swijtink, Z., Porter, T., Daston, L., Beatty, J., & Kruger, L. (1989). The Empire of Chance. Cambridge: Cambridge University Press.
Hacking, I. (1965). Logic of statistical inference. Cambridge: Cambridge University Press.
Haller, H., & Krauss, S. (2002). Misinterpretations of significance. A problem students share with their teachers? Methods of Psychological Research Online, 7, [URL]
Hubbard, R. (2004). Alphabet soup: Blurring the distinctions between p’s and α’s in psychological research. Theory & Psychology, 14, 295–327.
Hubbard, R., & Lindsay, R. M. (2008). Why P values are not a useful measure of evidence in statistical significance testing. Theory & Psychology, 18, 69–88.
Kelley, K. (2007). Methods for the behavioral, educational, and social sciences: An R package. Behavior Research Methods, 39, 979–984.
Kline, R. B. (2013). Beyond significance testing. Statistics reform in the behavioral sciences. Washington, DC: American Psychological Association.
Kruschke, J. K. (2015). Doing Bayesian data analysis. A tutorial with R, JAGS, and Stan (2nd ed.). London: Academic Press.
Kruschke, J. K., & Liddell, T. M. (2018). The Bayesian New Statistics: Hypothesis testing, estimation, meta-analysis, and power analysis from a Bayesian perspective. Psychonomic Bulletin & Review, 25, 178–206.
Lambdin, C. (2012). Significance tests as sorcery: Science is empirical – significance tests are not. Theory & Psychology, 22, 67–90.
Lindley, D. V. (2000). The philosophy of statistics. The Statistician, 49, 293–337.
Ly, A., Raj, A., Etz, A., Marsman, M., Gronau, Q. F., & Wagenmakers, E. J. (2018). Bayesian reanalyses from summary statistics: A guide for academic consumers. Advances in Methods and Practices in Psychological Science, 1, 367–374.
Maxwell, S. E., Delaney, H. D., & Kelley, K. (2017). Designing experiments and analyzing data. A model comparison perspective (3rd ed.). New York, NY: Routledge.
McShane, B. B., Gal, D., Gelman, A., Robert, C., & Tackett, J. L. (2019). Abandon statistical significance. The American Statistician, 73, 235–245.
Meehl, P. E. (1978). Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. Journal of Consulting and Clinical Psychology, 46, 806–834.
Meehl, P. E. (1997). The problem is epistemology, not statistics: Replace significance tests by confidence intervals and quantify accuracy of risky numerical predictions. In L. L. Harlow, S. A. Mulaik, & J. H. Steiger (Eds.), What if there were no significance tests? (pp. 393–425). Mahwah, NJ: Erlbaum.
Morey, R. D., Hoekstra, R., Rouder, J. N., Lee, M. D., & Wagenmakers, E. J. (2016). The fallacy of placing confidence in confidence intervals. Psychonomic Bulletin & Review, 23, 103–123.
Mulder, G. (2016). De kwaliteit van onderzoek. Dichotoom denken versus meta-analytisch denken [The quality of research: Dichotomous thinking versus meta-analytic thinking]. Tijdschrift voor Taalbeheersing, 38, 163–173.
Mulder, G. (2019). Een significant probleem [A significant problem]. Tijdschrift voor Taalbeheersing, 41, 203–213.
Neyman, J. (1977). Frequentist probability and frequentist statistics. Synthese, 36, 97–131.
Nickerson, R. S. (2000). Null hypothesis significance testing: A review of an old and continuing controversy. Psychological Methods, 5, 241–301.
Norouzian, R., De Miranda, M., & Plonsky, L. (2018). The Bayesian revolution in second language research: An applied approach. Language Learning, 68, 1032–1075.
Oakes, M. (1986). Statistical significance. New York, NY: Wiley.
Perezgonzalez, J. D. (2015). Fisher, Neyman-Pearson, or NHST? A tutorial for teaching data testing. Frontiers in Psychology, 6, 1–11.
Polya, G. (1954). Mathematics and plausible reasoning (Vols. 1–2): Induction and analogy in mathematics; Patterns of plausible inference. Princeton, NJ: Princeton University Press.
Rozeboom, W. W. (1960). The fallacy of the null-hypothesis significance test. Psychological Bulletin, 57, 416–428.
Rosenthal, R., Rosnow, R. L., & Rubin, D. B. (2000). Contrasts and effect sizes in behavioral research. A correlational approach. Cambridge, UK: Cambridge University Press.
Rouder, J. N., Speckman, P. L., Sun, D., Morey, R. D., & Iverson, G. (2009). Bayesian t tests for accepting and rejecting the null hypothesis. Psychonomic Bulletin & Review, 16, 225–237.
Schmidt, F. L. (1996). Statistical significance and cumulative knowledge in psychology: Implications for the training of researchers. Psychological Methods, 1, 115–129.
Wasserstein, R. L., & Lazar, N. (2016). The ASA’s statement on P-values: Context, process, and purpose. The American Statistician, 70, 129–133.
Wasserstein, R. L., Schirm, A. L., & Lazar, N. A. (2019). Moving to a world beyond “p < .05”. The American Statistician, 73, 1–19.
Wiens, S., & Nilsson, M. E. (2017). Performing contrast analysis in factorial designs: From NHST to confidence intervals and beyond. Educational and Psychological Measurement, 77, 690–715.
Ziliak, S. T., & McCloskey, D. N. (2008). The cult of statistical significance. How the standard error costs us jobs, justice, and lives. Ann Arbor, MI: The University of Michigan Press.
Cited by (3)
Müller, Marcus. 2024. Einsam oder gemeinsam? [Alone or together?]. Zeitschrift für Literaturwissenschaft und Linguistik 54:2, pp. 151 ff.
Sönning, Lukas & Valentin Werner. 2021. The replication crisis, scientific revolutions, and linguistics. Linguistics 59:5, pp. 1179 ff.
van Tessel, Evi & Marco Bril
This list is based on CrossRef data as of 4 July 2024. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.