Article published in:
Linguistic Approaches to Bilingualism: Online-First Articles
Normalization of timed measures in bilingualism research
Make it optimal with the Box-Cox transformation
The time it takes an individual to respond to a probe (e.g., a word, picture, or question) or to read a word or
phrase provides useful insights into cognitive processes. Consequently, timed measures are a staple in bilingualism research.
However, timed measures usually violate assumptions of linear models, one being the normal distribution of the residuals. Power
transformations are a common solution, but which of the many possible transformations to apply is often guesswork. Box and Cox (1964)
developed a procedure that estimates the exponent of the best-fitting normalizing transformation, the coefficient lambda (λ), and
that is easy to run using standard R packages. This practical primer demonstrates how to
perform the Box-Cox transformation in R using as a testbed the distractor items from a recent eye-tracking study on sentence
reading in speakers of Spanish as a majority and a heritage language. The analyses show (a) that the exponents selected via the
Box-Cox procedure reduce positive skewness as well as or better than the natural log; (b) that the best-fitting value of λ varies
based on factors such as group and, in the case of eye-movement data, the measure of interest; and (c) that the choice of
transformation sometimes impacts p values for model estimates.
Keywords: Box-Cox transformation, reading times, response times, outliers, skewness
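For reference, the power family behind the coefficient λ mentioned in the abstract can be written out explicitly. What follows is the standard one-parameter form associated with Box and Cox (1964), stated for a strictly positive response such as a reading or response time. The symbols y, n, and RSS are notation chosen here for illustration and are not taken from the article itself.

```latex
% One-parameter Box-Cox family for a strictly positive response y > 0
y^{(\lambda)} =
\begin{cases}
  \dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0, \\[6pt]
  \log y, & \lambda = 0 .
\end{cases}

% Profile log-likelihood (up to an additive constant) used to estimate lambda,
% where RSS(lambda) is the residual sum of squares from fitting the linear
% model to the transformed response y^{(lambda)} and n is the number of
% observations (both symbols are illustrative notation).
\ell(\lambda) = -\frac{n}{2}\,\log\!\left(\frac{\mathrm{RSS}(\lambda)}{n}\right)
  + (\lambda - 1)\sum_{i=1}^{n} \log y_i
```

The best-fitting λ is the value that maximizes ℓ(λ). Familiar transformations are special cases: λ = 1 leaves the shape of the distribution unchanged, λ = 0.5 corresponds to the square root, λ = 0 to the natural log, and λ = −1 to the reciprocal.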
Article outline
- 1. Introduction
- 1.1 Data normalization
- 1.2 The Box-Cox transformation
- 2. Performing the Box-Cox transformation in R
- 2.1 The sample data
- 2.2 R packages and code
- 3. Distributions before and after transformation
- 4. Model comparisons
- 5. Conclusion
- Data availability
- Acknowledgements
- References
Published online: 25 September 2024
https://doi.org/10.1075/lab.24017.kea
References (18)
Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68(3), 255–278.
Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48.
Box, G. E. P., & Cox, D. R. (1964). An analysis of transformations. Journal of the Royal Statistical Society. Series B (Methodological), 26, 211–252.
Burchill, Z. J., & Jaeger, T. F. (2024). How reliable are standard reading time analyses? Hierarchical bootstrap reveals substantial power over-optimism and scale-dependent Type I error inflation. Journal of Memory and Language, 136, Article 104494.
Cuetos, F., Glez-Nosti, M., Barbón, A., & Brysbaert, M. (2011). SUBTLEX-ESP: Spanish word frequencies based on film subtitles. Psicológica, 32(2), 133–143. [URL]
Drummer, J.-D., & Felser, C. (2018). Cataphoric pronoun resolution in native and non-native sentence comprehension. Journal of Memory and Language, 101, 97–113.
Keating, G. D. (2022). The effect of age of onset of bilingualism on gender agreement processing in Spanish as a heritage language. Language Learning, 72(4), 1170–1208.
Keating, G. D. (2024). Morphological markedness and the temporal dynamics of gender agreement processing in Spanish as a majority and a heritage language. Language Learning. Advance online publication.
Nicklin, C., & Plonsky, L. (2020). Outliers in L2 research in applied linguistics: A synthesis and data re-analysis. Annual Review of Applied Linguistics, 40, 26–55.
Osborne, J. (2010). Improving your data transformations: Applying the Box-Cox transformation. Practical Assessment, Research, and Evaluation, 15, Article 12. [URL]
R Core Team (2023). R: A language and environment for statistical computing (Version 4.3.1) [Computer software]. R Foundation for Statistical Computing. Retrieved from [URL]
Ratcliff, R. (1993). Methods for dealing with reaction time outliers. Psychological Bulletin, 114(3), 510–532.
Rayner, K. (1998). Eye movements in reading and information processing: 20 years of research. Psychological Bulletin, 124(3), 372–422.
Revelle, W. (2024). psych: Procedures for psychological, psychometric, and personality research. Northwestern University, Evanston, Illinois.
SR Research (2005). EyeLink 1000 [Apparatus and software]. [URL]