Two subjunctives or three?
A multimodel analysis of subjunctive tense variation in complement clauses in Spanish
This paper examines the use of the three non-periphrastic subjunctives in clauses embedded under past-tense obligatory subjunctive predicates in three varieties of Spanish: Argentinean, Mexican and Peninsular. Using random forest and logistic regression analyses, I demonstrate that a grammar in which the two “past” subjunctives form a single group, so that the variation is modeled as a binary opposition between (morphologically) past and (morphologically) present forms, achieves better prediction accuracy and goodness of fit than a grammar with a three-way split. The results suggest that, at least in complement clauses of obligatory subjunctive predicates, the two past subjunctives do not differ semantically, although there are still relatively large differences in how the three subjunctive forms are used across the three varieties studied.
Article outline
- 1. Introduction
- 2. The Spanish subjunctive
- 2.1 Concordantia temporum or sequence of tense in the subjunctive
- 2.2 The evolution of the two past subjunctives
- 2.3 Previous accounts of the distinction between -se and -ra
- 3. Data and methodology
- 3.1 The data
- 3.2 Statistical analysis
- 4. Results
- 4.1 Descriptive statistics
- 4.2 Random forest models
- 4.2.1 The multiclass model
- 4.2.2 The binary model
- 4.3 Regression analyses
- 4.3.1 The multiclass logistic regression
- 4.3.2 The binary logistic regression
- 5. Discussion
- 5.1 The main findings
- 5.2 The present subjunctive in Argentinean Spanish
- 6. Conclusion
- Notes
- References