Cluster analysis is an exploratory data analysis technique that encompasses a number of different algorithms and methods for sorting objects into groups. It requires the analyst to make choices about dissimilarity measures, grouping algorithms, and so on, and these choices are difficult to make without an understanding of their theoretical implications and a very good understanding of the data. This chapter provides an introduction to the distance measures and clustering algorithms most commonly used in cluster analytic work. Unlike Baayen (2008), Johnson (2008) and Gries (2009), its main aim is to equip the researcher with at least a basic understanding of what is happening behind the scenes when a dataset is explored with the help of a particular cluster analytic technique.
Alviar, J.J. (2008). Recent advances in computational linguistics and their application to biblical studies. New Testament Studies, 54(1), 139–159.
Baayen, R.H. (2008). Analyzing linguistic data: A practical introduction to statistics using R. Cambridge: Cambridge University Press.
Backhaus, K., Erichson, B., Plinke, W., & Weiber, R. (1996). Multivariate Analysemethoden: Eine anwendungsorientierte Einführung. 8th edition. Berlin; Heidelberg; New York: Springer.
Brock, G., Pihur, V., Datta, S., & Datta, S. (2011). clValid: Validation of clustering results. Journal of Statistical Software, 25(4), March 2008. R package version 0.6-2. [URL].
Divjak, D., & Gries, St. Th. (2006). Ways of trying in Russian: Clustering behavioral profiles. Journal of Corpus Linguistics and Linguistic Theory, 2(1), 23–60.
Everitt, B.S., Landau, S., Leese, M., & Stahl, D. (2011). Cluster analysis. 5th edition. Oxford: Wiley.
Gower, J., & Legendre, P. (1986). Metric and Euclidean properties of dissimilarity coefficients. Journal of Classification, 3(1), 5–48.
Gries, St. Th. (2009). Statistics for linguistics with R: A practical introduction. Berlin: Mouton de Gruyter.
Harnad, S. (2005). To cognize is to categorize: Cognition is categorization. In C. Lefebvre & H. Cohen (Eds.), Handbook on categorization (pp. 19–43). Oxford & London: Elsevier.
Hennig, C. (2010). fpc: Flexible procedures for clustering. R package version 2.0-3. [URL].
Johnson, K. (2008). Quantitative methods in linguistics. New York: Wiley-Blackwell.
Kaufman, L., & Rousseeuw, P.J. (1990). Finding groups in data: An introduction to cluster analysis (Series in Applied Probability and Statistics). New York: Wiley-Blackwell.
Milligan, G.W., & Cooper, M.C. (1985). An examination of procedures for determining the number of clusters in a data set. Psychometrika, 50, 159–179.
R Development Core Team (2008). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. [URL].
Rousseeuw, P.J. (1987). Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20(1), 53–65.
Shaw, D. (1974). Statistical analysis of dialectal boundaries. Computers and the Humanities, 8, 173–177.
Suzuki, R., & Shimodaira, H. An R package for hierarchical clustering with p-values. Retrieved from [URL] [Accessed 25 May 2012].
Tryon, R.C. (1939). Cluster analysis. New York: McGraw-Hill.