Chapter 18 | Exercise 1
‘Multidimensional analysis of register variation in COCA’
The data set
reg_coca represents 42 subcorpora from COCA that belong to five registers. The variables are the same as in the BNC data (see Chapter 18). Find out whether the dimensions are similar to the ones that were found in the BNC.
Perform a PCA and choose the optimal number of dimensions.
Try to give a first linguistic interpretation of the components.
Plot the prototypes of the registers and their confidence ellipses on one or more biplots.
Perform a factor analysis and compare the factors with the principal components. Is there any difference between COCA and the BNC?
Compare two rotations, Varimax and Promax. Do you find any difference?
Perform a PCA, treating the column with register names as a supplementary qualitative variable.
> library(Rling)
> data(reg_coca)
> library(FactoMineR)
> reg.pca <- PCA(reg_coca, quali.sup = 1, graph = FALSE)
> head(reg.pca$eig)
       eigenvalue percentage of variance cumulative percentage of variance
comp 1  5.2948982              48.135438                          48.13544
comp 2  2.5479184              23.162894                          71.29833
comp 3  1.4722366              13.383969                          84.68230
comp 4  0.6412271               5.829337                          90.51164
comp 5  0.5795990               5.269082                          95.78072
comp 6  0.1837325               1.670296                          97.45102
> # scree plot: percentage of variance explained by each component
> barplot(reg.pca$eig[, 2], names.arg = 1:nrow(reg.pca$eig), xlab = 'components', ylab = 'Percentage of explained variance')
According to the scree plot, the optimal number of dimensions is three. Together, they explain almost 85% of the variance (84.68%).
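As a cross-check on the scree plot, the percentages in the eigenvalue table can be recomputed from the eigenvalues alone. This minimal sketch assumes that PCA() standardizes the variables (FactoMineR's default, scale.unit = TRUE), so that the total variance equals the number of variables, here the 11 variables listed in the factor loadings below. The Kaiser rule at the end is a common heuristic added for comparison, not the criterion used in this exercise.

```r
# Eigenvalues of the first six components, copied from head(reg.pca$eig)
eig <- c(5.2948982, 2.5479184, 1.4722366, 0.6412271, 0.5795990, 0.1837325)

# Assumption: standardized variables, so total variance = number of variables
n_vars <- 11
pct <- 100 * eig / n_vars   # percentage of variance per component
cum_pct <- cumsum(pct)      # cumulative percentage of variance

round(cbind(eigenvalue = eig, pct = pct, cum_pct = cum_pct), 2)

# Kaiser criterion (rule of thumb): keep components with eigenvalue > 1
sum(eig > 1)   # → 3
```

Both the scree plot and the Kaiser rule point to three components here.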
You can use the dimdesc() function to interpret the dimensions:
> dimdesc(reg.pca)
[output omitted]
The first dimension relates to involved vs. informational communication. Dimension 2 contrasts what might be called verbose discourse with concise presentation of information. Dimension 3 relates to the distinction between past and present verbal forms.
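For quantitative variables, dimdesc() reports the correlation of each variable with the coordinates of the individuals on a dimension, keeping only the significant ones. The sketch below illustrates that idea in base R with prcomp() and cor.test(). The helper describe_dims() and the toy data are hypothetical, and the function is only an approximation of the quantitative part of dimdesc(), not FactoMineR's implementation (for example, it ignores supplementary variables).

```r
# Hypothetical helper: correlations of each variable with the first k
# principal components, filtered by cor.test() p-value and sorted
describe_dims <- function(df, k = 3, alpha = 0.05) {
  pca <- prcomp(df, scale. = TRUE)          # PCA on standardized variables
  lapply(seq_len(k), function(j) {
    s <- pca$x[, j]                         # scores on component j
    r <- sapply(df, function(v) cor(v, s))
    p <- sapply(df, function(v) cor.test(v, s)$p.value)
    out <- data.frame(correlation = r, p.value = p)
    out <- out[out$p.value < alpha, , drop = FALSE]
    out[order(out$correlation, decreasing = TRUE), , drop = FALSE]
  })
}

# Hypothetical toy data (not reg_coca): a, b and c share one latent source
set.seed(7)
latent <- rnorm(40)
toy <- data.frame(a = latent + rnorm(40, sd = 0.5),
                  b = latent + rnorm(40, sd = 0.5),
                  c = -latent + rnorm(40, sd = 0.5),
                  d = rnorm(40), e = rnorm(40))
dims <- describe_dims(toy, k = 2)
dims[[1]]   # variables most strongly associated with the first component
```

Variables with large positive and large negative correlations define the two poles of a dimension, which is how the involved vs. informational contrast above is read off.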
> plotellipses(reg.pca, label = 'quali')
> plotellipses(reg.pca, label = 'quali', axis = c(2, 3))
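The prototype of a register is the mean position of its subcorpora on the components, and the ellipse shows the uncertainty of that mean. The sketch below illustrates the idea in base R with a normal-theory 95% confidence region for a group mean on two components. This is a simplified approximation that may not match what plotellipses() computes exactly; conf_ellipse() and the simulated data are hypothetical.

```r
# Hypothetical helper: normal-theory confidence ellipse for the mean of a
# group of points in a two-dimensional plane (approximation, see text)
conf_ellipse <- function(xy, level = 0.95, n_points = 100) {
  xy <- as.matrix(xy)
  centre <- colMeans(xy)                  # the group prototype (mean)
  S <- cov(xy) / nrow(xy)                 # covariance matrix of the mean
  r <- sqrt(qchisq(level, df = 2))        # radius in Mahalanobis units
  theta <- seq(0, 2 * pi, length.out = n_points)
  circle <- cbind(cos(theta), sin(theta))
  # Map the unit circle through the Cholesky factor of S so that every
  # point lies at squared Mahalanobis distance r^2 from the centre
  pts <- sweep(r * circle %*% chol(S), 2, centre, "+")
  list(centre = centre, points = pts)
}

# Hypothetical toy data (not reg_coca): three groups, four variables
set.seed(1)
toy <- data.frame(group = rep(c("a", "b", "c"), each = 10),
                  matrix(rnorm(120), ncol = 4))
toy[toy$group == "b", 2:3] <- toy[toy$group == "b", 2:3] + 2

scores <- prcomp(toy[, -1], scale. = TRUE)$x[, 1:2]
ells <- lapply(split(as.data.frame(scores), toy$group), conf_ellipse)

plot(scores, col = as.integer(factor(toy$group)), pch = 19)
for (e in ells) {
  lines(e$points)                         # confidence ellipse
  points(t(e$centre), pch = 4, cex = 2)   # prototype
}
```

Non-overlapping ellipses suggest that the corresponding prototypes differ on the two plotted components.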
The factor loadings are similar to the results of the PCA and to those of the BNC case study.
> reg.fa <- factanal(reg_coca[, -1], factors = 3)
> reg.fa$loadings

Loadings:
Factor1 Factor2 Factor3
Ncomm -0.930 0.165
Nprop -0.907
Vpres 0.460 -0.341 0.817
Vpast 0.329 -0.788
P1 0.981
P2 0.946
Adj -0.902 0.332 0.156
ConjCoord -0.176 0.851
ConjSub 0.411 0.514
Interject 0.871
Num -0.585 -0.379

               Factor1 Factor2 Factor3
SS loadings      5.156   2.209   1.375
Proportion Var   0.469   0.201   0.125
Cumulative Var   0.469   0.670   0.794
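One way to make the comparison between factors and principal components more precise than eyeballing the loadings is Tucker's congruence coefficient, the cosine between two columns of loadings. The sketch below is an illustration on simulated data: the helper congruence(), the data, and the use of variable-component correlations as PCA loadings are assumptions, not part of the original exercise.

```r
# Hypothetical helper: Tucker's congruence coefficient for every pair of
# columns of two loading matrices (variables in rows); values near +1 or
# -1 mean the same loading pattern up to sign
congruence <- function(A, B) {
  crossprod(A, B) / sqrt(outer(colSums(A^2), colSums(B^2)))
}

# Hypothetical simulated data (not reg_coca): two independent latent
# factors, the first measured by four variables, the second by two
set.seed(42)
n <- 300
f1 <- rnorm(n)
f2 <- rnorm(n)
X <- cbind(f1, f1, f1, f1, f2, f2) + matrix(rnorm(n * 6, sd = 0.6), n, 6)
colnames(X) <- paste0("v", 1:6)

fa <- factanal(X, factors = 2)           # Varimax rotation by default
fa_load <- unclass(fa$loadings)

pca <- prcomp(X, scale. = TRUE)
# Correlations of the variables with the first two components
pca_load <- pca$rotation[, 1:2] %*% diag(pca$sdev[1:2])
colnames(pca_load) <- c("PC1", "PC2")

round(congruence(fa_load, pca_load), 2)
```

Because components and factors are only defined up to sign and order, it is the absolute values and the best match in each row that matter.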
The results of the FA with the Promax rotation are very similar to those based on the Varimax rotation (the default rotation in factanal()).
> reg.fa1 <- factanal(reg_coca[, -1], factors = 3, rotation = 'promax')
> reg.fa1$loadings
[output omitted]
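Varimax is an orthogonal rotation, whereas Promax starts from the Varimax solution and applies an oblique transformation that lets the factors correlate. A minimal sketch with the varimax() and promax() functions from R's stats package, applied to an unrotated factanal() solution of hypothetical simulated data with two correlated latent factors, makes this difference visible in the rotation matrices.

```r
# Hypothetical simulated data (not reg_coca) with two correlated factors
set.seed(42)
n <- 300
f1 <- rnorm(n)
f2 <- 0.6 * f1 + rnorm(n, sd = 0.8)     # f2 is correlated with f1
X <- cbind(f1, f1, f1, f2, f2, f2) + matrix(rnorm(n * 6, sd = 0.6), n, 6)
colnames(X) <- paste0("v", 1:6)

# Unrotated ML solution, then the two rotations from the stats package
L <- unclass(factanal(X, factors = 2, rotation = "none")$loadings)
vm <- varimax(L)
pm <- promax(L)

# Varimax is orthogonal: t(R) %*% R is the identity matrix
round(crossprod(vm$rotmat), 3)
# Promax is oblique: its rotation matrix is in general not orthogonal
round(crossprod(pm$rotmat), 3)

# Rotated loadings side by side (Varimax first, then Promax)
round(cbind(unclass(vm$loadings), unclass(pm$loadings)), 3)
```

With real data, the same comparison could be run on unclass(factanal(reg_coca[, -1], factors = 3, rotation = 'none')$loadings). When the oblique loadings barely differ from the orthogonal ones, as reported above for COCA, the choice of rotation has little effect on the interpretation.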