Chapter 18 | Exercise 1
‘Multidimensional analysis of register variation in COCA’
The data set reg_coca represents 42 subcorpora from COCA that belong to five registers. The variables are the same as in the BNC data (see Chapter 18). Find out whether the dimensions are similar to the ones that were found in the BNC.
Perform a PCA and choose the optimal number of dimensions.
Try to give a first linguistic interpretation of the components.
Plot the prototypes of the registers and their confidence ellipses on one or more biplots.
Perform a factor analysis and compare the factors with the principal components. Is there any difference between COCA and the BNC?
Compare two rotations, Varimax and Promax. Do you find any difference?
Perform a PCA, treating the column with register names as a supplementary qualitative variable:
> library(Rling)
> data(reg_coca)
> library(FactoMineR)
> reg.pca <- PCA(reg_coca, quali.sup = 1, graph = FALSE)
> head(reg.pca$eig)
       eigenvalue percentage of variance cumulative percentage of variance
comp 1  5.2948982              48.135438                          48.13544
comp 2  2.5479184              23.162894                          71.29833
comp 3  1.4722366              13.383969                          84.68230
comp 4  0.6412271               5.829337                          90.51164
comp 5  0.5795990               5.269082                          95.78072
comp 6  0.1837325               1.670296                          97.45102
> barplot(reg.pca$eig[, 2], names.arg = 1:nrow(reg.pca$eig), xlab = 'components', ylab = 'Percentage of explained variance')
According to the scree plot, the optimal number of dimensions is three. Together, these three components explain almost 85% of the variance.
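The scree-plot decision can be cross-checked numerically. In a PCA on standardized variables (the default in FactoMineR's PCA()), the eigenvalues sum to the number of active variables, assumed here to be 11 (every column of reg_coca except the register), so each percentage in the table above is simply the eigenvalue divided by 11. The sketch below recomputes the percentages from the printed eigenvalues and applies the Kaiser criterion (retain components with an eigenvalue greater than 1), an additional rule of thumb that is not part of the original solution.

```r
# Eigenvalues of the first six components, copied from head(reg.pca$eig) above
eig <- c(5.2948982, 2.5479184, 1.4722366, 0.6412271, 0.5795990, 0.1837325)

# With standardized variables the eigenvalues sum to the number of active
# variables (assumed to be 11: every column of reg_coca except the register)
n_vars <- 11
pct <- eig / n_vars * 100  # percentage of variance per component
cum_pct <- cumsum(pct)     # cumulative percentage of variance

# Kaiser criterion: keep components whose eigenvalue exceeds 1
n_kaiser <- sum(eig > 1)

print(n_kaiser)                # 3
print(round(cum_pct[1:3], 1))  # 48.1 71.3 84.7
```

Here the Kaiser criterion agrees with the scree plot: three components.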
You can use the dimdesc() function to interpret the dimensions; by default, it describes the first three of them:
> dimdesc(reg.pca)
[output omitted]
The first dimension relates to involved vs. informational communication. Dimension 2 contrasts what might be called verbose discourse with concise presentation of information. Dimension 3 relates to the distinction between past and present verbal forms.
> plotellipses(reg.pca, label = 'quali')
> plotellipses(reg.pca, label = 'quali', axis = c(2, 3))
The factor loadings below are similar to the results of the PCA and to those of the case study of the BNC registers.
> reg.fa <- factanal(reg_coca[, -1], factors = 3)
> reg.fa$loadings
Loadings:
Factor1 Factor2 Factor3
Ncomm -0.930 0.165
Nprop -0.907
Vpres 0.460 -0.341 0.817
Vpast 0.329 -0.788
P1 0.981
P2 0.946
Adj -0.902 0.332 0.156
ConjCoord -0.176 0.851
ConjSub 0.411 0.514
Interject 0.871
Num -0.585 -0.379
               Factor1 Factor2 Factor3
SS loadings      5.156   2.209   1.375
Proportion Var   0.469   0.201   0.125
Cumulative Var   0.469   0.670   0.794
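The summary rows of the factanal() output can be checked by hand. The sketch below uses only numbers copied from the output above and assumes 11 active variables: Proportion Var is each factor's SS loadings divided by 11, and with an orthogonal rotation such as Varimax the communality of a variable is the sum of its squared loadings. Vpres and Adj are used because they are the only rows where all three loadings are printed; the communalities are approximate because the printed loadings are rounded.

```r
# SS loadings of the three Varimax factors, copied from reg.fa$loadings
ss <- c(Factor1 = 5.156, Factor2 = 2.209, Factor3 = 1.375)

# Proportion Var = SS loadings / number of variables (assumed 11)
prop <- ss / 11
print(round(prop, 3))  # Factor1 0.469, Factor2 0.201, Factor3 0.125

# Communality = sum of squared loadings across the (orthogonal) factors;
# loadings copied from the Vpres and Adj rows of the output above
vpres <- c(0.460, -0.341, 0.817)
adj <- c(-0.902, 0.332, 0.156)
print(round(sum(vpres^2), 3))  # 0.995: Vpres is almost fully explained
print(round(sum(adj^2), 3))    # 0.948
```

The three factors account for about 79% of the variance, somewhat less than the almost 85% of the first three principal components. This is expected, because factor analysis models only the variance the variables share, whereas PCA redistributes the total variance.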
The results of the FA with the Promax rotation are very similar to those based on the Varimax rotation.
> reg.fa1 <- factanal(reg_coca[, -1], factors = 3, rotation = 'promax')
> reg.fa1$loadings
[output omitted]
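The main conceptual difference between the two rotations is that Varimax is orthogonal (the rotated factors stay uncorrelated), while Promax is oblique (it allows the factors to correlate). The sketch below illustrates this on a small hypothetical loadings matrix, not on reg_coca, using base R's varimax() and promax(). The helper factor_cor() is a name introduced here for illustration; it computes the factor correlations implied by a rotation matrix T as T⁻¹(T⁻¹)′. In recent versions of R, printing a factanal object fitted with rotation = 'promax' (such as reg.fa1) should report this quantity as "Factor Correlations".

```r
# Hypothetical 6 x 2 loadings matrix (NOT taken from reg_coca),
# used only to contrast an orthogonal and an oblique rotation
L <- matrix(c(0.9, 0.8, 0.7, 0.2, 0.1, 0.3,
              0.1, 0.3, 0.2, 0.8, 0.9, 0.7), ncol = 2)

v <- varimax(L)  # orthogonal rotation
p <- promax(L)   # oblique rotation (starts from the Varimax solution)

# Hypothetical helper: factor correlation matrix implied by rotation matrix T
factor_cor <- function(rotmat) {
  tm <- solve(rotmat)
  tm %*% t(tm)
}

print(round(factor_cor(v$rotmat), 3))  # identity: Varimax factors are uncorrelated
print(round(factor_cor(p$rotmat), 3))  # off-diagonal != 0: Promax factors correlate
```

If the factor correlations in reg.fa1 turn out to be small, the Promax solution is close to an orthogonal one, which would be consistent with the similarity of the two sets of loadings.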