Choose the number of principal components via reconstruction error {choosepc} | R Documentation |
Choose the number of principal components via reconstruction error
Description
Choose the number of principal components via reconstruction error.
Usage
pc.choose(x, graph = TRUE)
Arguments
x |
A numerical matrix with more rows than columns. |
graph |
Should the plot of the PRESS values appear? Default value is TRUE. |
Details
SVD stands for Singular Value Decomposition of a rectangular matrix. It applies to any matrix, not only a square one, in contrast to the spectral decomposition into eigenvalues and eigenvectors that underlies principal component analysis (PCA). Suppose we have an $n \times p$ matrix $X$. Then using SVD we can write the matrix as
$$X = U D V^T,$$
where $U$ is an orthonormal matrix containing the eigenvectors of $X X^T$, $V$ is an orthonormal matrix containing the eigenvectors of $X^T X$ and $D$ is a $p \times p$ diagonal matrix containing the $r$ non-zero singular values $d_1, \ldots, d_r$ (square roots of the eigenvalues) of $X X^T$ (or $X^T X$); the remaining $p - r$ elements of the diagonal are zero. Recall that the maximum rank of an $n \times p$ matrix is equal to $\min\{n, p\}$. Using the SVD equation above, each column of $X$ can be written as
$$x_j = \sum_{k=1}^{r} u_k d_k v_{jk},$$
where $u_k$ is the $k$-th column of $U$ and $v_{jk}$ is the $(j, k)$ element of $V$. This means that we can reconstruct the matrix $X$ using fewer columns (if $n > p$) than it has:
$$\tilde{x}_j^m = \sum_{k=1}^{m} u_k d_k v_{jk},$$
where $m < r$.
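The decomposition and its truncated version can be checked numerically. The following is a minimal sketch in base R, assuming the built-in `iris` data and the base function `svd()`; it is not taken from the implementation of `pc.choose`, and the choices `j <- 2` and `m <- 2` are arbitrary.

```r
# Sketch only: base R svd() on the numeric columns of iris
x <- as.matrix(iris[, 1:4])          # n = 150 rows, p = 4 columns
s <- svd(x)                          # s$u is n x p, s$d has length p, s$v is p x p

# Full reconstruction X = U D V^T
x_full <- s$u %*% diag(s$d) %*% t(s$v)
max_abs_err <- max(abs(x - x_full))  # zero up to floating point error

# Column j of X as the sum over k of u_k * d_k * v_jk
j <- 2
x_j <- s$u %*% (s$d * s$v[j, ])

# Rank-m approximation using only the first m singular values and vectors
m <- 2
x_m <- s$u[, 1:m, drop = FALSE] %*% diag(s$d[1:m], m) %*% t(s$v[, 1:m, drop = FALSE])
err_m <- sum((x - x_m)^2)            # squared discrepancy of the approximation
```

In this sketch the squared discrepancy `err_m` coincides with the sum of the squared discarded singular values, `sum(s$d[(m + 1):length(s$d)]^2)`.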
The reconstructed matrix will of course have some discrepancy, and it is the level of discrepancy we are interested in. If we center the matrix $X$, that is, subtract the column means from every column, and perform the SVD again, we will see that the orthonormal matrix $V$ of right singular vectors contains the eigenvectors of the covariance matrix of the original, un-centred matrix $X$.
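The link between the SVD of the centred matrix and the covariance matrix can be verified directly. This is a small sketch in base R, assuming the `iris` data; the variable names are illustrative only. Eigenvectors are determined only up to sign, so absolute values are compared.

```r
x <- as.matrix(iris[, 1:4])
n <- nrow(x)

y <- scale(x, center = TRUE, scale = FALSE)   # subtract the column means
s <- svd(y)                                   # SVD of the centred matrix
e <- eigen(cov(x), symmetric = TRUE)          # spectral decomposition of cov(x)

# Eigenvalues of the covariance matrix are the squared singular values / (n - 1)
eig_from_svd <- s$d^2 / (n - 1)

# Columns of V agree with the eigenvectors up to a sign flip per column
vec_diff <- max(abs(abs(s$v) - abs(e$vectors)))
```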
Coming back to an $n \times p$ matrix $X$ of $n$ observations and $p$ variables, the question was how many principal components to retain. We answer this by using SVD to reconstruct the matrix. The steps of the algorithm are described below.
1. Center the matrix by subtracting from each variable its mean; denote the centred matrix by $Y$.
2. Perform SVD on the centred matrix $Y$.
3. Choose a number $m$ from $1$ to $r$ (the rank of the matrix) and reconstruct the matrix using the first $m$ components. Let us denote by $\tilde{Y}^m$ the reconstructed matrix.
4. Calculate the sum of squared differences between the reconstructed and the original values
$$PRESS(m) = \sum_{i=1}^{n} \sum_{j=1}^{p} \left( \tilde{y}_{ij}^m - y_{ij} \right)^2, \quad m = 1, \ldots, r.$$
5. Plot $PRESS(m)$ for all the values of $m$ and choose graphically the number of principal components.
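The steps above can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the code of `pc.choose`: the rank tolerance `1e-10` is an arbitrary choice, and the plot is only drawn in an interactive session.

```r
x <- as.matrix(iris[, 1:4])

y <- sweep(x, 2, colMeans(x))   # step 1: subtract each variable's mean
s <- svd(y)                     # step 2: SVD of the centred matrix
r <- sum(s$d > 1e-10)           # numerical rank (tolerance is an assumption)

press <- numeric(r)
for (m in seq_len(r)) {
  idx <- seq_len(m)
  # step 3: reconstruction from the first m singular values and vectors
  y_m <- s$u[, idx, drop = FALSE] %*% diag(s$d[idx], m) %*% t(s$v[, idx, drop = FALSE])
  # step 4: sum of squared differences from the centred matrix
  press[m] <- sum((y_m - y)^2)
}

# step 5: inspect the curve and look for the point where it levels off
if (interactive()) {
  plot(seq_len(r), press, type = "b", xlab = "m", ylab = "PRESS")
}
```

The error cannot increase as `m` grows, and it is numerically zero at `m = r`, so the choice usually rests on where the curve flattens.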
The graphical way of choosing the number of principal components is not the best and there are alternative ways of making a decision (see for example Jolliffe (2002)).
Value
A list including:
values |
The eigenvalues of the covariance matrix. |
cumprop |
The cumulative proportion of the eigenvalues of the covariance matrix. |
per |
The differences in the cumulative proportion of the eigenvalues of the covariance matrix. |
press |
The reconstruction error |
runtime |
The runtime of the algorithm. |
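The first three components can be related to one another with base R. The sketch below is an assumption about how such quantities are typically computed, not the function's actual code; in particular, whether `per` is returned with a leading element is not stated here, so plain `diff()` is used.

```r
x <- as.matrix(iris[, 1:4])

# Eigenvalues of the covariance matrix
values <- eigen(cov(x), symmetric = TRUE, only.values = TRUE)$values

# Cumulative proportion of the eigenvalues
cumprop <- cumsum(values) / sum(values)

# Successive differences of the cumulative proportion (assumed definition)
per <- diff(cumprop)
```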
Author(s)
Michail Tsagris.
R implementation and documentation: Michail Tsagris mtsagris@uoc.gr.
References
Jolliffe I.T. (2002). Principal Component Analysis.
See Also
Examples
x <- as.matrix(iris[, 1:4])
a <- pc.choose(x, graph = FALSE)