prcomp {stats} | R Documentation |
Principal Components Analysis
Description
Performs a principal components analysis on the given data matrix
and returns the results as an object of class prcomp
.
Usage
prcomp(x, ...)
## S3 method for class 'formula'
prcomp(formula, data = NULL, subset, na.action, ...)
## Default S3 method:
prcomp(x, retx = TRUE, center = TRUE, scale. = FALSE,
tol = NULL, rank. = NULL, ...)
## S3 method for class 'prcomp'
predict(object, newdata, ...)
Arguments
formula |
a formula with no response variable, referring only to numeric variables. |
data |
an optional data frame (or similar: see
|
subset |
an optional vector used to select rows (observations) of the
data matrix |
na.action |
a function which indicates what should happen
when the data contain |
... |
arguments passed to or from other methods. If |
x |
a numeric or complex matrix (or data frame) which provides the data for the principal components analysis. |
retx |
a logical value indicating whether the rotated variables should be returned. |
center |
a logical value indicating whether the variables
should be shifted to be zero centered. Alternately, a vector of
length equal the number of columns of |
scale. |
a logical value indicating whether the variables should
be scaled to have unit variance before the analysis takes
place. The default is |
tol |
a value indicating the magnitude below which components
should be omitted. (Components are omitted if their
standard deviations are less than or equal to |
rank. |
optionally, a number specifying the maximal rank, i.e.,
maximal number of principal components to be used. Can be set as
alternative or in addition to |
object |
object of class inheriting from |
newdata |
An optional data frame or matrix in which to look for
variables with which to predict. If omitted, the scores are used.
If the original fit used a formula or a data frame or a matrix with
column names, |
Details
The calculation is done by a singular value decomposition of the
(centered and possibly scaled) data matrix, not by using
eigen
on the covariance matrix. This
is generally the preferred method for numerical accuracy. The
print
method for these objects prints the results in a nice
format and the plot
method produces a scree plot.
Unlike princomp
, variances are computed with the usual
divisor N - 1
.
Note that scale = TRUE
cannot be used if there are zero or
constant (for center = TRUE
) variables.
Value
prcomp
returns a list with class "prcomp"
containing the following components:
sdev |
the standard deviations of the principal components (i.e., the square roots of the eigenvalues of the covariance/correlation matrix, though the calculation is actually done with the singular values of the data matrix). |
rotation |
the matrix of variable loadings (i.e., a matrix
whose columns contain the eigenvectors). The function
|
x |
if |
center , scale |
the centering and scaling used, or |
Note
The signs of the columns of the rotation matrix are arbitrary, and so may differ between different programs for PCA, and even between different builds of R.
References
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
Mardia, K. V., J. T. Kent, and J. M. Bibby (1979) Multivariate Analysis, London: Academic Press.
Venables, W. N. and B. D. Ripley (2002) Modern Applied Statistics with S, Springer-Verlag.
See Also
biplot.prcomp
, screeplot
,
princomp
, cor
, cov
,
svd
, eigen
.
Examples
C <- chol(S <- toeplitz(.9 ^ (0:31))) # Cov.matrix and its root
all.equal(S, crossprod(C))
set.seed(17)
X <- matrix(rnorm(32000), 1000, 32)
Z <- X %*% C ## ==> cov(Z) ~= C'C = S
all.equal(cov(Z), S, tolerance = 0.08)
pZ <- prcomp(Z, tol = 0.1)
summary(pZ) # only ~14 PCs (out of 32)
## or choose only 3 PCs more directly:
pz3 <- prcomp(Z, rank. = 3)
summary(pz3) # same numbers as the first 3 above
stopifnot(ncol(pZ$rotation) == 14, ncol(pz3$rotation) == 3,
all.equal(pz3$sdev, pZ$sdev, tolerance = 1e-15)) # exactly equal typically
## signs are random
require(graphics)
## the variances of the variables in the
## USArrests data vary by orders of magnitude, so scaling is appropriate
prcomp(USArrests) # inappropriate
prcomp(USArrests, scale. = TRUE)
prcomp(~ Murder + Assault + Rape, data = USArrests, scale. = TRUE)
plot(prcomp(USArrests))
summary(prcomp(USArrests, scale. = TRUE))
biplot(prcomp(USArrests, scale. = TRUE))