PcaNA {rrcovNA} | R Documentation |
Classical or robust Principal Components for incomplete data
Description
Computes classical and robust principal components for incomplete data using an EM algorithm as described by Serneels and Verdonck (2008).
Usage
PcaNA(x, ...)
## Default S3 method:
PcaNA(x, k = ncol(x), kmax = ncol(x), conv=1e-10, maxiter=100,
method=c("cov", "locantore", "hubert", "grid", "proj", "class"), cov.control=NULL,
scale = FALSE, signflip = TRUE, crit.pca.distances = 0.975, trace=FALSE, ...)
## S3 method for class 'formula'
PcaNA(formula, data = NULL, subset, na.action, ...)
Arguments
formula |
a formula with no response variable, referring only to numeric variables. |
data |
an optional data frame (or similar: see
|
subset |
an optional vector used to select rows (observations) of the
data matrix |
na.action |
a function which indicates what should happen
when the data contain NAs. |
... |
arguments passed to or from other methods. |
x |
a numeric matrix (or data frame) which provides the data for the principal components analysis. |
k |
number of principal components to compute. If |
kmax |
maximal number of principal components to compute.
Default is kmax=ncol(x). |
conv |
convergence criterion for the EM algorithm.
Default is conv=1e-10. |
maxiter |
maximal number of iterations for the EM algorithm.
Default is maxiter=100. |
method |
which PCA method to use (classical or robust): "class" selects classical PCA,
and each of "locantore", "hubert", "grid", "proj" and "cov" selects a
robust PCA method. If the method is "cov", i.e. PCA based on a robust covariance matrix,
the argument cov.control supplies the control object for the covariance estimator. |
cov.control |
control object in case of robust PCA based on a robust covariance matrix. |
scale |
a logical value indicating whether the variables should be
scaled to have unit variance (only possible if there are no constant
variables). As a scale function |
signflip |
a logical value indicating whether to try to resolve the sign indeterminacy of the loadings,
using an ad hoc approach that sets the maximum element in a singular vector to be positive. Default is signflip=TRUE. |
crit.pca.distances |
criterion to use for computing the cutoff values for the orthogonal and score distances. Default is 0.975. |
trace |
whether to print intermediate results. Default is trace=FALSE. |
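The signflip and crit.pca.distances arguments can be illustrated with a small base R sketch. The helper names flip_signs and sd_cutoff are hypothetical and not part of rrcovNA. The sign rule is read here as "make the element with the largest absolute value positive", and the score distance cutoff assumes the common convention of taking the square root of a chi-squared quantile with k degrees of freedom; the package's internal code may differ on both points.

```r
# Hypothetical helper (not part of rrcovNA): flip each loading vector so that
# its element with the largest absolute value becomes positive.
flip_signs <- function(loadings) {
  apply(loadings, 2, function(v) v * sign(v[which.max(abs(v))]))
}

# Hypothetical helper (not part of rrcovNA): cutoff for score distances,
# ASSUMING the convention sqrt(chi-squared quantile with k degrees of freedom).
sd_cutoff <- function(k, crit = 0.975) {
  sqrt(qchisq(crit, df = k))
}

# Two orthonormal loading vectors; the first has a negative dominant element.
L <- cbind(c(0.6, -0.8), c(0.8, 0.6))
L_flipped <- flip_signs(L)

# Cutoff for k = 2 retained components at the default level 0.975.
cut2 <- sd_cutoff(2)
```

Flipping a whole column leaves the fitted subspace unchanged, so this only makes the reported loadings reproducible across runs and methods.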
Details
PcaNA, serving as a constructor for objects of class PcaNA-class, is a generic function with "formula" and "default" methods. For details see the relevant references.
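The EM idea mentioned in the Description can be sketched in base R for the classical case only. This is a simplified illustration, not the package's implementation: the function em_pca_impute is hypothetical, it uses a plain SVD rather than a robust estimator, and its stopping rule (sum of squared changes in the imputed cells falling below conv) is an assumption. The algorithm of Serneels and Verdonck (2008) may differ in these details.

```r
# Hypothetical helper (not part of rrcovNA): classical EM-style PCA imputation.
# Alternates between fitting a rank-k PCA model to the completed data and
# replacing the missing cells by their fitted values.
em_pca_impute <- function(x, k = 2, conv = 1e-10, maxiter = 100) {
  x <- as.matrix(x)
  miss <- is.na(x)

  # Initialise the missing cells with the column means of the observed values.
  col_means <- colMeans(x, na.rm = TRUE)
  x_fill <- x
  x_fill[miss] <- col_means[col(x)[miss]]

  for (iter in seq_len(maxiter)) {
    # Fit a rank-k PCA model to the current completed data.
    center <- colMeans(x_fill)
    xc <- sweep(x_fill, 2, center)
    sv <- svd(xc, nu = k, nv = k)
    fitted <- sv$u %*% diag(sv$d[seq_len(k)], k, k) %*% t(sv$v)
    fitted <- sweep(fitted, 2, center, "+")

    # Update only the missing cells; observed cells are never changed.
    delta <- sum((x_fill[miss] - fitted[miss])^2)
    x_fill[miss] <- fitted[miss]

    # Assumed stopping rule: imputed values have stopped changing.
    if (delta < conv) break
  }

  list(x = x_fill, loadings = sv$v, center = center, iterations = iter)
}

# Simulated data of exact rank 2 with three cells set to NA.
set.seed(1)
scores <- matrix(rnorm(100), 50, 2)
loadings_true <- matrix(c(1, 0.5, -0.5, 2,
                          0.3, 1, 1, -1), 2, 4, byrow = TRUE)
x_full <- scores %*% loadings_true
x_na <- x_full
x_na[cbind(c(3, 10, 25), c(1, 2, 4))] <- NA

fit <- em_pca_impute(x_na, k = 2)
```

In this sketch conv and maxiter play the same role as the arguments of the same name in PcaNA: the loop stops either when the imputed values stabilise or when the iteration limit is reached.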
Value
An S4 object of class PcaNA-class, which is a subclass of the virtual class Pca-class.
Author(s)
Valentin Todorov valentin.todorov@chello.at
References
Serneels S & Verdonck T (2008), Principal component analysis for data containing outliers and missing elements. Computational Statistics and Data Analysis, 52(3), 1712–1727.
Todorov V & Filzmoser P (2009), An Object Oriented Framework for Robust Multivariate Analysis. Journal of Statistical Software, 32(3), 1–47. <doi:10.18637/jss.v032.i03>.
Examples
## 1. With complete data
## PCA of the bushfire data
data(bushfire)
pca <- PcaNA(bushfire)
pca
## Compare with the classical PCA
prcomp(bushfire)
## or
PcaNA(bushfire, method="class")
## If you want to print the scores too, use
print(pca, print.x=TRUE)
## Using the formula interface
PcaNA(~., data=bushfire)
## To plot the results:
plot(pca) # distance plot
pca2 <- PcaNA(bushfire, k=2)
plot(pca2) # PCA diagnostic plot (or outlier map)
## Use the standard plots available for prcomp and princomp
screeplot(pca)
biplot(pca)
################################################################
## 2. Now the same with incomplete data - bush10
data(bush10)
pca <- PcaNA(bush10)
pca
## Compare with the classical PCA
PcaNA(bush10, method="class")
## If you want to print the scores too, use
print(pca, print.x=TRUE)
## Using the formula interface
PcaNA(~., data=as.data.frame(bush10))
## To plot the results:
plot(pca) # distance plot
pca2 <- PcaNA(bush10, k=2)
plot(pca2) # PCA diagnostic plot (or outlier map)
## Use the standard plots available for prcomp and princomp
screeplot(pca)
biplot(pca)