robpca {rospca} | R Documentation |
ROBust PCA algorithm
Description
ROBPCA algorithm of Hubert et al. (2005) including reweighting (Engelen et al., 2005) and possible extension to skewed data (Hubert et al., 2009).
Usage
robpca (x, k = 0, kmax = 10, alpha = 0.75, h = NULL, mcd = FALSE,
ndir = "all", skew = FALSE, ...)
Arguments
x |
An |
k |
Number of principal components that will be used. When |
kmax |
Maximal number of principal components that will be computed, default is 10. |
alpha |
Robustness parameter, default is 0.75. |
h |
The number of outliers the algorithm should resist is given by |
mcd |
Logical indicating if the MCD adaptation of ROBPCA may be applied when the number of variables is sufficiently small (see Details). If |
ndir |
Number of directions used when computing the outlyingness (or the adjusted outlyingness when |
skew |
Logical indicating if the version for skewed data (Hubert et al., 2009) is applied, default is |
... |
Other arguments to pass to methods. |
Details
This function is based extensively on PcaHubert from rrcov, with two main differences:
The outlyingness measure used for non-skewed data (skew=FALSE) is the Stahel-Donoho measure described in Hubert et al. (2005), which is also used in PcaHubert. However, the implementation in mrfDepth (which is used here) is much faster than the one in PcaHubert, so more, or even all, directions can be considered when computing the outlyingness measure.
The extension for skewed data of Hubert et al. (2009) (skew=TRUE) is also implemented here; it is not included in PcaHubert.
For an extensive description of the ROBPCA algorithm we refer to Hubert et al. (2005) and to PcaHubert.
When mcd=TRUE and n < 5 × p, we do not apply the full ROBPCA algorithm. The loadings and eigenvalues are then computed as the eigenvectors and eigenvalues of the MCD estimator applied to the data set after the SVD step.
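As a rough illustration of the options discussed above, the following sketch fits the default, skew-adjusted, and MCD-enabled variants on simulated data. It assumes the rospca package is installed and reuses the dataGen call from the Examples; the seed and the value ndir = 1000 are arbitrary choices made for this illustration only.

```r
library(rospca)

set.seed(1)  # arbitrary seed, only to make the simulated data reproducible
X <- dataGen(m = 1, n = 100, p = 10, eps = 0.2, bLength = 4)$data[[1]]

# Default settings: Stahel-Donoho outlyingness, all directions considered
res_sd <- robpca(X, k = 2, ndir = "all", skew = FALSE)

# Version for skewed data (Hubert et al., 2009) with a fixed number of
# directions; 1000 is an illustrative value, not a recommendation
res_skew <- robpca(X, k = 2, ndir = 1000, skew = TRUE)

# Permit the MCD adaptation; whether it is actually used depends on the
# dimensions of the data, as described above
res_mcd <- robpca(X, k = 2, mcd = TRUE)

# Compare the robust eigenvalues of the three fits
rbind(default = res_sd$eigenvalues,
      skew    = res_skew$eigenvalues,
      mcd     = res_mcd$eigenvalues)
```

Reducing ndir trades accuracy of the outlyingness measure for speed, which may matter for larger data sets.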
Value
A list with components:
loadings |
Loadings matrix containing the robust loadings (eigenvectors), a numeric matrix of size |
eigenvalues |
Numeric vector of length |
scores |
Scores matrix (computed as |
center |
Numeric vector of length |
k |
Number of (chosen) principal components. |
H0 |
Logical vector of size |
H1 |
Logical vector of size |
alpha |
The robustness parameter |
h |
The |
sd |
Numeric vector of size |
od |
Numeric vector of size |
cutoff.sd |
Cut-off value for the robust score distances. |
cutoff.od |
Cut-off value for the orthogonal distances. |
flag.sd |
Numeric vector of size |
flag.od |
Numeric vector of size |
flag.all |
Numeric vector of size |
Author(s)
Tom Reynkens, based on R code from Valentin Todorov for PcaHubert in rrcov (released under GPL-3) and Matlab code from Katrien Van Driessen (for the univariate MCD).
References
Hubert, M., Rousseeuw, P. J., and Vanden Branden, K. (2005), “ROBPCA: A New Approach to Robust Principal Component Analysis,” Technometrics, 47, 64–79.
Engelen, S., Hubert, M., and Vanden Branden, K. (2005), “A Comparison of Three Procedures for Robust PCA in High Dimensions,” Austrian Journal of Statistics, 34, 117–126.
Hubert, M., Rousseeuw, P. J., and Verdonck, T. (2009), “Robust PCA for Skewed Data and Its Outlier Map,” Computational Statistics & Data Analysis, 53, 2264–2274.
See Also
PcaHubert, outlyingness, adjOutl
Examples
X <- dataGen(m=1, n=100, p=10, eps=0.2, bLength=4)$data[[1]]
resR <- robpca(X, k=2)
diagPlot(resR)
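The components listed under Value can also be inspected directly. The sketch below is a minimal illustration, assuming the rospca package is installed; the seed and the helper names sd_out and od_out are introduced here for illustration. It compares the distances with their cut-off values by hand rather than relying on the flag components.

```r
library(rospca)

set.seed(1)  # arbitrary seed, only to make the simulated data reproducible
X <- dataGen(m = 1, n = 100, p = 10, eps = 0.2, bLength = 4)$data[[1]]
resR <- robpca(X, k = 2)

# Robust loadings and eigenvalues of the retained components
resR$loadings
resR$eigenvalues

# Indices of observations whose robust score distance or orthogonal
# distance exceeds the corresponding cut-off value
sd_out <- which(resR$sd > resR$cutoff.sd)
od_out <- which(resR$od > resR$cutoff.od)

# Observations exceeding at least one of the two cut-offs
union(sd_out, od_out)
```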