SPcaGrid {rrcovHD}R Documentation

Sparse Robust Principal Components based on Projection Pursuit (PP): GRID search Algorithm

Description

Computes an approximation of the PP-estimators for sparse and robust PCA using the grid search algorithm in the plane.

Usage

    SPcaGrid(x, ...)
    ## Default S3 method:
SPcaGrid(x, k = 0, kmax = ncol(x), method = c ("mad", "sd", "qn", "Qn"), 
    lambda = 1, scale=FALSE, na.action = na.fail, trace=FALSE, ...)
    ## S3 method for class 'formula'
SPcaGrid(formula, data = NULL, subset, na.action, ...)

Arguments

formula

a formula with no response variable, referring only to numeric variables.

data

an optional data frame (or similar: see model.frame) containing the variables in the formula formula.

subset

an optional vector used to select rows (observations) of the data matrix x.

na.action

a function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of options, and is na.fail if that is unset. The default is na.omit.

...

arguments passed to or from other methods.

x

a numeric matrix (or data frame) which provides the data for the principal components analysis.

k

number of principal components to compute. If k is missing, or k = 0, the algorithm itself will determine the number of components by finding such k that l_k/l_1 >= 10.E-3 and \Sigma_{j=1}^k l_j/\Sigma_{j=1}^r l_j >= 0.8. It is preferable to investigate the scree plot in order to choose the number of components and then run again. Default is k=0.

kmax

maximal number of principal components to compute. Default is kmax=10. If k is provided, kmax does not need to be specified, unless k is larger than 10.

method

the scale estimator used to detect the direction with the largest variance. Possible values are "sd", "mad" and "Qn". "mad" is the default value.

lambda

the sparseness constraint's strength(sPCAgrid only). A single value for all components, or a vector of length k with different values for each component can be specified. See opt.TPO for the choice of this argument.

scale

a value indicating whether and how the variables should be scaled. If scale = FALSE (default) or scale = NULL no scaling is performed (a vector of 1s is returned in the scale slot). If scale = TRUE the data are scaled to have unit variance. Alternatively it can be a function like sd or mad or a vector of length equal the number of columns of x. The value is passed to the underlying function and the result returned is stored in the scale slot. Default is scale = FALSE

trace

whether to print intermediate results. Default is trace = FALSE

Details

SPcaGrid, serving as a constructor for objects of class SPcaGrid-class is a generic function with "formula" and "default" methods. For details see sPCAgrid and the relevant references.

Value

An S4 object of class SPcaGrid-class which is a subclass of PcaGrid-class which in turn is a subclass of the virtual class PcaRobust-class.

Author(s)

Valentin Todorov valentin.todorov@chello.at

References

C. Croux, P. Filzmoser, M. Oliveira, (2007). Algorithms for Projection-Pursuit Robust Principal Component Analysis, Chemometrics and Intelligent Laboratory Systems, Vol. 87, pp. 218-225.

C. Croux, P. Filzmoser, H. Fritz (2013). Robust Sparse Principal Component Analysis, Technometrics 55(2), pp. 202–2014, doi:10.1080/00401706.2012.727746.

V. Todorov, P. Filzmoser (2013). Comparing classical and robust sparse PCA. In R Kruse, M Berthold, C Moewes, M Gil, P Grzegorzewski, O Hryniewicz (eds.), Synergies of Soft Computing and Statistics for Intelligent Data Analysis, volume 190 of Advances in Intelligent Systems and Computing, pp. 283–291. Springer, Berlin; New York. ISBN 978-3-642-33041-4, doi:10.1007/978-3-642-33042-1_31.

Examples


data(bus)
bus <- as.matrix(bus)

## calculate MADN for each variable
xmad <- apply(bus, 2, mad)
cat("\nMin, Max of MADN: ", min(xmad), max(xmad), "\n")

## calculate MADN for each variable
xqn <- apply(bus, 2, Qn)
cat("\nMin, Max of Qn: ", min(xqn), max(xqn), "\n")


## MADN vary between 0 (for variable 9) and 34. Therefore exclude
##  variable 9 and divide the remaining variables by their MADNs.
bus1 <- bus[, -c(9)]
p <- ncol(bus1)

madbus <- apply(bus1, 2, mad)
bus2 <- sweep(bus1, 2, madbus, "/", check.margin = FALSE)

xsd <- apply(bus1, 2, sd)
bus.sd <- sweep(bus1, 2, xsd, "/", check.margin = FALSE)

xqn <- apply(bus1, 2, Qn)
bus.qn <- sweep(bus1, 2, xqn, "/", check.margin = FALSE)

## Not run: 
spc <- SPcaGrid(bus2, lambda=0, method="sd", k=p, kmax=p)
rspc <- SPcaGrid(bus2, lambda=0, method="Qn", k=p, kmax=p)
summary(spc)
summary(rspc)
screeplot(spc, type="line", main="Classical PCA", sub="PC", cex.main=2)
screeplot(rspc, type="line", main="Robust PCA", sub="PC", cex.main=2)

##  find lambda

K <- 4
lambda.sd <- 1.64
    to.sd <- .tradeoff(bus2, k=K, lambda.max=2.5, lambda.n=100, method="sd")
    plot(to.sd, type="b", xlab="lambda", ylab="Explained Variance (percent)")
    abline(v=lambda.sd, lty="dotted")
 
spc.sd.p <- SPcaGrid(bus2, lambda=lambda.sd, method="sd", k=p)
.CPEV(spc.sd.p, k=K)
spc.sd <- SPcaGrid(bus2, lambda=lambda.sd, method="sd", k=K)
getLoadings(spc.sd)[,1:K]
plot(spc.sd)

lambda.qn <- 2.06
    to.qn <- .tradeoff(bus2, k=K, lambda.max=2.5, lambda.n=100, method="Qn")
    plot(to.qn, type="b", xlab="lambda", ylab="Explained Variance (percent)")
    abline(v=lambda.qn, lty="dotted")

spc.qn.p <- SPcaGrid(bus2, lambda=lambda.qn, method="Qn", k=p)
.CPEV(spc.qn.p, k=K)
spc.qn <- SPcaGrid(bus2, lambda=lambda.qn, method="Qn", k=K)
getLoadings(spc.qn)[,1:K]
plot(spc.qn)

## End(Not run)

## DD-plots
##
## Not run:
## Not run: 
usr <- par(mfrow=c(2,2))
plot(SPcaGrid(bus2, lambda=0, method="sd", k=4), id.n.sd=0, main="Standard PCA")
plot(SPcaGrid(bus2, lambda=0, method="Qn", k=4), id.n.sd=0, ylim=c(0,20))

plot(SPcaGrid(bus2, lambda=1.64, method="sd", k=4), id.n.sd=0, main="Stdandard sparse PCA")
plot(SPcaGrid(bus2, lambda=3.07, method="Qn", k=4), id.n.sd=0, main="Robust sparse PCA")

par(usr)
## End (Not run)

## End(Not run)

[Package rrcovHD version 0.3-0 Index]