CauRuimet {PTAk} | R Documentation |
Robust estimation of within group variance-covariance
Description
Gives a robust estimate of an unknown within group covariance, aimed at revealing either dense groups or sparse groups (outliers), depending on the choice of local variance and weighting function.
Usage
CauRuimet(Z,ker=1,m0=1,withingroup=TRUE,
loc=substitute(apply(Z,2,mean,trim=.1)),matrixmethod=TRUE, Nrandom=3000)
Arguments
Z |
matrix |
ker |
either numerical or a function: if numerical, the weighting function is the decreasing function e^{(-ker \;t)} |
m0 |
a neighbourhood graph or another proximity matrix; it is combined with the kernel proximities by Hadamard (elementwise) product |
withingroup |
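logical; if TRUE the local variance W_l is returned (aiming at dense groups), if FALSE the total or global weighted variance W_o or W_g is returned (see Details) |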
loc |
a vector of locations or a function using mean, median, to give an estimate of it |
matrixmethod |
if |
Nrandom |
if |
Details
When withingroup is TRUE, the local variance (local as defined by the weighting) is returned, aiming at finding dense groups:
W_l=\frac{\sum_{i=1}^{n-1}\sum_{j=i+1}^n m0_{ij}ker(d^2_{S^-}(Z_i,Z_j))(Z_i-Z_j)'(Z_i-Z_j)}{\sum_{i=1}^{n-1}\sum_{j=i+1}^n m0_{ij}ker(d^2_{S^-}(Z_i,Z_j))}
where d^2_{S^-}( . , .) is the squared Euclidean distance in the metric S^-, the inverse of a robust sample covariance (i.e. using loc instead of the mean);
if withingroup is FALSE, the robust total weighted variance W_o is returned or, if m0 is not 1, the global weighted variance W_g:
W_o=\frac{\sum_{i=1}^n ker(d^2_{S^-}(Z_i,\tilde{Z}))(Z_i-\tilde{Z})'(Z_i-\tilde{Z})}{\sum_{i=1}^n ker(d^2_{S^-}(Z_i,\tilde{Z}))}
W_g=\frac{\sum_{i=1}^{n-1}\sum_{j=i+1}^n m0_{ij}ker(d^2_{S^-}(Z_i,Z_j))(Z_i-\tilde{Z})'(Z_j-\tilde{Z})}{\sum_{i=1}^{n-1}\sum_{j=i+1}^n m0_{ij}ker(d^2_{S^-}(Z_i,Z_j))}
where \tilde{Z} is the vector loc.
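To make the W_l and W_o formulas concrete, here is a minimal base R sketch that transcribes them directly. It is not the code used inside CauRuimet: the helper name local_and_total_variance is hypothetical, and it assumes m0 = 1, the exponential kernel exp(-ker * t), and a robust covariance S computed around the 10% trimmed mean.

```r
# Sketch only: direct transcription of the W_l and W_o formulas.
# Assumptions (not taken from the PTAk source): m0 = 1, kernel exp(-ker * t),
# and S computed as the covariance of Z around the trimmed mean.
local_and_total_variance <- function(Z, ker = 1, trim = 0.1) {
  Z <- as.matrix(Z)
  n <- nrow(Z)
  loc <- apply(Z, 2, mean, trim = trim)   # robust location (Z tilde)
  Zc <- sweep(Z, 2, loc)                  # rows Z_i - Z tilde
  S_inv <- solve(crossprod(Zc) / n)       # S^-: inverse of the robust covariance

  # W_o: each row is weighted by its squared distance to the location
  d2_loc <- rowSums((Zc %*% S_inv) * Zc)
  w_o <- exp(-ker * d2_loc)
  W_o <- crossprod(Zc * w_o, Zc) / sum(w_o)

  # W_l: each pair i < j is weighted by the squared distance between its rows
  pairs <- which(upper.tri(diag(n)), arr.ind = TRUE)
  D <- Z[pairs[, 1], , drop = FALSE] - Z[pairs[, 2], , drop = FALSE]
  d2_pair <- rowSums((D %*% S_inv) * D)
  w_l <- exp(-ker * d2_pair)
  W_l <- crossprod(D * w_l, D) / sum(w_l)

  list(W_l = W_l, W_o = W_o)
}

res <- local_and_total_variance(iris[, 1:4], ker = 1)
res$W_l
```

The pairwise step builds all n(n-1)/2 differences at once, so it trades memory for speed; a loop over pairs would give the same result with a smaller footprint.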
If m0 is a neighbourhood graph and ker is the function returning 1 (so no distance-based proximity is used), the function returns (when withingroup=TRUE) the local variance-covariance matrix as defined in Lebart (1969).
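The graph-only case can be sketched in a few lines of base R. With ker returning 1, W_l reduces to an average of (Z_i-Z_j)'(Z_i-Z_j) over neighbouring pairs, weighted by m0. The helper local_variance_graph and the small chain graph below are illustrative assumptions, not part of PTAk.

```r
# Sketch only: W_l with ker = function(t) 1, so only the graph m0 weights pairs.
local_variance_graph <- function(Z, m0) {
  Z <- as.matrix(Z)
  stopifnot(nrow(m0) == nrow(Z), ncol(m0) == nrow(Z))
  # keep each neighbouring pair once (i < j)
  pairs <- which(upper.tri(m0) & m0 != 0, arr.ind = TRUE)
  D <- Z[pairs[, 1], , drop = FALSE] - Z[pairs[, 2], , drop = FALSE]
  w <- m0[pairs]
  crossprod(D * w, D) / sum(w)
}

# made-up data: 6 points on a chain, row i is a neighbour of row i + 1
n <- 6
Z <- cbind(x = c(1, 2, 4, 7, 11, 16), y = c(0, 1, 0, 1, 0, 1))
m0 <- matrix(0, n, n)
m0[cbind(1:(n - 1), 2:n)] <- 1
m0 <- m0 + t(m0)                 # symmetric 0/1 contiguity matrix
W <- local_variance_graph(Z, m0)
W
```

Only the five contiguous pairs contribute here, so W measures variation between neighbours rather than variation around a global centre.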
Value
a matrix
Note
As mentioned by Caussinus and Ruiz, a good strategy for revealing dense groups with generalised PCA is to first reveal outliers using the metric W_o^{-1} and remove them before using the metric W_l^{-1}. Based on theoretical considerations, they recommend, for the choice of ker with the decreasing function e^{(-ker \;t)}: a lower bound of 1 if withingroup is TRUE, and otherwise something fairly small, say in the interval [0.05; 0.3].
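As a rough illustration of what these ranges mean in practice, the base R sketch below tabulates the weight e^{(-ker \;t)} at a few squared distances for one value from each recommended range. The distances and the two ker values are arbitrary choices for illustration.

```r
# Sketch: decay of the weight exp(-ker * t) with squared distance t
t2 <- c(0, 1, 4, 9)
weights <- sapply(c(ker_0.15 = 0.15, ker_1 = 1), function(k) exp(-k * t2))
rownames(weights) <- paste0("t=", t2)
round(weights, 3)
```

With ker = 1 the weight drops quickly, so only close pairs contribute much to W_l; with ker = 0.15 the weight stays comparatively flat and mainly down-weights points far from the location in W_o. This reading is an interpretation of the table, not a statement taken from Caussinus and Ruiz.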
Author(s)
Didier G. Leibovici
References
Caussinus, H and Ruiz, A (1990) Interesting Projections of Multidimensional Data by Means of Generalized Principal Components Analysis. COMPSTAT90, Physica-Verlag, Heidelberg, 121-126.
Faraj, A (1994) Interpretation tools for Generalized Discriminant Analysis. In: New Approaches in Classification and Data Analysis, Springer-Verlag, 286-291, Heidelberg.
Lebart, L (1969) Analyse statistique de la contiguïté. Publication de l'Institut de Statistiques Universitaire de Paris, XVIII, 81-112.
Leibovici D (2008) Spatio-temporal Multiway Decomposition using Principal Tensor Analysis on k-modes: the R package PTAk. To be submitted soon at Journal of Statistical Software.
See Also
Examples
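data(iris)
iris2 <- as.matrix(iris[, 1:4])
dimnames(iris2)[[1]] <- as.character(iris[, 5])
# robust local (within group) covariance W_l, then its inverse as the column metric
D2 <- CauRuimet(iris2, ker = 1, withingroup = TRUE)
D2 <- Powmat(D2, (-1))
# centre the columns before the generalised SVD
iris2 <- sweep(iris2, 2, apply(iris2, 2, mean))
res <- SVDgen(iris2, D2 = D2, D1 = 1)
# first two components, coloured by species
plot(res, nb1 = 1, nb2 = 2, cex = 1, mod = 1, Zcol = list(c(rep(1, 50), rep(2, 50), rep(3, 50))))
summary(res, testvar = 0)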
# the same in a demo function
# source(system.file("demo", "CauRuimet.R", package = "PTAk"))
# demo.CauRuimet(ker=4,withingroup=TRUE,openX11s=FALSE)
# demo.CauRuimet(ker=0.15,withingroup=FALSE,openX11s=FALSE)