CauRuimet {PTAk}        R Documentation

Robust estimation of within group variance-covariance

Description

Gives a robust estimate of an unknown within-group covariance, aiming to look either for dense groups or for sparse groups (outliers), according to the local variance and the choice of weighting function.

Usage

 CauRuimet(Z,ker=1,m0=1,withingroup=TRUE,
              loc=substitute(apply(Z,2,mean,trim=.1)),matrixmethod=TRUE, Nrandom=3000)

        

Arguments

Z

matrix

ker

either a number or a function. If numerical, the weighting function is e^{(-ker \;t)}; otherwise
ker=function(t){return(expression)} must be a positive decreasing function.
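
The two forms of ker can be illustrated with a small base R sketch. The helper as_kernel() is hypothetical and is not part of PTAk; it only mirrors the rule stated above (a number gives the exponential weighting, a function is used as is).

```r
# Hypothetical helper (not a PTAk function): turn `ker` into a weighting function.
as_kernel <- function(ker) {
  if (is.function(ker)) ker else function(t) exp(-ker * t)
}

k_num <- as_kernel(1)                        # numeric ker: weights exp(-1 * t)
k_fun <- as_kernel(function(t) 1 / (1 + t))  # user-supplied positive decreasing kernel

t <- c(0, 1, 4)   # squared distances
k_num(t)          # weights shrink as the distance grows
k_fun(t)
```

Both kernels give weight 1 at distance 0 and smaller weights to more distant pairs, which is what "positive decreasing" asks for.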

m0

a neighbourhood graph or another proximity matrix; it is combined with the kernel proximities through a Hadamard (elementwise) product
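
As a rough illustration of this Hadamard product, the sketch below combines a made-up neighbourhood graph with kernel proximities. It assumes plain squared Euclidean distances instead of the robust metric used by the function, and the graph m0 is invented for the example.

```r
# Four iris rows: two setosa (1, 2) and two versicolor (51, 52)
Z  <- as.matrix(iris[c(1, 2, 51, 52), 1:4])

# Plain squared Euclidean distances (assumption: the function uses a robust metric instead)
D2 <- as.matrix(dist(Z))^2
K  <- exp(-1 * D2)                 # kernel proximities with ker = 1

# Invented neighbourhood graph: only same-species rows are neighbours
m0 <- matrix(0, 4, 4)
m0[1, 2] <- m0[2, 1] <- 1
m0[3, 4] <- m0[4, 3] <- 1

P <- m0 * K                        # Hadamard (elementwise) product of the proximities
P
```

Pairs that are not linked in m0 get a zero weight whatever their distance; linked pairs keep their kernel weight.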

withingroup

logical; if TRUE the aim is to give a robust estimate for dense groups, if FALSE the aim is to give a robust estimate for outliers

loc

a vector of locations, or a function (using for example the mean or the median) that gives an estimate of it

matrixmethod

if TRUE (only used when withingroup is TRUE), uses matrix computations rather than the double loop suggested by the formula below

Nrandom

if Nrandom < dim(Z)[1], uses only a random sample of Nrandom rows of Z (and the corresponding entries of m0, if applicable).
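
The subsampling rule can be pictured as follows. This is only a sketch: subsample_rows() is a hypothetical helper, not PTAk code, and sampling without replacement is an assumption.

```r
# Hypothetical helper (not a PTAk function): keep Nrandom rows of Z when Z has more rows,
# and restrict a proximity matrix m0 to the same rows and columns.
subsample_rows <- function(Z, m0 = 1, Nrandom = 3000) {
  if (Nrandom < nrow(Z)) {
    idx <- sample(nrow(Z), Nrandom)      # assumption: sampling without replacement
    Z <- Z[idx, , drop = FALSE]
    if (is.matrix(m0)) m0 <- m0[idx, idx, drop = FALSE]
  }
  list(Z = Z, m0 = m0)
}

set.seed(1)
sub <- subsample_rows(as.matrix(iris[, 1:4]), Nrandom = 50)
dim(sub$Z)
```

With the default Nrandom=3000, a data set such as iris (150 rows) is used in full.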

Details

When withingroup is TRUE, the local variance (local as defined by the weighting) is returned, aiming at finding dense groups:

W_l=\frac{\sum_{i=1}^{n-1}\sum_{j=i+1}^n m0_{ij}ker(d^2_{S^-}(Z_i,Z_j))(Z_i-Z_j)'(Z_i-Z_j)}{\sum_{i=1}^{n-1}\sum_{j=i+1}^n m0_{ij}ker(d^2_{S^-}(Z_i,Z_j))}

where d^2_{S^-}( . , .) is the squared Euclidean distance in the metric S^-, the inverse of a robust sample covariance (i.e. using loc instead of the mean). When withingroup is FALSE, the robust total weighted variance W_o is returned or, if m0 is not 1, the global weighted variance W_g:

W_o=\frac{\sum_{i=1}^nker(d^2_{S^-}(Z_i,\tilde{Z}))(Z_i-\tilde{Z})'(Z_i-\tilde{Z})} {\sum_{i=1}^n ker(d^2_{S^-}(Z_i,\tilde{Z}))}

W_g=\frac{\sum_{i=1}^{n-1}\sum_{j=i+1}^n m0_{ij}ker(d^2_{S^-}(Z_i,Z_j))(Z_i-\tilde{Z})'(Z_j-\tilde{Z})} {\sum_{i=1}^{n-1}\sum_{j=i+1}^n m0_{ij}ker(d^2_{S^-}(Z_i,Z_j))}

where \tilde{Z} is the vector loc.
If m0 is a neighbourhood graph and ker is the function returning 1 (no proximity due to distance is used), the function returns (when withingroup=TRUE) the local variance-covariance matrix as defined in Lebart (1969).
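
The formulas for W_l and W_o can be written out directly in base R. The sketch below is an illustrative re-implementation, not the code used by CauRuimet: robust_metric(), local_var() and total_var() are hypothetical names, and the divisor n-1 in the robust covariance is an assumption.

```r
# S^-: inverse of a sample covariance computed around `loc` instead of the mean
# (assumption: divisor n - 1)
robust_metric <- function(Z, loc) {
  Zc <- sweep(Z, 2, loc)
  solve(crossprod(Zc) / (nrow(Z) - 1))
}

# W_l: weighted average of (Z_i - Z_j)'(Z_i - Z_j) over all pairs i < j (double loop)
local_var <- function(Z, kern, Sinv, m0 = 1) {
  n <- nrow(Z); p <- ncol(Z)
  num <- matrix(0, p, p); den <- 0
  for (i in 1:(n - 1)) for (j in (i + 1):n) {
    d   <- Z[i, ] - Z[j, ]
    mij <- if (is.matrix(m0)) m0[i, j] else m0
    w   <- mij * kern(drop(t(d) %*% Sinv %*% d))   # m0_ij * ker(d^2)
    num <- num + w * tcrossprod(d)                 # p x p outer product
    den <- den + w
  }
  num / den
}

# W_o: weighted average of (Z_i - loc)'(Z_i - loc); kern must be vectorised
total_var <- function(Z, kern, Sinv, loc) {
  Zc <- sweep(Z, 2, loc)
  w  <- kern(rowSums((Zc %*% Sinv) * Zc))          # squared distances to loc
  crossprod(Zc * w, Zc) / sum(w)
}

Z    <- unname(as.matrix(iris[, 1:4]))
loc  <- apply(Z, 2, mean, trim = 0.1)
Sinv <- robust_metric(Z, loc)
Wl   <- local_var(Z, function(t) exp(-t), Sinv)
Wo   <- total_var(Z, function(t) exp(-0.1 * t), Sinv, loc)
```

As a sanity check on the double loop, with a constant kernel and m0=1 the local variance reduces to twice the usual sample covariance, so local_var(Z, function(t) 1, Sinv) agrees with 2 * cov(Z).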

Value

a matrix

Note

As mentioned by Caussinus and Ruiz, a good strategy to reveal dense groups with generalised PCA is to reveal outliers first using the metric W_o^{-1} and remove them before using the metric W_l^{-1}. Based on theoretical considerations, they recommend for the choice of ker, with the decreasing function e^{(-ker \;t)}: a lower bound of 1 if withingroup is TRUE, and something fairly small, say in the interval [0.05, 0.3], otherwise.

Author(s)

Didier G. Leibovici

References

Caussinus, H and Ruiz, A (1990) Interesting Projections of Multidimensional Data by Means of Generalized Principal Components Analysis. COMPSTAT90, Physica-Verlag, Heidelberg, 121-126.

Faraj, A (1994) Interpretation tools for Generalized Discriminant Analysis. In: New Approaches in Classification and Data Analysis, Springer-Verlag, Heidelberg, 286-291.

Lebart, L (1969) Analyse statistique de la contiguïté. Publication de l'Institut de Statistiques Universitaire de Paris, XVIII, 81-112.

Leibovici, D (2008) Spatio-temporal Multiway Decomposition using Principal Tensor Analysis on k-modes: the R package PTAk. To be submitted to the Journal of Statistical Software.

See Also

SVDgen

Examples


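 library(PTAk)

 data(iris)
 iris2 <- as.matrix(iris[,1:4])
 dimnames(iris2)[[1]] <- as.character(iris[,5])

 # robust within-group covariance, then its inverse used as the metric
 D2 <- CauRuimet(iris2,ker=1,withingroup=TRUE)
 D2 <- Powmat(D2,(-1))
 # centre the columns before the generalised SVD
 iris2 <- sweep(iris2,2,apply(iris2,2,mean))
 res <- SVDgen(iris2,D2=D2,D1=1)
 # first two components, points coloured by species
 plot(res,nb1=1,nb2=2,cex=1,mod=1,Zcol=list(c(rep(1,50),rep(2,50),rep(3,50))))
 summary(res,testvar=0)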

 # the same in a demo function
 # source(paste(R.home(),"/library/PTAk/demo/CauRuimet.R",sep=""))
 # demo.CauRuimet(ker=4,withingroup=TRUE,openX11s=FALSE)
 # demo.CauRuimet(ker=0.15,withingroup=FALSE,openX11s=FALSE)

[Package PTAk version 2.0.0 Index]