GSE {GSE} R Documentation

Generalized S-Estimator in the presence of missing data

Description

Computes the Generalized S-Estimate (GSE) – a robust estimate of location and scatter for data with contamination and missingness.

Usage

GSE(x, tol=1e-4, maxiter=150, method=c("bisquare","rocke"), 
    init=c("emve","qc","huber","imputed","emve_c"), mu0, S0, ...)

Arguments

x

a matrix or data frame. May contain missing values, but cannot contain columns with completely missing entries.

tol

tolerance for the convergence criterion. Default is 1e-4.

maxiter

maximum number of iterations for the GSE algorithm. Default is 150.

method

which loss function to use: "bisquare" (the default) or "rocke".

init

type of initial estimator. Currently this can be one of "emve" (EMVE with uniform sampling, see Danilov et al., 2012), "qc" (QC, see Danilov et al., 2012), "huber" (Huber Pairwise, see Danilov et al., 2012), "imputed" (Imputed S-estimator, see the rejoinder in Agostinelli et al., 2015), or "emve_c" (EMVE_C with cluster sampling, see Leung and Zamar, 2016). Default is "emve". If both mu0 and S0 are provided, this argument is ignored.

mu0

optional vector giving the initial location estimate.

S0

optional matrix giving the initial scatter estimate.

...

optional arguments for computing the initial estimates (see emve, HuberPairwise).

Details

This function computes the GSE (Danilov et al., 2012) and the GRE (Leung and Zamar, 2016). The estimator requires a robust, positive definite initial estimator. The initial estimator is needed to "re-scale" the squared partial Mahalanobis distances across the different missingness patterns, for which a single scale parameter is not enough. This function currently allows two main initial estimators: EMVE (the default; see emve) and Huberized Pairwise (see HuberPairwise). GSE using Huberized Pairwise with the sign psi function is referred to as QGSE in Danilov et al. (2012). Numerical results have shown that GSE with EMVE as the initial estimator performs better in both efficiency and robustness, but its computing time can be longer.
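
To illustrate what a squared partial Mahalanobis distance is, the following base R sketch computes it for a single case using only that case's observed entries. The helper partial_md2 is hypothetical and is not part of the GSE package; it only shows why cases with different missingness patterns yield distances based on different numbers of coordinates, which is what makes a re-scaling step necessary.

```r
## Hypothetical helper (not part of the GSE package): squared partial
## Mahalanobis distance of one case, using only its observed entries.
partial_md2 <- function(xi, mu, S) {
  obs <- !is.na(xi)                    # observed coordinates of this case
  d <- xi[obs] - mu[obs]               # centered observed part
  S_obs <- S[obs, obs, drop = FALSE]   # matching sub-block of the scatter
  as.numeric(crossprod(d, solve(S_obs, d)))
}

mu <- c(0, 0, 0)
S <- matrix(0.5, 3, 3)
diag(S) <- 1

x_full <- c(1, -1, 2)   # fully observed case
x_miss <- c(1, NA, 2)   # second entry missing

d_full <- partial_md2(x_full, mu, S)   # based on 3 coordinates
d_miss <- partial_md2(x_miss, mu, S)   # based on coordinates 1 and 3 only

## number of observed entries per case (what the pu slot reports)
pu <- c(sum(!is.na(x_full)), sum(!is.na(x_miss)))
```

The two distances are computed from 3 and 2 coordinates respectively, so they are not directly comparable without adjusting for the dimension of the observed part.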

Value

An S4 object of class GSE-class which is a subclass of the virtual class CovRobMissSc-class. The output S4 object contains the following slots:

mu Estimated location. Can be accessed via getLocation.
S Estimated scatter matrix. Can be accessed via getScatter.
sc Generalized S-scale (GS-scale). Can be accessed via getScale.
pmd Squared partial Mahalanobis distances. Can be accessed via getDist.
pmd.adj Adjusted squared partial Mahalanobis distances. Can be accessed via getDistAdj.
pu Dimension of the observed entries for each case. Can be accessed via getDim.
mu0 Estimated initial location.
S0 Estimated initial scatter matrix.
ximp Input data matrix with missing values imputed using the best linear predictor. Not meant to be accessed.
weights Weights used in the estimation of the location. Not meant to be accessed.
weightsp First derivative of the weights used in the estimation of the location. Not meant to be accessed.
iter Number of iterations until convergence. Not meant to be accessed.
eps Relative change of the GS-scale at convergence. Not meant to be accessed.
call Object of class "language". Not meant to be accessed.
x Input data matrix. Not meant to be accessed.
p Column dimension of input data matrix. Not meant to be accessed.
estimator Character string of the name of the estimator used. Not meant to be accessed.
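
As a brief sketch of how the accessor functions listed above might be used, the code below fits GSE on simulated data and reads the main slots through their accessors. It assumes the GSE package is installed (hence the requireNamespace guard) and reuses the data-generation settings from the Examples section; the object name res is arbitrary.

```r
if (requireNamespace("GSE", quietly = TRUE)) {
  library(GSE)
  set.seed(12)

  ## simulated data with casewise contamination, as in the Examples section
  n <- 100
  p <- 10
  A <- matrix(0.9, p, p)
  diag(A) <- 1
  x <- generate.casecontam(n, p, cond = 100, contam.size = 10,
                           contam.prop = 0.1, A = A)$x

  ## introduce about 5% missing cells
  x[which(matrix(rbinom(n * p, 1, 0.05), n, p) == 1)] <- NA

  res <- GSE(x)

  mu_hat <- getLocation(res)   # estimated location (slot mu)
  S_hat  <- getScatter(res)    # estimated scatter matrix (slot S)
  sc_hat <- getScale(res)      # GS-scale (slot sc)
  d2     <- getDist(res)       # squared partial Mahalanobis distances
  d2_adj <- getDistAdj(res)    # adjusted squared partial distances
  pu     <- getDim(res)        # number of observed entries per case
}
```

Using the accessors rather than the @ operator keeps code independent of the slot layout of the returned object.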

Author(s)

Andy Leung andy.leung@stat.ubc.ca, Ruben H. Zamar, Mike Danilov, Victor J. Yohai

References

Agostinelli, C., Leung, A., Yohai, V.J., and Zamar, R.H. (2015). Robust estimation of multivariate location and scatter in the presence of cellwise and casewise contamination. TEST.

Danilov, M., Yohai, V.J., and Zamar, R.H. (2012). Robust Estimation of Multivariate Location and Scatter in the Presence of Missing Data. Journal of the American Statistical Association 107, 1178–1186.

Leung, A. and Zamar, R.H. (2016). Multivariate Location and Scatter Matrix Estimation Under Cellwise and Casewise Contamination. Submitted.

See Also

emve, HuberPairwise, GSE-class, generate.casecontam

Examples

set.seed(12)

## generate 10-dimensional data with 10% casewise contamination
n <- 100
p <- 10
A <- matrix(0.9, p, p)
diag(A) <- 1
x <- generate.casecontam(n, p, cond=100, contam.size=10, contam.prop=0.1, A=A)$x

## introduce 5% missingness
pmiss <- 0.05
nmiss <- matrix(rbinom(n*p,1,pmiss), n,p)
x[ which( nmiss == 1 ) ] <- NA

## Using EMVE as initial
res.emve <- GSE(x)
slrt( getScatter(res.emve), A) ## LRT distances to the true covariance

## Using QC as initial
res.qc <- GSE(x, init="qc")
slrt( getScatter(res.qc), A) ## in general, performs worse than with EMVE as the initial estimator
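
The mu0 and S0 arguments can also be supplied directly, in which case init is ignored. The sketch below assumes the GSE package is installed and uses a deliberately crude coordinatewise start (column medians and a diagonal matrix of squared MADs), chosen only because it is positive definite and easy to compute; it is an illustration of the interface, not a recommended initial estimator.

```r
if (requireNamespace("GSE", quietly = TRUE)) {
  library(GSE)
  set.seed(12)

  ## same simulated setting as above
  n <- 100
  p <- 10
  A <- matrix(0.9, p, p)
  diag(A) <- 1
  x <- generate.casecontam(n, p, cond = 100, contam.size = 10,
                           contam.prop = 0.1, A = A)$x
  x[which(matrix(rbinom(n * p, 1, 0.05), n, p) == 1)] <- NA

  ## Assumption for illustration only: coordinatewise medians and a
  ## diagonal scatter built from squared MADs, ignoring missing cells
  mu0 <- apply(x, 2, median, na.rm = TRUE)
  S0 <- diag(apply(x, 2, mad, na.rm = TRUE)^2)

  ## Using user-supplied initial estimates
  res.user <- GSE(x, mu0 = mu0, S0 = S0)
  lrt.user <- slrt(getScatter(res.user), A)
}
```

Because a diagonal S0 ignores the correlation structure, this start may give a worse fit than the EMVE default; comparing lrt.user with the LRT distances above is one way to check.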


[Package GSE version 4.2-1 Index]