GSE {GSE} | R Documentation |
Generalized S-Estimator in the presence of missing data
Description
Computes the Generalized S-Estimate (GSE) – a robust estimate of location and scatter for data with contamination and missingness.
Usage
GSE(x, tol=1e-4, maxiter=150, method=c("bisquare","rocke"),
init=c("emve","qc","huber","imputed","emve_c"), mu0, S0, ...)
Arguments
x |
a matrix or data frame. May contain missing values, but cannot contain columns with completely missing entries. |
tol |
tolerance for the convergence criterion. Default is 1e-4. |
maxiter |
maximum number of iterations for the GSE algorithm. Default is 150. |
method |
which loss function to use: either "bisquare" or "rocke". |
init |
type of initial estimator. Currently this can either be "emve" (EMVE with uniform sampling, see Danilov et al., 2012),
"qc" (QC, see Danilov et al., 2012), "huber" (Huber Pairwise, see Danilov et al., 2012),
"imputed" (Imputed S-estimator, see the rejoinder in Agostinelli et al., 2015), or
"emve_c" (EMVE_C with cluster sampling, see Leung and Zamar, 2016).
Default is "emve". If |
mu0 |
optional vector of initial location estimate |
S0 |
optional matrix of initial scatter estimate |
... |
optional arguments for computing the initial estimates (see |
Details
This function computes GSE (Danilov et al., 2012) and GRE (Leung and Zamar, 2016). The estimator requires a robust, positive definite
initial estimator. The initial estimator is needed to "re-scale" the squared partial Mahalanobis distances across
the different missingness patterns, for which a single scale parameter is not enough. This function currently allows two
main initial estimators: EMVE (the default; see emve) and Huberized Pairwise (see HuberPairwise).
GSE using Huberized Pairwise with the sign psi function is referred to as QGSE in Danilov et al. (2012).
Numerical results have shown that GSE with EMVE as the initial estimator performs better
(in both efficiency and robustness), but its computing time can be longer.
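The squared partial Mahalanobis distance mentioned above can be illustrated with a minimal base R sketch. It is not the package's internal code: the helper name partial_mahalanobis is hypothetical, and mu and S are simply assumed to be given. For each case, only the observed coordinates are used, together with the matching sub-vector of mu and sub-matrix of S.

```r
## Hypothetical helper (not part of the GSE package): squared partial
## Mahalanobis distance of each row of x from mu, with scatter S,
## computed over the observed (non-NA) coordinates only.
partial_mahalanobis <- function(x, mu, S) {
  x <- as.matrix(x)
  apply(x, 1, function(row) {
    obs <- !is.na(row)                # observed coordinates of this case
    if (!any(obs)) return(NA_real_)   # nothing observed: distance undefined
    d <- row[obs] - mu[obs]
    as.numeric(crossprod(d, solve(S[obs, obs, drop = FALSE], d)))
  })
}

mu <- c(0, 0, 0)
S  <- diag(3)
x  <- rbind(c(1, 2, NA),
            c(1, NA, NA),
            c(1, 2, 2))

pmd <- partial_mahalanobis(x, mu, S)  # one distance per case
pu  <- rowSums(!is.na(x))             # number of observed entries per case
print(cbind(pmd, pu))
```

Cases with different missingness patterns yield distances built from different numbers of coordinates, so they are not directly comparable on a single scale. This is the situation in which the re-scaling described above is needed.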
Value
An S4 object of class GSE-class, which is a subclass of the virtual class CovRobMissSc-class. The
output S4 object contains the following slots:
mu | Estimated location. Can be accessed via getLocation. |
S | Estimated scatter matrix. Can be accessed via getScatter. |
sc | Generalized S-scale (GS-scale). Can be accessed via getScale. |
pmd | Squared partial Mahalanobis distances. Can be accessed via getDist. |
pmd.adj | Adjusted squared partial Mahalanobis distances. Can be accessed via getDistAdj. |
pu | Dimension of the observed entries for each case. Can be accessed via getDim. |
mu0 | Estimated initial location. |
S0 | Estimated initial scatter matrix. |
ximp | Input data matrix with missing values imputed using best linear predictor. Not meant to be accessed. |
weights | Weights used in the estimation of the location. Not meant to be accessed. |
weightsp | First derivative of the weights used in the estimation of the location. Not meant to be accessed. |
iter | Number of iterations until convergence. Not meant to be accessed. |
eps | Relative change of the GS-scale at convergence. Not meant to be accessed. |
call | Object of class "language". Not meant to be accessed. |
x | Input data matrix. Not meant to be accessed. |
p | Column dimension of input data matrix. Not meant to be accessed. |
estimator | Character string of the name of the estimator used. Not meant to be accessed. |
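As a brief illustration of the accessors listed above, the following sketch fits GSE to simulated data and reads the documented slots through their accessor functions. The simulated data and the object name fit are assumptions made for illustration only, and the code runs only if the GSE package is installed.

```r
## Illustrative sketch; skipped when the GSE package is not available.
if (requireNamespace("GSE", quietly = TRUE)) {
  library(GSE)
  set.seed(1)
  n <- 200
  p <- 4
  x <- matrix(rnorm(n * p), n, p)
  ## Put 40 missing cells in the first three columns only, so that
  ## no case and no column is completely missing.
  x[sample(n * 3, 40)] <- NA

  fit <- GSE(x)                  # default initial estimator ("emve")

  print(getLocation(fit))        # slot mu: estimated location
  print(getScatter(fit))         # slot S: estimated scatter matrix
  print(getScale(fit))           # slot sc: GS-scale
  print(head(getDist(fit)))      # slot pmd: squared partial Mahalanobis distances
  print(head(getDistAdj(fit)))   # slot pmd.adj: adjusted distances
  print(table(getDim(fit)))      # slot pu: observed entries per case
}
```

Using the accessors rather than reading the slots directly keeps the code independent of the internal layout of the object.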
Author(s)
Andy Leung andy.leung@stat.ubc.ca, Ruben H. Zamar, Mike Danilov, Victor J. Yohai
References
Agostinelli, C., Leung, A., Yohai, V.J., and Zamar, R.H. (2015). Robust estimation of multivariate location and scatter in the presence of cellwise and casewise contamination. TEST.
Danilov, M., Yohai, V.J., and Zamar, R.H. (2012). Robust Estimation of Multivariate Location and Scatter in the Presence of Missing Data. Journal of the American Statistical Association 107, 1178–1186.
Leung, A. and Zamar, R.H. (2016). Multivariate Location and Scatter Matrix Estimation Under Cellwise and Casewise Contamination. Submitted.
See Also
emve, HuberPairwise, GSE-class, generate.casecontam
Examples
library(GSE)  ## provides GSE(), generate.casecontam(), slrt()
set.seed(12)
## generate 10-dimensional data with 10% casewise contamination
n <- 100
p <- 10
A <- matrix(0.9, p, p)
diag(A) <- 1
x <- generate.casecontam(n, p, cond=100, contam.size=10, contam.prop=0.1, A=A)$x
## introduce missingness: each cell is set to NA independently with probability 0.05
pmiss <- 0.05
nmiss <- matrix(rbinom(n*p,1,pmiss), n,p)
x[ which( nmiss == 1 ) ] <- NA
## Using EMVE as initial
res.emve <- GSE(x)
slrt( getScatter(res.emve), A) ## LRT distance to the true covariance
## Using QC as initial
res.qc <- GSE(x, init="qc")
slrt( getScatter(res.qc), A) ## in general, performs worse than with EMVE as the initial estimator