cellMCD {cellWise}    R Documentation

cellWise minimum covariance determinant estimator

Description

The cellwise minimum covariance determinant estimator computes cellwise robust estimates of the center and covariance matrix of a data set X. The algorithm guarantees a monotone decrease of an objective function, which is based on observed Gaussian log-likelihood. By default, it starts by calling checkDataSet to clean the data.

Usage

cellMCD(X, alpha = 0.75, quant = 0.99,
        crit = 1e-4, noCits = 100, lmin = 1e-4,
        checkPars = list())

Arguments

X

X is the input data, and must be an n by d matrix or a data frame.

alpha

In each column, at least n*alpha cells must remain unflagged. Defaults to 75%, and should not be set (much) lower.
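The constraint on alpha can be checked on a flag matrix, as in the minimal sketch below. The matrix W is a small hand-made example, not output of cellMCD, and the coding (1 for unflagged cells, 0 for flagged cells) is an assumption made here for illustration. How the package rounds n*alpha is not stated above, so the sketch compares against the unrounded product.

```r
alpha <- 0.75

# Hypothetical 8 x 2 flag matrix: 1 = unflagged, 0 = flagged (assumed coding).
W <- cbind(c(1, 1, 1, 0, 1, 1, 1, 1),
           c(1, 0, 1, 1, 0, 1, 1, 1))

n <- nrow(W)
unflagged <- colSums(W)        # number of unflagged cells per column
ok <- unflagged >= n * alpha   # TRUE where the column meets the constraint
ok
```

Here n*alpha equals 6, so a column may have at most 2 flagged cells out of 8.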

quant

Determines the cutoff value to flag cells. Defaults to 0.99.

crit

The iteration stops when successive covariance matrices (of the standardized data) differ by less than crit. Defaults to 1e-4.

noCits

The maximal number of C-steps used. Defaults to 100.

lmin

A lower bound on the eigenvalues of the estimated covariance matrix on the standardized data. Defaults to 1e-4. Should not be smaller than 1e-6.

checkPars

Optional list of parameters used in the call to checkDataSet. The options are:

  • coreOnly
    If TRUE, skip the execution of checkDataSet. Defaults to FALSE.

  • numDiscrete
    A column that takes on numDiscrete or fewer values will be considered discrete and not retained in the cleaned data. Defaults to 5.

  • fracNA
    Only retain columns and rows with fewer NAs than this fraction. Defaults to 0.5.

  • precScale
    Only consider columns whose scale is larger than precScale. Here scale is measured by the median absolute deviation. Defaults to 1e-12.

  • silent
    Whether the function's progress messages should be suppressed. Defaults to FALSE.
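A checkPars list might be assembled and passed as in the sketch below. The option names and defaults are those listed above, except silent, which is set to TRUE here to suppress progress messages. The data matrix is simulated purely for illustration, and the call is only made when the cellWise package is installed.

```r
# Options for checkDataSet; names and defaults as documented above,
# except silent = TRUE to suppress progress messages.
checkPars <- list(
  coreOnly    = FALSE,
  numDiscrete = 5,
  fracNA      = 0.5,
  precScale   = 1e-12,
  silent      = TRUE
)

# Illustrative call on simulated data; skipped if cellWise is not installed.
if (requireNamespace("cellWise", quietly = TRUE)) {
  set.seed(1)
  X <- matrix(rnorm(300), ncol = 3)
  fit <- cellWise::cellMCD(X, alpha = 0.75, quant = 0.99,
                           checkPars = checkPars)
  print(fit$mu)
}
```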

Details

The matrix raw.S in the output is the raw estimate of scatter produced by cellMCD. The final S is obtained from raw.S by rescaling such that its diagonal entries equal the squares of the univariate scales in locsca$scale. This reduces the bias at Gaussian data, which matters mainly for large sample sizes.
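The rescaling step can be sketched in base R as follows. The values of raw.S and scales are made up for illustration, with scales standing in for locsca$scale. The text above only states that the diagonal of S matches the squared scales; applying the same factor to rows and columns, which leaves the correlations unchanged, is an assumption of this sketch and not necessarily how the package implements it.

```r
# Hypothetical raw scatter estimate (illustrative values only).
raw.S <- matrix(c(0.90, 0.40, 0.30,
                  0.40, 1.20, 0.50,
                  0.30, 0.50, 0.80), nrow = 3, byrow = TRUE)

# Hypothetical univariate scales, standing in for locsca$scale.
scales <- c(1.00, 1.10, 0.95)

# Per-variable factor that maps sqrt(diag(raw.S)) onto the target scales.
fac <- scales / sqrt(diag(raw.S))

# Scale entry (i, j) by fac[i] * fac[j], so that diag(S) equals scales^2.
S <- raw.S * outer(fac, fac)

diag(S)      # equals scales^2
cov2cor(S)   # same correlations as cov2cor(raw.S)
```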

Value

A list with components:

Author(s)

J. Raymaekers and P.J. Rousseeuw

References

J. Raymaekers and P.J. Rousseeuw (2022). The cellwise MCD estimator, Journal of the American Statistical Association, to appear. doi:10.1080/01621459.2023.2267777

See Also

plot_cellMCD

Examples

mu    <- rep(0, 3)
Sigma <- diag(3) * 0.5 + 0.5
set.seed(123)
X <- MASS::mvrnorm(1000, mu, Sigma)
X[1:5, 1]  <- X[1:5, 1] + 5
X[6:10, 2] <- X[6:10, 2] - 10
X[12, 1:2] <- c(-4,8)
colnames(X) <- c("X1","X2","X3")
cellMCD.out <- cellMCD(X)
cellMCD.out$mu
cov2cor(cellMCD.out$S)
cellMCD.out$W[1:15,]
cellMCD.out$Ximp[1:15,]
cellMap(cellMCD.out$Zres[1:15,])

# For more examples, we refer to the vignette:
## Not run: 
vignette("cellMCD_examples")

## End(Not run)

[Package cellWise version 2.5.3 Index]