DI {cellWise}    R Documentation

Detection-Imputation algorithm

Description

The Detection-Imputation algorithm computes cellwise robust estimates of the center and covariance matrix of a data set X. The algorithm alternates between the detection of cellwise outliers and their imputation combined with re-estimation of the center and covariance matrix. By default, it starts by calling checkDataSet to clean the data.
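The alternation between detection, imputation and re-estimation can be illustrated with a deliberately simplified base-R sketch. It is not the cellWise implementation, and it makes several assumptions. The helper names flag_row, impute_row and di_sketch are hypothetical. The initial estimate (coordinatewise median plus classical covariance) is only a placeholder for "DDCWcov" or "TSGS". Detection flags cells greedily, one at a time per row, using standardized conditional residuals under a Gaussian model. Re-estimation uses plain sample moments of the imputed data, without any bias correction for the imputed cells.

```r
# Illustrative sketch of a detection-imputation loop in base R.
# NOT the cellWise implementation: all helper names are hypothetical.

# Flag cells in one row greedily: compute the standardized residual of each
# unflagged cell given the other unflagged cells (Gaussian model), flag the
# most extreme one if its square exceeds `cutoff`, and repeat.
flag_row <- function(x, mu, Sigma, cutoff) {
  obs <- rep(TRUE, length(x))
  while (any(obs)) {
    O <- which(obs)
    Om <- solve(Sigma[O, O, drop = FALSE])          # precision of unflagged block
    r <- drop(Om %*% (x[O] - mu[O])) / sqrt(diag(Om))
    k <- which.max(abs(r))
    if (r[k]^2 <= cutoff) break
    obs[O[k]] <- FALSE
  }
  !obs                                              # TRUE marks a flagged cell
}

# Replace flagged cells by their conditional mean given the unflagged cells.
impute_row <- function(x, flag, mu, Sigma) {
  M <- which(flag)
  O <- which(!flag)
  if (length(M) == 0) return(x)
  if (length(O) == 0) {
    x[M] <- mu[M]                                   # nothing left to condition on
  } else {
    x[M] <- mu[M] + drop(Sigma[M, O, drop = FALSE] %*%
                           solve(Sigma[O, O, drop = FALSE], x[O] - mu[O]))
  }
  x
}

# Alternate detection, imputation and re-estimation until the estimates settle.
di_sketch <- function(X, quant = 0.99, crit = 0.01, maxits = 10) {
  X <- as.matrix(X)
  cutoff <- qchisq(quant, df = 1)                   # cutoff for squared residuals
  mu <- apply(X, 2, median)                         # placeholder initial center
  Sigma <- cov(X)                                   # placeholder initial covariance
  for (it in seq_len(maxits)) {
    flags <- t(apply(X, 1, flag_row, mu = mu, Sigma = Sigma, cutoff = cutoff))
    Ximp <- X
    for (i in seq_len(nrow(X))) {
      Ximp[i, ] <- impute_row(X[i, ], flags[i, ], mu, Sigma)
    }
    mu_new <- colMeans(Ximp)
    Sigma_new <- cov(Ximp)
    delta <- max(sum((mu_new - mu)^2), sum((Sigma_new - Sigma)^2))
    mu <- mu_new
    Sigma <- Sigma_new
    if (delta <= crit) break
  }
  list(center = mu, cov = Sigma, flags = flags, Ximp = Ximp, nits = it)
}
```

Calling di_sketch(X) on a numeric matrix returns a list with center, cov, flags, Ximp and nits. These component names belong to the sketch and need not match the components returned by DI.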

Usage

DI(X, initEst = "DDCWcov", crit = 0.01, maxits = 10, quant = 0.99,
maxCol = 0.25, checkPars = list())

Arguments

X

X is the input data, and must be an n by d matrix or a data frame.

initEst

An initial estimator for the center and covariance matrix. Should be one of "DDCWcov" or "TSGS", where the latter refers to the function GSE::TSGS. The default option "DDCWcov" uses the proposal of Raymaekers and Rousseeuw (2020), which is much faster in higher dimensions.

crit

The algorithm converges when subsequent estimates of the center and covariance matrix do not differ by more than crit in squared Euclidean norm.
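The stopping rule can be read in more than one way. The sketch below assumes that the change in the covariance matrix is measured as the squared Euclidean norm of its entries (the squared Frobenius norm) and that both the center and the covariance must change by at most crit. The helper has_converged is hypothetical and not part of cellWise.

```r
# Hypothetical helper illustrating the stopping rule controlled by `crit`.
# Assumption: the covariance change is the sum of squared entry-wise
# differences, and both changes must be at most `crit`.
has_converged <- function(mu_old, mu_new, Sigma_old, Sigma_new, crit = 0.01) {
  d_mu <- sum((mu_new - mu_old)^2)          # squared change in the center
  d_Sigma <- sum((Sigma_new - Sigma_old)^2) # squared change in the covariance
  max(d_mu, d_Sigma) <= crit
}

mu0 <- c(0, 0)
S0 <- diag(2)
has_converged(mu0, c(0.05, 0), S0, S0)  # TRUE: 0.05^2 = 0.0025 <= 0.01
has_converged(mu0, c(0.2, 0), S0, S0)   # FALSE: 0.2^2 = 0.04 > 0.01
```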

maxits

Maximum number of DI-iterations.

quant

The quantile level (a probability, 0.99 by default) that determines the cutoff used to detect cellwise outliers.
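A common way to turn a probability such as quant into a cutoff, assumed here rather than taken from the cellWise source, is to compare squared standardized cell residuals with the quant quantile of the chi-squared distribution with one degree of freedom. The residual values below are made up for illustration.

```r
quant <- 0.99
cutoff_sq  <- qchisq(quant, df = 1)   # cutoff for squared standardized residuals
cutoff_abs <- sqrt(cutoff_sq)         # equivalent cutoff for |residual|, about 2.576

# The same value arises as a two-sided standard normal cutoff.
all.equal(cutoff_abs, qnorm((1 + quant) / 2))   # TRUE

resid <- c(-0.4, 1.9, 3.1, -2.7)      # made-up standardized residuals
resid^2 > cutoff_sq                   # FALSE FALSE TRUE TRUE
```

Raising quant towards 1 increases the cutoff, so fewer cells are flagged.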

maxCol

The maximum fraction of cellwise outliers allowed in a column (0.25 by default).
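One way such a per-column cap could be enforced is sketched below. This is an assumption, not the documented behavior of DI: when a column of n rows has more than floor(maxCol * n) flagged cells, only the cells with the largest absolute standardized residuals stay flagged. The helper cap_column_flags and the residual values are hypothetical.

```r
# Hypothetical helper: enforce a per-column cap on flagged cells.
# Assumption: the cells with the largest |residual| keep their flag.
cap_column_flags <- function(absres, flags, maxCol = 0.25) {
  maxFlags <- floor(maxCol * nrow(flags))     # allowed flags per column
  for (j in seq_len(ncol(flags))) {
    idx <- which(flags[, j])
    if (length(idx) > maxFlags) {
      keep <- idx[order(absres[idx, j], decreasing = TRUE)][seq_len(maxFlags)]
      flags[, j] <- FALSE
      flags[keep, j] <- TRUE
    }
  }
  flags
}

absres <- cbind(c(5, 4, 3, 0.5), c(0.1, 6, 0.2, 0.3))  # made-up |residuals|
flags  <- absres > 2.58
cap_column_flags(absres, flags, maxCol = 0.25)  # column 1 keeps only row 1
```

With 4 rows and maxCol = 0.25, each column may keep at most one flag, so column 1 drops the flags on rows 2 and 3.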

checkPars

Optional list of parameters used in the call to checkDataSet. The options are:

  • coreOnly
    If TRUE, skip the execution of checkDataSet. Defaults to FALSE.

  • numDiscrete
    A column that takes on numDiscrete or fewer values will be considered discrete and not retained in the cleaned data. Defaults to 5.

  • fracNA
    Only retain columns and rows with fewer NAs than this fraction. Defaults to 0.15.

  • precScale
    Only consider columns whose scale is larger than precScale. Here scale is measured by the median absolute deviation. Defaults to 1e-12.

  • silent
    Whether progress messages should be suppressed. Defaults to FALSE.
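The options above are passed to DI as a named list. The call below is a minimal usage sketch that relies only on the arguments documented on this page; it is wrapped in requireNamespace so that it runs silently when cellWise is not installed, and the particular option values are arbitrary.

```r
# Pass checkDataSet options through checkPars (requires the cellWise package).
if (requireNamespace("cellWise", quietly = TRUE)) {
  set.seed(123)
  Sigma <- diag(3) * 0.1 + 0.9
  X <- matrix(rnorm(300), nrow = 100) %*% chol(Sigma)
  # Suppress progress messages and tolerate up to 20% NAs per row or column
  DI.out <- cellWise::DI(X, checkPars = list(silent = TRUE, fracNA = 0.2))
  print(DI.out$cov)
}
```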

Value

A list with components:

Author(s)

J. Raymaekers and P.J. Rousseeuw

References

J. Raymaekers and P.J. Rousseeuw (2020). Handling cellwise outliers by sparse regression and robust covariance. Journal of Data Science, Statistics, and Visualisation. doi:10.52933/jdssv.v1i3.18

See Also

cellHandler

Examples

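# Simulate 100 observations of 3 variables with unit variances and correlation 0.9
set.seed(1)
mu <- rep(0, 3)
Sigma <- diag(3) * 0.1 + 0.9
X <- MASS::mvrnorm(100, mu, Sigma)
# Run the Detection-Imputation algorithm and print the covariance estimate
DI.out <- DI(X)
DI.out$cov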
# For more examples, we refer to the vignette:
## Not run: 
vignette("DI_examples")

## End(Not run)

[Package cellWise version 2.5.3 Index]