DDC {cellWise} | R Documentation
Detect Deviating Cells
Description
This function aims to detect cellwise outliers in the data. These are entries in the data matrix that are substantially higher or lower than what could be expected based on the other cells in their column as well as the other cells in their row, taking the relations between the columns into account. Note that this function first calls checkDataSet and analyzes the remaining cleaned data.
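To make the notion of a deviating cell concrete, the following is a minimal sketch in base R that flags cells using only a robust column-wise standardization. It is not the DDC algorithm: it ignores the relations between the columns, which DDC takes into account. The matrix X, the planted values, and the 0.99 quantile used for the cutoff are assumptions made for illustration.

```r
set.seed(1)
# Hypothetical data: 100 rows, 3 columns, with two cells replaced by gross errors.
X <- matrix(rnorm(300), nrow = 100, ncol = 3)
X[5, 2]  <- 15
X[40, 3] <- -12

# Robustly standardize each column with its median and MAD.
Z <- sweep(X, 2, apply(X, 2, median), "-")
Z <- sweep(Z, 2, apply(X, 2, mad), "/")

# Flag cells whose absolute robust z-score exceeds a normal-based cutoff
# (illustrative choice: square root of the 0.99 quantile of chi-squared with 1 df).
cutoff  <- sqrt(qchisq(0.99, df = 1))
flagged <- which(abs(Z) > cutoff, arr.ind = TRUE)
```

Here flagged holds one (row, col) pair per flagged cell. A purely univariate rule like this cannot catch a cell that is unremarkable within its own column but inconsistent with the correlated columns in the same row.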
Usage
DDC(X, DDCpars = list())
Arguments
X
The input data.
DDCpars
A list of available options.
Value
A list with components:
DDCpars
The list of options used.
colInAnalysis
The column indices of the columns used in the analysis.
rowInAnalysis
The row indices of the rows used in the analysis.
namesNotNumeric
The names of the variables which are not numeric.
namesCaseNumber
The name(s) of the variable(s) that contained the case numbers and were therefore removed.
namesNAcol
Names of the columns left out due to too many NAs.
namesNArow
Names of the rows left out due to too many NAs.
namesDiscrete
Names of the discrete variables.
namesZeroScale
Names of the variables with zero scale.
remX
Cleaned data after checkDataSet.
locX
Estimated location of X.
scaleX
Estimated scales of X.
Z
Standardized remX.
nbngbrs
Number of neighbors used in estimation.
ngbrs
Indicates neighbors of each column, i.e. the columns most correlated with it.
robcors
Robust correlations.
robslopes
Robust slopes.
deshrinkage
The deshrinkage factor used for every connected (i.e. non-standalone) column of X.
Xest
Predicted X.
scalestres
Scale estimate of the residuals X - Xest.
stdResid
Residuals of original X minus the estimated Xest, standardized by column.
indcells
Indices of the cells which were flagged in the analysis.
Ti
Outlyingness value of each row.
medTi
Median of the Ti values.
madTi
MAD of the Ti values.
indrows
Indices of the rows which were flagged in the analysis.
indNAs
Indices of all NA cells.
indall
Indices of all cells which were flagged in the analysis plus all cells in flagged rows plus the indices of the NA cells.
Ximp
Imputed X.
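The index components above can be awkward to read when they are single numbers per cell. The sketch below, in base R only, shows one way to turn such indices into (row, column) positions and to assemble a combined index set in the way indall is described. It assumes the cell indices are column-major linear indices into remX; the matrix and all index values are hypothetical stand-ins, not actual DDC output.

```r
# Hypothetical 5 x 4 cleaned matrix standing in for remX (not real DDC output).
n <- 5; d <- 4
remX <- matrix(seq_len(n * d), nrow = n, ncol = d)

# Hypothetical index vectors of the kind described above,
# assumed to be column-major linear indices into remX.
indcells <- c(3, 12, 18)   # flagged cells
indrows  <- 2              # flagged rows
indNAs   <- 7              # NA cells

# Convert the linear cell indices to (row, column) pairs.
cellpos <- arrayInd(indcells, .dim = dim(remX))
colnames(cellpos) <- c("row", "col")

# Linear indices of every cell in the flagged rows.
rowcells <- as.vector(outer(indrows, n * (seq_len(d) - 1), "+"))

# Union of flagged cells, cells in flagged rows, and NA cells,
# mirroring the description of indall.
indall <- sort(unique(c(indcells, rowcells, indNAs)))
```

With real output, the same arrayInd call would be applied to the indcells component using dim(remX) from the same result, provided the linear-index assumption holds.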
Author(s)
Raymaekers J., Rousseeuw P.J., Van den Bossche W.
References
Rousseeuw, P.J., Van den Bossche, W. (2018). Detecting Deviating Data Cells. Technometrics, 60(2), 135-145.
Raymaekers, J., Rousseeuw, P.J. (2019). Fast robust correlation for high dimensional data. Technometrics, 63(2), 184-198.
See Also
Examples
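library(cellWise)
library(MASS); set.seed(12345)
n <- 50; d <- 20
# Covariance matrix with unit variances and pairwise correlation 0.9
A <- matrix(0.9, d, d); diag(A) <- 1
x <- mvrnorm(n, rep(0, d), A)
# Contaminate the data: three draws of 50 random cells,
# set to NA, 10 and -10 respectively (the draws may overlap)
x[sample(1:(n * d), 50, FALSE)] <- NA
x[sample(1:(n * d), 50, FALSE)] <- 10
x[sample(1:(n * d), 50, FALSE)] <- -10
# Prepend a case-number column
x <- cbind(1:n, x)
DDCx <- DDC(x)
# Draw a cell map of the standardized residuals
cellMap(DDCx$stdResid)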
# For more examples, we refer to the vignette:
## Not run:
vignette("DDC_examples")
## End(Not run)