OutlierMahdist {rrcovHD} | R Documentation |
Outlier identification using robust (mahalanobis) distances based on robust multivariate location and covariance matrix
Description
This function uses the Mahalanobis distance as a basis for multivariate outlier detection. The standard method for multivariate outlier detection is robust estimation of the parameters in the Mahalanobis distance and the comparison with a critical value of the Chi2 distribution (Rousseeuw and Van Zomeren, 1990).
Usage
OutlierMahdist(x, ...)
## Default S3 method:
OutlierMahdist(x, grouping, control, trace=FALSE, ...)
## S3 method for class 'formula'
OutlierMahdist(formula, data, ..., subset, na.action)
Arguments
formula |
a formula with no response variable, referring only to numeric variables. |
data |
an optional data frame (or similar: see
|
subset |
an optional vector used to select rows (observations) of the
data matrix |
na.action |
a function which indicates what should happen
when the data contain |
... |
arguments passed to or from other methods. |
x |
a matrix or data frame. |
grouping |
grouping variable: a factor specifying the class for each observation. |
control |
a control object (S4) for one of the available control classes,
e.g. |
trace |
whether to print intermediate results. Default is |
Details
If the data set consists of two or more classes
(specified by the grouping variable grouping
) the proposed method iterates
through the classes present in the data, separates each class from the rest and
identifies the outliers relative to this class, thus treating both types of outliers,
the mislabeled and the abnormal samples in a homogenous way.
The estimation method is selected by the control object control
.
If a character string naming an estimator is specified, a
new control object will be created and used (with default estimation options).
If this argument is missing or a character string
'auto' is specified, the function will select the robust estimator
according to the size of the dataset - for details see CovRobust
.
Value
An S4 object of class OutlierMahdist
which
is a subclass of the virtual class Outlier
.
Author(s)
Valentin Todorov valentin.todorov@chello.at
References
P. J. Rousseeuw and B. C. Van Zomeren (1990). Unmasking multivariate outliers and leverage points. Journal of the American Statistical Association. Vol. 85(411), pp. 633-651.
P. J. Rousseeuw and A. M. Leroy (1987). Robust Regression and Outlier Detection. Wiley.
P. J. Rousseeuw and K. van Driessen (1999) A fast algorithm for the minimum covariance determinant estimator. Technometrics 41, 212–223.
Todorov V & Filzmoser P (2009). An Object Oriented Framework for Robust Multivariate Analysis. Journal of Statistical Software, 32(3), 1–47, doi:10.18637/jss.v032.i03.
Filzmoser P & Todorov V (2013). Robust tools for the imperfect world, Information Sciences 245, 4–20. doi:10.1016/j.ins.2012.10.017.
Examples
data(hemophilia)
obj <- OutlierMahdist(gr~.,data=hemophilia)
obj
getDistance(obj) # returns an array of distances
getClassLabels(obj, 1) # returns an array of indices for a given class
getCutoff(obj) # returns an array of cutoff values (for each class, usually equal)
getFlag(obj) # returns an 0/1 array of flags
plot(obj, class=2) # standard plot function