OutlierPCOut {rrcovHD} | R Documentation |
Outlier identification in high dimensions using the PCOUT algorithm
Description
The function implements a computationally fast procedure for identifying outliers that is particularly effective in high dimensions. This algorithm utilizes simple properties of principal components to identify outliers in the transformed space, leading to significant computational advantages for high-dimensional data. This approach requires considerably less computational time than existing methods for outlier detection, and is suitable for use on very large data sets. It is also capable of analyzing the data situation commonly found in certain biological applications in which the number of dimensions is several orders of magnitude larger than the number of observations.
Usage
OutlierPCOut(x, ...)
## Default S3 method:
OutlierPCOut(x, grouping, explvar=0.99, trace=FALSE, ...)
## S3 method for class 'formula'
OutlierPCOut(formula, data, ..., subset, na.action)
Arguments
formula |
a formula with no response variable, referring only to numeric variables. |
data |
an optional data frame (or similar: see
|
subset |
an optional vector used to select rows (observations) of the
data matrix |
na.action |
a function which indicates what should happen
when the data contain |
... |
arguments passed to or from other methods. |
x |
a matrix or data frame. |
grouping |
grouping variable: a factor specifying the class for each observation. |
explvar |
a numeric value between 0 and 1 indicating how much variance should be covered by the robust PCs (default to 0.99) |
trace |
whether to print intermediate results. Default is |
Details
If the data set consists of two or more classes
(specified by the grouping variable grouping
) the proposed method iterates
through the classes present in the data, separates each class from the rest and
identifies the outliers relative to this class, thus treating both types of outliers,
the mislabeled and the abnormal samples in a homogenous way.
Value
An S4 object of class OutlierPCOut
which
is a subclass of the virtual class Outlier
.
Author(s)
Valentin Todorov valentin.todorov@chello.at
References
P. Filzmoser, R. Maronna and M. Werner (2008). Outlier identification in high dimensions, Computational Statistics & Data Analysis, Vol. 52 1694–1711.
Filzmoser P & Todorov V (2013). Robust tools for the imperfect world, Information Sciences 245, 4–20. doi:10.1016/j.ins.2012.10.017.
See Also
Examples
data(hemophilia)
obj <- OutlierPCOut(gr~.,data=hemophilia)
obj
getDistance(obj) # returns an array of distances
getClassLabels(obj, 1) # returns an array of indices for a given class
getCutoff(obj) # returns an array of cutoff values (for each class, usually equal)
getFlag(obj) # returns an 0/1 array of flags
plot(obj, class=2) # standard plot function