High dimensional MCD based detection of outliers {Rfast} R Documentation

High dimensional MCD based detection of outliers

Description

High dimensional MCD based detection of outliers.

Usage

rmdp(y, alpha = 0.05, itertime = 100, parallel = FALSE)

Arguments

y

A numerical matrix with more columns (p) than rows (n), i.e. n < p.

alpha

The significance level used to decide whether an observation is flagged as a possible outlier. The default value is 0.05.

itertime

The number of iterations the algorithm will be run. The larger the sample size, the larger this number must be. With 50 observations in R^1000, this may need to be 1000 in order to produce stable results.

parallel

A logical value indicating whether the parallel version should be used. The default value is FALSE.
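One informal way to judge whether itertime is large enough is to run the function twice with the same setting and see how often the two runs give an observation the same label. The sketch below is only an illustration: the helper agree() and the choice of 100 versus 500 iterations are assumptions made for this example, not part of Rfast, and the code is skipped if the package is not installed.

```r
# Illustrative stability check; agree() is a hypothetical helper, not an Rfast function.
if (requireNamespace("Rfast", quietly = TRUE)) {
  x <- matrix(rnorm(50 * 400), ncol = 400)   # n = 50 rows, p = 400 columns

  agree <- function(itertime) {
    a <- Rfast::rmdp(x, itertime = itertime)
    b <- Rfast::rmdp(x, itertime = itertime)
    mean(a$wei == b$wei)   # proportion of observations labelled identically
  }

  agreement <- sapply(c(100, 500), agree)
  names(agreement) <- c("itertime = 100", "itertime = 500")
  print(agreement)
}
```

If the agreement is clearly below 1 for a given setting, a larger itertime is probably worth trying.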

Details

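Outliers in high dimensional data (n << p) are detected using a suitably modified minimum covariance determinant (MCD) procedure. Because the sample covariance matrix is singular when n < p, only the variances of the variables are used, so the determinant is simply their product.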
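The role of the variances can be illustrated in base R. The sketch below is not the algorithm used by rmdp. It only shows that the full sample covariance matrix is rank deficient when n < p, and that the determinant of a diagonal covariance matrix equals the product of the variances. The variable names are chosen for this example.

```r
set.seed(1)
x <- matrix(rnorm(20 * 50), nrow = 20, ncol = 50)   # n = 20 < p = 50

# The full sample covariance matrix cannot have full rank when n < p
full_rank <- qr(cov(x))$rank

# Keep only the variances, i.e. a diagonal covariance matrix
v <- apply(x, 2, var)
det_diag <- det(diag(v))   # determinant of the diagonal matrix
prod_v <- prod(v)          # product of the variances

# On the log scale the product becomes a sum, which is numerically safer for large p
log_det <- sum(log(v))

print(full_rank < ncol(x))
print(all.equal(det_diag, prod_v))
```

For very large p the product itself can underflow or overflow, which is why the sum of logarithms is usually the more practical quantity to compare.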

Value

A list including:

runtime

The duration of the process.

dis

The final estimated Mahalanobis-type normalised distances.

wei

A logical vector specifying whether an observation is "clean" (TRUE) or a possible outlier (FALSE).

cova

The estimated covariance matrix.
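As a rough sketch of how the returned list might be used, the code below shifts a few rows of simulated data and then reads the indices and distances of the observations not marked as "clean". The shift of 5 units and the choice of three rows are assumptions made purely for illustration, and the code is skipped if Rfast is not installed.

```r
if (requireNamespace("Rfast", quietly = TRUE)) {
  n <- 50
  p <- 400
  x <- matrix(rnorm(n * p), nrow = n, ncol = p)
  x[1:3, ] <- x[1:3, ] + 5   # shift three rows so they act as planted outliers

  res <- Rfast::rmdp(x, alpha = 0.05, itertime = 500)

  flagged <- which(!res$wei)   # wei is TRUE for "clean" observations
  print(flagged)               # indices of possible outliers
  print(res$dis[flagged])      # their normalised distances
  print(res$runtime)           # duration of the process
}
```

Since alpha is a significance level, some clean observations may also be flagged by chance.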

Author(s)

Initial R code: Changliang Zou <nk.chlzou@gmail.com>. R code modifications: Michail Tsagris <mtsagris@uoc.gr>. C++ implementation: Manos Papadakis <papadakm95@gmail.com>. Documentation: Michail Tsagris <mtsagris@uoc.gr> and Changliang Zou <nk.chlzou@gmail.com>.

References

Ro K., Zou C., Wang Z. and Yin G. (2015). Outlier detection for high-dimensional data. Biometrika, 102(3):589-599.

See Also

colmeans, colVars, colMedians

Examples

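# 50 observations in 400 dimensions, so n < p as required
x <- matrix(rnorm(50 * 400), ncol = 400)
# Use 500 iterations rather than the default 100
a <- rmdp(x, itertime = 500)

# Remove the objects created by the example
x <- a <- NULL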

[Package Rfast version 2.1.0 Index]