
High dimensional MCD based detection of outliers {Rfast}

R Documentation

High dimensional MCD based detection of outliers

Description

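Detects possible outliers in high-dimensional data, where the number of variables p exceeds the number of observations n, using a minimum covariance determinant (MCD) type estimator adapted to this setting (the minimum diagonal product approach of Ro et al., 2015).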

Usage

rmdp(y, alpha = 0.05, itertime = 100, parallel = FALSE)

Arguments

`y`	A numeric matrix with more columns (p) than rows (n), i.e. n < p. Rows are observations and columns are variables.
`alpha`	The significance level used to decide whether an observation is flagged as a possible outlier. The default value is 0.05.
`itertime`	The number of iterations the algorithm will be run. The larger the sample size, the larger this number must be. With 50 observations in `R^1000`, for example, 1000 iterations may be needed to produce stable results.
`parallel`	A logical value; if TRUE, the parallel version of the algorithm is used.
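As a rough illustration of how a significance level turns distances into a decision, the sketch below uses a normal approximation in the spirit of Ro et al. (2015). If the mean and variances were known, the diagonal distance d_i = sum_j (y_ij - mu_j)^2 / sigma_j^2 of a Gaussian observation has mean p and variance 2 tr(R^2), where R is the correlation matrix of the variables. This is a simplification under stated assumptions, not Rfast's exact rule: the function name `flag_outliers_sketch` is hypothetical, any finite-sample correction factors are omitted, and tr(R^2) is passed in as known rather than estimated.

```r
# Hypothetical helper (not part of Rfast): flag observations whose diagonal,
# Mahalanobis-type distance is too large under a normal approximation.
# d     : squared distances sum_j (y_ij - mu_j)^2 / sigma_j^2
# p     : number of variables
# trR2  : tr(R^2) for the correlation matrix R (assumed known here)
# alpha : significance level
flag_outliers_sketch <- function(d, p, trR2, alpha = 0.05) {
  z <- (d - p) / sqrt(2 * trR2)   # standardise: E[d] = p, Var[d] = 2 tr(R^2)
  z > qnorm(1 - alpha)            # TRUE = possible outlier (one-sided test)
}

# Independent variables: R is the identity matrix, so tr(R^2) = p
flags <- flag_outliers_sketch(d = c(95, 110, 160), p = 100, trR2 = 100)
flags
```

Note that this sketch returns TRUE for a possible outlier, whereas `rmdp` uses the opposite convention in `wei` (TRUE means "clean").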

Details

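Outliers in high-dimensional data (n << p) are detected using a suitably modified MCD estimator. Because the sample covariance matrix is singular when n < p, its determinant cannot be used directly. Instead, only the variances of the variables are used, i.e. the covariance matrix is taken to be diagonal, and its determinant is simply the product of the variances.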
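The idea can be sketched in base R. This is a simplified illustration, not Rfast's C++ implementation: the helper names `diag_distances` and `mdp_sketch` are hypothetical, the subset size h = floor(n/2) + 1 and the random starts of two observations are assumptions, and each start is refined by repeatedly keeping the h observations with the smallest diagonal distances, borrowing the concentration-step idea of FAST-MCD. Among all `itertime` starts, the subset whose variances have the smallest product is kept; the product is computed on the log scale to avoid overflow or underflow when p is large.

```r
# Hypothetical helper (not part of Rfast): Mahalanobis-type squared distances of
# every row of y, using only the mean and per-variable variances of the rows in
# `subset`, i.e. a diagonal covariance matrix. Assumes no constant columns.
diag_distances <- function(y, subset) {
  ys <- y[subset, , drop = FALSE]
  mu <- colMeans(ys)
  v <- apply(ys, 2, var)
  centred <- sweep(y, 2, mu)              # y_ij - mu_j
  rowSums(sweep(centred^2, 2, v, "/"))    # sum_j (y_ij - mu_j)^2 / v_j
}

# Hypothetical sketch of a minimum diagonal product search (assumed details:
# subset size h, random starting pairs, at most max_steps refinements per start).
mdp_sketch <- function(y, itertime = 100, max_steps = 50) {
  n <- nrow(y)
  h <- floor(n / 2) + 1
  best_subset <- NULL
  best_logdet <- Inf
  for (it in seq_len(itertime)) {
    subset <- sample.int(n, 2)                  # random starting pair
    for (step in seq_len(max_steps)) {
      d <- diag_distances(y, subset)
      new_subset <- sort(order(d)[seq_len(h)])  # keep the h closest rows
      if (identical(new_subset, subset)) break  # subset no longer changes
      subset <- new_subset
    }
    # log of the "determinant": the product of the variances becomes a sum of logs
    logdet <- sum(log(apply(y[subset, , drop = FALSE], 2, var)))
    if (logdet < best_logdet) {
      best_logdet <- logdet
      best_subset <- subset
    }
  }
  list(subset = best_subset, logdet = best_logdet)
}

# Illustrative data: 30 observations in 100 dimensions, rows 1 to 3 shifted
set.seed(1)
y <- matrix(rnorm(30 * 100), ncol = 100)
y[1:3, ] <- y[1:3, ] + 5
res <- mdp_sketch(y, itertime = 50)
res$subset
```

The shifted rows inflate the variances of any subset that contains them, so the subset with the smallest product of variances should consist of clean observations only.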

Value

A list including the following components:

`runtime`	The duration of the process.
`dis`	The final estimated Mahalanobis-type normalised distances.
`wei`	A logical vector specifying whether an observation is "clean" (TRUE) or a possible outlier (FALSE).
`cova`	The estimated covariance matrix.
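A short usage sketch of the returned components, assuming Rfast is installed. The contamination (shifting the first three rows) is made up for illustration; `wei` and `dis` are used as documented above, with FALSE in `wei` marking a possible outlier.

```r
library(Rfast)

set.seed(1)
n <- 50
p <- 400
x <- matrix(rnorm(n * p), ncol = p)
x[1:3, ] <- x[1:3, ] + 3                     # illustrative contamination of rows 1 to 3

a <- rmdp(x, itertime = 500)

outliers <- which(!a$wei)                    # rows flagged as possible outliers
ranked <- order(a$dis, decreasing = TRUE)    # rows sorted from most to least outlying
outliers
head(ranked)
```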

Author(s)

Initial R code: Changliang Zou <nk.chlzou@gmail.com>
R code modifications: Michail Tsagris <mtsagris@uoc.gr>
C++ implementation: Manos Papadakis <papadakm95@gmail.com>
Documentation: Michail Tsagris <mtsagris@uoc.gr> and Changliang Zou <nk.chlzou@gmail.com>

References

Ro K., Zou C., Wang Z. and Yin G. (2015). Outlier detection for high-dimensional data. Biometrika, 102(3):589-599.

Examples

library(Rfast)
# 50 observations in 400 dimensions (n < p)
x <- matrix(rnorm(50 * 400), ncol = 400)
a <- rmdp(x, itertime = 500)
# remove the objects created above
x <- a <- NULL

[Package Rfast version 2.1.0 Index]