hr05AdjustedDF {CerioliOutlierDetection}    R Documentation

Adjusted Degrees of Freedom for Testing Robust Mahalanobis Distances for Outlyingness

Description

Computes the degrees of freedom of the adjusted F distribution used to test Mahalanobis distances calculated with the minimum covariance determinant (MCD) robust dispersion estimate (for data assumed to follow a normal model), as described in Hardin and Rocke (2005) or in Green and Martin (2014).

Usage

hr05AdjustedDF( n.obs, p.dim, mcd.alpha, m.asy, method = c("HR05", "GM14"))

Arguments

n.obs

(Integer) Number of observations.

p.dim

(Integer) Dimension of the data, i.e., number of variables.

mcd.alpha

(Numeric) Value that determines the fraction of the sample used to compute the MCD estimate. Default value corresponds to the maximum breakdown point case of the MCD.

m.asy

(Numeric) Asymptotic Wishart degrees of freedom. The default value uses ch99AsymptoticDF to obtain the finite-sample asymptotic value, but the user can also provide a pre-computed value.

method

Either "HR05" to use the method of Hardin and Rocke (2005), or "GM14" to use the method of Green and Martin (2014).

Details

Hardin and Rocke (2005) derived an approximate F distribution for testing robust Mahalanobis distances, computed using the MCD estimate of dispersion, for outlyingness. This distribution improves upon the standard \chi^2 distribution for identifying outlying points in a data set. The method of Hardin and Rocke was designed to work for the maximum breakdown point case of the MCD, where

\alpha = \lfloor (n.obs + p.dim + 1)/2 \rfloor/n.obs.
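
As a quick check of the formula above, the following base-R sketch evaluates the maximum breakdown point fraction for a few sample sizes. The helper name max_bdp_alpha is hypothetical and is not part of the package; it only restates the expression for \alpha.

```r
# Hypothetical helper (not part of CerioliOutlierDetection):
# fraction of the sample used by the MCD in the maximum
# breakdown point case, alpha = floor((n + p + 1)/2) / n.
max_bdp_alpha <- function(n.obs, p.dim) {
  floor((n.obs + p.dim + 1) / 2) / n.obs
}

max_bdp_alpha(50, 5)     # 28/50    = 0.56
max_bdp_alpha(100, 10)   # 55/100   = 0.55
max_bdp_alpha(1000, 20)  # 510/1000 = 0.51
```

The fraction approaches 1/2 from above as n.obs grows relative to p.dim.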

Green and Martin (2014) extended this result to MCD(\alpha), where \alpha controls the size of the sample used to compute the MCD estimate, as well as the breakdown point of the estimator.

With argument method = "HR05" the function returns m_{pred} as given in Equation 3.4 of Hardin and Rocke (2005). The Hardin and Rocke method is only supported for the maximum breakdown point case; an error will be generated for other values of mcd.alpha.

The argument method = "GM14" uses the extended methodology described in Green and Martin (2014) and is available for all values of mcd.alpha.
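
The difference between the two methods can be seen with a fraction other than the maximum breakdown point value. The sketch below assumes the package is installed and relies on the documented default for m.asy; the choice of n = 100, p = 10, and mcd.alpha = 0.75 is purely illustrative, and the exact wording of the error message is not specified here.

```r
# Run only if the package is available.
if (requireNamespace("CerioliOutlierDetection", quietly = TRUE)) {
  library(CerioliOutlierDetection)
  n <- 100
  p <- 10

  # "GM14" is documented to accept any value of mcd.alpha.
  m.gm14 <- hr05AdjustedDF(n, p, mcd.alpha = 0.75, method = "GM14")
  print(m.gm14)

  # "HR05" is documented to raise an error for the same fraction;
  # tryCatch captures the message instead of halting the script.
  msg <- tryCatch(
    hr05AdjustedDF(n, p, mcd.alpha = 0.75, method = "HR05"),
    error = function(e) conditionMessage(e)
  )
  print(msg)
}
```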

Value

Returns the adjusted F degrees of freedom based on the asymptotic value, the dimension of the data, and the sample size.
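
To illustrate how the returned value might be used, the sketch below turns an adjusted degrees-of-freedom value m into a cutoff for squared robust distances. It assumes the Hardin and Rocke (2005) approximation has the form c (m - p + 1)/(p m) d^2 ~ F(p, m - p + 1), where c is a consistency constant that depends on how the MCD estimate is scaled. The function name hr05_cutoff_sketch, the value m = 60, and the choice c.const = 1 are hypothetical placeholders, not package output.

```r
# Hypothetical helper (not part of CerioliOutlierDetection).
# Assumes: c * (m - p + 1) / (p * m) * d^2 ~ F(p, m - p + 1),
# so the cutoff on d^2 is the F quantile rescaled by p*m / (c*(m-p+1)).
hr05_cutoff_sketch <- function(m, p.dim, c.const = 1, level = 0.975) {
  df2 <- m - p.dim + 1
  qf(level, df1 = p.dim, df2 = df2) * p.dim * m / (c.const * df2)
}

# Placeholder m; in practice m would come from hr05AdjustedDF().
f.cut    <- hr05_cutoff_sketch(m = 60, p.dim = 5)
chi2.cut <- qchisq(0.975, df = 5)

# The F-based cutoff is larger than the chi-squared cutoff for
# finite m and approaches it as m grows.
c(F.cutoff = f.cut, chisq.cutoff = chi2.cut)
```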

Note

This function is typically not called directly by users; rather it is used in the construction of other functions.

Author(s)

Written and maintained by Christopher G. Green <christopher.g.green@gmail.com>

References

C. G. Green and R. D. Martin. An extension of a method of Hardin and Rocke, with an application to multivariate outlier detection via the IRMCD method of Cerioli. Working Paper, 2017. Available from https://christopherggreen.github.io/papers/hr05_extension.pdf

J. Hardin and D. M. Rocke. The distribution of robust distances. Journal of Computational and Graphical Statistics, 14:928-946, 2005. doi:10.1198/106186005X77685

See Also

ch99AsymptoticDF

Examples

hr05tester <- function(n,p) {
	a <- floor( (n+p+1)/2 )/n
	hr05AdjustedDF( n, p, a, ch99AsymptoticDF(n,p,a)$m.hat.asy, method="HR05" )
}
# compare to m_pred in table on page 941 of Hardin and Rocke (2005)
hr05tester(  50, 5)
hr05tester( 100,10)
hr05tester( 500,10)
hr05tester(1000,20)

# using default arguments
hr05tester <- function(n,p) {
	hr05AdjustedDF( n, p, method="HR05" )
}
# compare to m_pred in table on page 941 of Hardin and Rocke (2005)
hr05tester(  50, 5)
hr05tester( 100,10)
hr05tester( 500,10)
hr05tester(1000,20)

# Green and Martin (2014) improved method
hr05tester <- function(n,p) {
	hr05AdjustedDF( n, p, method="GM14" )
}
# compare to m_sim in table on page 941 of Hardin and Rocke (2005)
hr05tester(  50, 5)
hr05tester( 100,10)
hr05tester( 500,10)
hr05tester(1000,20)

[Package CerioliOutlierDetection version 1.1.13 Index]