hr05AdjustedDF {CerioliOutlierDetection} | R Documentation |
Adjusted Degrees of Freedom for Testing Robust Mahalanobis Distances for Outlyingness
Description
Computes the degrees of freedom for the adjusted F distribution for testing Mahalanobis distances calculated with the minimum covariance determinant (MCD) robust dispersion estimate (for data with a model normal distribution) as described in Hardin and Rocke (2005) or in Green and Martin (2017).
Usage
hr05AdjustedDF( n.obs, p.dim, mcd.alpha, m.asy, method = c("HR05", "GM14"))
Arguments
n.obs |
(Integer) Number of observations |
p.dim |
(Integer) Dimension of the data, i.e., number of variables. |
mcd.alpha |
(Numeric) Value that determines the fraction of the sample used to compute the MCD estimate. Default value corresponds to the maximum breakdown point case of the MCD. |
m.asy |
(Numeric) Asymptotic Wishart degrees of freedom.
The default value uses |
method |
Either "HR05" to use the method of Hardin and Rocke (2005), or "GM14" to use the method of Green and Martin (2017). |
Details
Hardin and Rocke (2005) derived an approximate F
distribution
for testing robust Mahalanobis distances, computed using the MCD
estimate of dispersion, for outlyingness. This distribution improves
upon the standard \chi^2
distribution for identifying outlying
points in data set. The method of Hardin and Rocke was designed to work
for the maximum breakdown point case of the MCD, where
\alpha = \lfloor (n.obs + p.dim + 1)/2 \rfloor/n.obs.
Green and Martin (2017) extended
this result to MCD(\alpha
), where \alpha
controls the
size of the sample used to compute the MCD estimate, as well as the
breakdown point of the estimator.
With argument method = "HR05"
the function returns
m_{pred}
as given in Equation 3.4 of Hardin and Rocke (2005).
The Hardin and Rocke method is only supported for the maximum breakdown
point case; an error will be generated for other values of mcd.alpha
.
The argument method = "GM14"
uses the extended methodology
described in Green and Martin (2017) and is available for all values
of mcd.alpha
.
Value
Returns the adjusted F degrees of freedom based on the asymptotic value, the dimension of the data, and the sample size.
Note
This function is typically not called directly by users; rather it is used in the construction of other functions.
Author(s)
Written and maintained by Christopher G. Green <christopher.g.green@gmail.com>
References
C. G. Green and R. Douglas Martin. An extension of a method of Hardin and Rocke, with an application to multivariate outlier detection via the IRMCD method of Cerioli. Working Paper, 2017. Available from https://christopherggreen.github.io/papers/hr05_extension.pdf
J. Hardin and D. M. Rocke. The distribution of robust distances. Journal of Computational and Graphical Statistics, 14:928-946, 2005. doi:10.1198/106186005X77685
See Also
Examples
hr05tester <- function(n,p) {
a <- floor( (n+p+1)/2 )/n
hr05AdjustedDF( n, p, a, ch99AsymptoticDF(n,p,a)$m.hat.asy, method="HR05" )
}
# compare to m_pred in table on page 941 of Hardin and Rocke (2005)
hr05tester( 50, 5)
hr05tester( 100,10)
hr05tester( 500,10)
hr05tester(1000,20)
# using default arguments
hr05tester <- function(n,p) {
hr05AdjustedDF( n, p, method="HR05" )
}
# compare to m_pred in table on page 941 of Hardin and Rocke (2005)
hr05tester( 50, 5)
hr05tester( 100,10)
hr05tester( 500,10)
hr05tester(1000,20)
# Green and Martin (2017) improved method
hr05tester <- function(n,p) {
hr05AdjustedDF( n, p, method="GM14" )
}
# compare to m_sim in table on page 941 of Hardin and Rocke (2005)
hr05tester( 50, 5)
hr05tester( 100,10)
hr05tester( 500,10)
hr05tester(1000,20)