corHuber {robustHD} | R Documentation |
Robust correlation based on winsorization
Description
Compute a robust correlation estimate based on winsorization, i.e., by shrinking outlying observations to the border of the main part of the data.
Usage
corHuber(
x,
y,
type = c("bivariate", "adjusted", "univariate"),
standardized = FALSE,
centerFun = median,
scaleFun = mad,
const = 2,
prob = 0.95,
tol = .Machine$double.eps^0.5,
...
)
Arguments
x |
a numeric vector. |
y |
a numeric vector. |
type |
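a character string specifying the type of winsorization to be
used. Possible values are "bivariate" for bivariate winsorization,
"adjusted" for adjusted univariate winsorization, or "univariate"
for univariate winsorization. |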
standardized |
a logical indicating whether the data are already robustly standardized. |
centerFun |
a function to compute a robust estimate for the center to
be used for robust standardization (defaults to median). |
scaleFun |
a function to compute a robust estimate for the scale to
be used for robust standardization (defaults to mad). |
const |
numeric; tuning constant to be used in univariate or adjusted univariate winsorization (defaults to 2). |
prob |
numeric; probability for the quantile of the \chi^{2} distribution to be
used in bivariate winsorization (defaults to 0.95). |
tol |
a small positive numeric value. This is used in bivariate winsorization to determine whether the initial estimate from adjusted univariate winsorization is close to 1 in absolute value. In this case, bivariate winsorization would fail since the points form almost a straight line, and the initial estimate is returned. |
... |
additional arguments to be passed to
|
Details
The borders of the main part of the data are defined on the scale of the
robustly standardized data. In univariate winsorization, the borders for
each variable are given by +/- const, thus a symmetric distribution is
assumed. In adjusted univariate winsorization, the borders for the two
diagonally opposing quadrants containing the minority of the data are
shrunken by a factor that depends on the ratio between the number of
observations in the major and minor quadrants. It is thus possible to
better account for the bivariate structure of the data while maintaining
fast computation. In bivariate winsorization, a bivariate normal
distribution is assumed and the data are shrunken towards the boundary of a
tolerance ellipse with coverage probability prob. The boundary of this
ellipse is given by all points that have a squared Mahalanobis distance
equal to the quantile of the \chi^{2} distribution given by prob.
Furthermore, the initial correlation matrix required for the Mahalanobis
distances is computed based on adjusted univariate winsorization.
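The univariate and bivariate schemes described above can be sketched in base R. This is only an illustration under stated assumptions, not the code used by corHuber: the helper names winsorizeUnivariate and winsorizeBivariate are hypothetical, the \chi^{2} quantile is taken with 2 degrees of freedom because the data are bivariate, and the initial correlation is taken from plain univariate winsorization rather than the adjusted variant, to keep the sketch short.

```r
## Hypothetical helpers for illustration; these are not part of robustHD.

## univariate winsorization: clip robustly standardized data at +/- const
winsorizeUnivariate <- function(x, const = 2, centerFun = median,
                                scaleFun = mad) {
  z <- (x - centerFun(x)) / scaleFun(x)  # robust standardization
  pmin(pmax(z, -const), const)           # shrink to the borders
}

## bivariate winsorization: shrink points outside the tolerance ellipse
winsorizeBivariate <- function(x, y, const = 2, prob = 0.95) {
  z <- cbind((x - median(x)) / mad(x), (y - median(y)) / mad(y))
  ## simplification: initial correlation from plain univariate winsorization
  ## (the Details section uses adjusted univariate winsorization here)
  r0 <- cor(winsorizeUnivariate(x, const), winsorizeUnivariate(y, const))
  R0 <- matrix(c(1, r0, r0, 1), 2, 2)
  d2 <- mahalanobis(z, center = c(0, 0), cov = R0)  # squared distances
  q <- qchisq(prob, df = 2)                         # ellipse boundary
  z * pmin(sqrt(q / d2), 1)                         # shrink outlying points
}

## simulated data with one gross outlier
set.seed(1234)
x <- rnorm(100)
y <- 0.6 * x + 0.8 * rnorm(100)
x[1] <- 50
y[1] <- -50

corClassical <- cor(x, y)
corUnivariate <- cor(winsorizeUnivariate(x), winsorizeUnivariate(y))
w <- winsorizeBivariate(x, y)
corBivariate <- cor(w[, 1], w[, 2])
```

In both helpers, observations inside the borders are left unchanged and only outlying points are pulled back, so the single outlier no longer dominates the correlation. The sketch does not handle the degenerate case in which the initial estimate is close to 1 in absolute value, which is what the tol argument guards against.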
Value
The robust correlation estimate.
Author(s)
Andreas Alfons, based on code by Jafar A. Khan, Stefan Van Aelst and Ruben H. Zamar
References
Khan, J.A., Van Aelst, S. and Zamar, R.H. (2007) Robust linear model selection based on least angle regression. Journal of the American Statistical Association, 102(480), 1289–1299. doi:10.1198/016214507000000950
See Also
Examples
## generate data
library("mvtnorm")
set.seed(1234) # for reproducibility
Sigma <- matrix(c(1, 0.6, 0.6, 1), 2, 2)
xy <- rmvnorm(100, sigma=Sigma)
x <- xy[, 1]
y <- xy[, 2]
## introduce outlier
x[1] <- x[1] * 10
y[1] <- y[1] * (-5)
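## compare classical and robust correlation
cor(x, y)       # classical Pearson correlation, affected by the outlier
corHuber(x, y)  # robust correlation (default type = "bivariate")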