Winsimp {modi} | R Documentation
Winsorization followed by imputation
Description
Winsorizes outliers according to their Mahalanobis distance and then imputes under the multivariate normal model. Only the outliers are winsorized. The Mahalanobis distance is computed with MDmiss, which allows for missing values.
Usage
Winsimp(data, center, scatter, outind, seed = 1000003)
Arguments
data
a data frame with the data.
center
(robust) estimate of the center (location) of the observations.
scatter
(robust) estimate of the scatter (covariance matrix) of the observations.
outind
logical vector indicating outliers, with 1 or TRUE for outliers.
seed
seed for the random number generator.
Details
It is assumed that center, scatter and outind stem from a multivariate
outlier detection algorithm that produces robust estimates and declares
observations with a large Mahalanobis distance to be outliers. The
cutpoint c is calculated as the least (unsquared) Mahalanobis distance
among the outliers. The winsorization reduces the weight of the outliers:
\hat{y}_i = \mu_R + (y_i - \mu_R) \cdot c/d_i
where \mu_R is the robust center, c is the cutpoint, and d_i is the
(unsquared) Mahalanobis distance of observation i.
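The winsorization step can be illustrated with a minimal base-R sketch. It is not the package's implementation: it assumes complete data (so stats::mahalanobis stands in for MDmiss), and the toy values of y, center, scatter and outind are made up for illustration. It shows that each flagged observation is pulled toward the center until its distance equals the cutpoint, while the outlier that defines the cutpoint stays unchanged.

```r
# Toy data with complete cases only (hypothetical values, not from the package).
y <- rbind(c(1, 2), c(2, 1), c(0, 0), c(4, 5), c(7, 9))
center  <- c(1, 1)                                # assumed robust center mu_R
scatter <- diag(2)                                # assumed robust scatter matrix
outind  <- c(FALSE, FALSE, FALSE, TRUE, TRUE)     # last two rows flagged as outliers

# Unsquared Mahalanobis distances d_i (mahalanobis() returns squared distances).
d <- sqrt(mahalanobis(y, center, scatter))

# Cutpoint c: the least distance among the flagged outliers.
cutpoint <- min(d[outind])

# Winsorize only the outliers: y_hat_i = mu_R + (y_i - mu_R) * c / d_i.
y_wins   <- y
centered <- sweep(y[outind, , drop = FALSE], 2, center)   # y_i - mu_R
shrunk   <- centered * (cutpoint / d[outind])             # scale row i by c / d_i
y_wins[outind, ] <- sweep(shrunk, 2, center, "+")         # add mu_R back

# After winsorization every outlier lies at distance c from the center.
d_wins <- sqrt(mahalanobis(y_wins, center, scatter))
```

In this toy example the cutpoint is 5, the observation (7, 9) is moved to (4, 5), and the non-outliers are left untouched. The subsequent imputation under the multivariate normal model is not part of this sketch.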
Value
Winsimp returns a list whose first component output is a sublist with the following components:
cutpoint
Cutpoint for outliers
proc.time
Processing time
n.missing.before
Number of missing values before imputation
n.missing.after
Number of missing values after imputation
The further component returned by Winsimp is:
imputed.data
Imputed data set
Author(s)
Beat Hulliger
References
Hulliger, B. (2007), Multivariate Outlier Detection and Treatment in Business Surveys, Proceedings of the III International Conference on Establishment Surveys, Montréal.
See Also
Examples
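# Load the bushfire data with missing values and the corresponding weights
data(bushfirem, bushfire.weights)
# Detect outliers; the result supplies center, scatter and outind
det.res <- TRC(bushfirem, weight = bushfire.weights)
# Winsorize the detected outliers, then impute
imp.res <- Winsimp(bushfirem, det.res$center, det.res$scatter, det.res$outind)
# Number of missing values remaining after imputation
print(imp.res$n.missing.after)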