| POEM {modi} | R Documentation |
Nearest Neighbour Imputation with Mahalanobis distance
Description
POEM takes into account missing values, outlier indicators, error indicators and sampling weights.
Usage
POEM(
data,
weights,
outind,
errors,
missing.matrix,
alpha = 0.5,
beta = 0.5,
reweight.out = FALSE,
c = 5,
preliminary.mean.imputation = FALSE,
monitor = FALSE
)
Arguments
data |
a data frame or matrix with the data. |
weights |
sampling weights. |
outind |
an indicator vector for the outliers with 1 indicating an outlier. |
errors |
matrix of indicators for items which failed edits. |
missing.matrix |
the missingness matrix can be given as input. Otherwise, it will be recalculated. |
alpha |
scalar giving the weight attributed to an item that is failing. |
beta |
minimal overlap to accept a donor. |
reweight.out |
if TRUE, the outliers are redefined. |
c |
tuning constant when redefining the outliers (cutoff for Mahalanobis distance). |
preliminary.mean.imputation |
assume the problematic observation is at the mean of good observations. |
monitor |
if TRUE, verbose output. |
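The role of alpha can be illustrated with a small base R sketch. It assumes a hypothetical logical matrix fails (TRUE where an item failed an edit) and made-up data; the coding that POEM expects for errors may differ, so this only illustrates item-level downweighting and is not the package's internal code.

```r
# Toy data: 4 observations, 2 variables (made-up values).
x <- matrix(c(10,  1,
              12,  2,
              11, 50,
              NA,  3),
            ncol = 2, byrow = TRUE)
w      <- c(1, 1, 2, 1)   # sampling weights (made up)
outind <- c(0, 0, 0, 0)   # no outliers in this toy example
alpha  <- 0.5             # weight given to an item that fails an edit

# Hypothetical failure indicator: item 2 of observation 3 failed an edit.
fails <- matrix(FALSE, nrow(x), ncol(x))
fails[3, 2] <- TRUE

# Item-level weights: sampling weight, zero for outliers, alpha for failing
# items, zero for missing items.
item.w <- w * (1 - outind) * ifelse(fails, alpha, 1) * (!is.na(x))

# Weighted coordinate means over the usable items.
x0 <- x
x0[is.na(x0)] <- 0
center <- colSums(item.w * x0) / colSums(item.w)

# Weighted number of usable values.
n.good <- sum(item.w)
```

With alpha = 1 a failing item counts like any other value; with alpha = 0 it is ignored in the same way as a missing value.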
Details
POEM assumes that a multivariate outlier detection has been carried out
beforehand and that its result is summarized in the vector outind.
In addition, further observations may have been flagged as failing edit rules;
this information is given in errors. The mean and covariance estimates are
calculated from the good observations (outliers excluded, errors downweighted).
Preliminary mean imputation is sometimes needed to avoid a non-positive definite
covariance estimate at this stage. It assumes that the problematic values of an
observation (erroneous, outlying or missing) can be replaced by the mean of the
remaining non-problematic observations. Note that the algorithm imputes these
problematic observations afterwards, so the final covariance matrix of the
imputed data is not the same as the working covariance matrix (which may be
based on preliminary mean imputation).
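The general idea can be sketched in base R: estimate a working centre and covariance from the non-outlying observations after preliminary mean imputation, then pick the donor with the smallest Mahalanobis distance on the variables observed in the recipient. This is a simplified illustration with made-up data and hypothetical variable names; it ignores errors, alpha and beta, and it is not POEM's actual implementation.

```r
# Toy data: observation 2 has a missing x2, observation 6 is a flagged outlier.
x <- matrix(c(1.0, 2.0, 3.0,
              1.2,  NA, 3.1,
              0.9, 2.1, 2.8,
              1.1, 1.9, 3.2,
              1.3, 2.2, 2.9,
              9.0, 9.5, 0.1),
            ncol = 3, byrow = TRUE,
            dimnames = list(NULL, c("x1", "x2", "x3")))
w      <- c(1, 2, 1, 1, 2, 1)   # sampling weights (made up)
outind <- c(0, 0, 0, 0, 0, 1)   # 1 marks an outlier

# Working weights: outliers get weight zero.
w.good <- w * (1 - outind)

# Preliminary mean imputation: replace each missing value by the weighted
# mean of the observed values of the non-outlying observations.
x.prelim <- x
for (j in seq_len(ncol(x))) {
  obs <- !is.na(x[, j])
  x.prelim[!obs, j] <- weighted.mean(x[obs, j], w.good[obs])
}

# Working centre and covariance from the non-outlying observations.
keep   <- w.good > 0
est    <- cov.wt(x.prelim[keep, , drop = FALSE],
                 wt = w.good[keep] / sum(w.good[keep]))
covmat <- est$cov

# Nearest-neighbour donor for the recipient, using only the variables
# observed in the recipient and the matching block of the covariance.
recipient <- 2
obs.vars  <- !is.na(x[recipient, ])
donors    <- which(outind == 0 & complete.cases(x))
d2 <- mahalanobis(x[donors, obs.vars, drop = FALSE],
                  center = x[recipient, obs.vars],
                  cov    = covmat[obs.vars, obs.vars])
donor <- donors[which.min(d2)]

# Copy the donor's values into the recipient's missing items.
x.imp <- x
x.imp[recipient, !obs.vars] <- x[donor, !obs.vars]
```

The covariance used for the distances comes from the preliminarily imputed data, whereas the covariance of x.imp is computed from donor values. This is why the working and final covariance matrices differ.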
Value
POEM returns a list whose first component output is a
sub-list with the following components:
preliminary.mean.imputation |
Logical. TRUE if preliminary mean imputation should be used. |
completely.missing |
Number of observations with no observed values. |
good.values |
Weighted number of good values (not missing, not outlying, not erroneous). |
nonoutliers.before |
Number of nonoutliers before reweighting. |
weighted.nonoutliers.before |
Weighted number of nonoutliers before reweighting. |
nonoutliers.after |
Number of nonoutliers after reweighting. |
weighted.nonoutliers.after |
Weighted number of nonoutliers after reweighting. |
old.center |
Coordinate means after weighting, before imputation. |
old.variances |
Coordinate variances after weighting, before imputation. |
new.center |
Coordinate means after weighting, after imputation. |
new.variances |
Coordinate variances after weighting, after imputation. |
covariance |
Covariance (of standardised observations) before imputation. |
imputed.observations |
Indices of observations with imputed values. |
donors |
Indices of donors for imputed observations. |
new.outind |
Indices of new outliers. |
The further component returned by POEM is:
imputed.data |
Imputed data set. |
Author(s)
Beat Hulliger
References
Béguin, C. and Hulliger, B. (2002). EUREDIT Workpackage x.2 D4-5.2.1-2.C: Develop and evaluate new methods for statistical outlier detection and outlier robust multivariate imputation. Technical report, EUREDIT 2002.
Examples
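# Load the bushfire data and the sampling weights
data(bushfirem, bushfire.weights)
# Flag observations 31 to 38 as outliers (1 = outlier, 0 = non-outlier)
outliers <- rep(0, nrow(bushfirem))
outliers[31:38] <- 1
# Impute, using preliminary mean imputation for the working covariance
imp.res <- POEM(bushfirem, bushfire.weights, outliers,
                preliminary.mean.imputation = TRUE)
# Summary of the imputation
print(imp.res$output)
# Covariance matrix of the imputed data
var(imp.res$imputed.data)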