POEM {modi}

R Documentation

Nearest Neighbour Imputation with Mahalanobis distance

Description

POEM takes into account missing values, outlier indicators, error indicators and sampling weights.

Usage

POEM(
  data,
  weights,
  outind,
  errors,
  missing.matrix,
  alpha = 0.5,
  beta = 0.5,
  reweight.out = FALSE,
  c = 5,
  preliminary.mean.imputation = FALSE,
  monitor = FALSE
)

Arguments

`data`	a data frame or matrix with the data.
`weights`	sampling weights.
`outind`	an indicator vector for the outliers with `1` indicating an outlier.
`errors`	matrix of indicators for items which failed edits.
`missing.matrix`	the missingness matrix can be given as input. Otherwise, it will be recalculated.
`alpha`	scalar giving the weight attributed to an item that is failing.
`beta`	minimal overlap to accept a donor.
`reweight.out`	if `TRUE`, the outliers are redefined.
`c`	tuning constant when redefining the outliers (cutoff for Mahalanobis distance).
`preliminary.mean.imputation`	assume the problematic observation is at the mean of good observations.
`monitor`	if `TRUE`, verbose output is printed.
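
As an illustration of how the indicator inputs might be prepared, the following minimal sketch builds an `errors` matrix and an `outind` vector in base R. The data set `dat`, its column names and the non-negativity edit rule are hypothetical and not taken from the package. Coding a failed item as `1` mirrors the convention documented for `outind` and is an assumption here.

```r
# Hypothetical survey data: 4 units, 2 items (not part of modi)
dat <- data.frame(turnover  = c(120, -5, 300, NA),
                  employees = c(10, 3, -1, 8))

# Hypothetical edit rule: every item must be non-negative.
# Missing cells are not counted as failed edits in this sketch.
failed <- !is.na(dat) & dat < 0

# Indicator matrix with the same shape as the data (assumed coding: 1 = failed)
errors <- matrix(as.integer(failed), nrow = nrow(dat),
                 dimnames = list(NULL, names(dat)))

# Outlier indicator vector, 1 marking an outlier (unit 3 chosen arbitrarily)
outind <- rep(0, nrow(dat))
outind[3] <- 1
```

Objects built this way have one row (or element) per observation, which is the shape the argument descriptions above imply.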

Details

POEM assumes that a multivariate outlier detection has been carried out beforehand and that its result is summarized in the vector `outind`. In addition, further observations may have been flagged as failing edit rules; this information is given in `errors`. The mean and covariance estimates are calculated from the good observations (no outliers, with errors downweighted). Preliminary mean imputation is sometimes needed to avoid a non-positive definite covariance estimate at this stage. It assumes that the problematic values of an observation (erroneous, outlying or missing) can be replaced by the mean of the remaining non-problematic observations. Note that the algorithm imputes these problematic observations afterwards, so the final covariance matrix of the imputed data is not the same as the working covariance matrix (which may be based on preliminary mean imputation).
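
The idea described above can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the code used by POEM: it ignores edit failures (`alpha`), the donor overlap check (`beta`) and outlier reweighting, and the toy data, the names `x`, `w`, `recipient` and the choice of donor pool are all hypothetical.

```r
# Toy data: 6 observations, 2 variables, one missing cell (hypothetical values)
x <- matrix(c(1.00, 2.0,
              1.20, 2.1,
              0.90, 1.8,
              1.18,  NA,
              1.30, 2.2,
              8.00, 9.0),          # last row plays the role of a flagged outlier
            ncol = 2, byrow = TRUE)
w      <- c(1, 2, 1, 1, 2, 1)      # sampling weights
outind <- c(0, 0, 0, 0, 0, 1)      # 1 marks an outlier

# "Good" rows in this sketch: not outlying and fully observed
good <- outind == 0 & stats::complete.cases(x)

# Weighted mean of the good observations
center <- colSums(x[good, , drop = FALSE] * w[good]) / sum(w[good])

# Preliminary mean imputation: fill missing cells with that weighted mean
x.prelim <- x
for (j in seq_len(ncol(x))) {
  x.prelim[is.na(x[, j]), j] <- center[j]
}

# Working covariance from the non-outliers (cov.wt rescales the weights)
fit <- stats::cov.wt(x.prelim[outind == 0, , drop = FALSE],
                     wt = w[outind == 0])

# Nearest-neighbour donor for the incomplete row, using the squared
# Mahalanobis distance restricted to the variables the recipient has observed
recipient  <- 4
obs        <- !is.na(x[recipient, ])
donor.rows <- which(good)
d2 <- stats::mahalanobis(x[donor.rows, obs, drop = FALSE],
                         center = x[recipient, obs],
                         cov    = fit$cov[obs, obs, drop = FALSE])
donor <- donor.rows[which.min(d2)]

# Copy the donor's values into the recipient's missing cells
x.imp <- x
x.imp[recipient, !obs] <- x[donor, !obs]
```

The working covariance `fit$cov` is computed from the preliminarily imputed data, whereas `x.imp` carries the donor values. This is why the covariance of the final imputed data differs from the working covariance.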

Value

POEM returns a list whose first component output is a sub-list with the following components:

preliminary.mean.imputation: Logical. TRUE if preliminary mean imputation should be used
completely.missing: Number of observations with no observed values
good.values: Weighted number of good values (not missing, not outlying, not erroneous)
nonoutliers.before: Number of nonoutliers before reweighting
weighted.nonoutliers.before: Weighted number of nonoutliers before reweighting
nonoutliers.after: Number of nonoutliers after reweighting
weighted.nonoutliers.after: Weighted number of nonoutliers after reweighting
old.center: Coordinate means after weighting, before imputation
old.variances: Coordinate variances after weighting, before imputation
new.center: Coordinate means after weighting, after imputation
new.variances: Coordinate variances after weighting, after imputation
covariance: Covariance (of standardised observations) before imputation
imputed.observations: Indices of observations with imputed values
donors: Indices of donors for imputed observations
new.outind: Indices of new outliers

The further component returned by POEM is:

imputed.data: Imputed data set

Author(s)

Beat Hulliger

References

Béguin, C. and Hulliger, B. (2002), EUREDIT Workpackage x.2 D4-5.2.1-2.C Develop and evaluate new methods for statistical outlier detection and outlier robust multivariate imputation, Technical report, EUREDIT 2002.

Examples

data(bushfirem, bushfire.weights)
outliers <- rep(0, nrow(bushfirem))
outliers[31:38] <- 1
imp.res <- POEM(bushfirem, bushfire.weights, outliers,
                preliminary.mean.imputation = TRUE)
print(imp.res$output)
var(imp.res$imputed.data)

[Package modi version 0.1.2 Index]