fill.KNNimpute {filling}

R Documentation

Imputation using Weighted K-nearest Neighbors

Description

One of the simplest ideas for guessing a missing entry is to use the portion of the data with the most similar characteristics across all covariates. fill.KNNimpute follows this reasoning: it finds the K nearest neighbors based on the observed variables and fills in each missing entry with a weighted average of the corresponding neighbor values. Note that when many entries are missing, there may be no surrogates from which to compute an estimate. Therefore, if an entire row or column consists of missing entries, the algorithm stops.
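The idea described above can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the implementation used by fill.KNNimpute: the helper name knn_impute_sketch is hypothetical, rows are treated as observations, distances are root-mean-square differences over jointly observed columns, and weights are inverse distances.

```r
# Hypothetical helper for illustration; not part of the 'filling' package.
knn_impute_sketch <- function(A, k = 2) {
  # Mirror the documented stopping rule: no fully missing row or column.
  if (any(rowSums(!is.na(A)) == 0) || any(colSums(!is.na(A)) == 0)) {
    stop("a row or column is entirely missing")
  }
  X <- A
  for (i in which(rowSums(is.na(A)) > 0)) {
    # Distance from row i to every row, using only jointly observed columns.
    d <- apply(A, 1, function(r) {
      shared <- !is.na(r) & !is.na(A[i, ])
      if (any(shared)) sqrt(mean((r[shared] - A[i, shared])^2)) else NA_real_
    })
    d[i] <- NA_real_  # a row is not its own neighbor
    for (j in which(is.na(A[i, ]))) {
      # Candidates: rows with a usable distance and an observed value in column j.
      cand <- which(!is.na(d) & !is.na(A[, j]))
      if (length(cand) == 0) next  # no surrogate available; entry stays NA
      nn <- cand[order(d[cand])][seq_len(min(k, length(cand)))]
      w <- 1 / (d[nn] + 1e-8)  # inverse-distance weights (an assumption)
      X[i, j] <- sum(w * A[nn, j]) / sum(w)
    }
  }
  X
}

# Toy matrix: row 2 is missing its third entry; rows 1 and 4 match it exactly
# on the observed columns, so they are its two nearest neighbors.
A <- matrix(c(1, 2, 3,
              1, 2, NA,
              5, 6, 7,
              1, 2, 3), nrow = 4, byrow = TRUE)
X <- knn_impute_sketch(A, k = 2)
```

In this toy case the missing entry is filled from rows 1 and 4 only, since row 3 is farther away on the observed columns. The small constant added to the distances is there solely to avoid division by zero when a neighbor matches exactly.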

Usage

fill.KNNimpute(A, k = ceiling(nrow(A)/2))

Arguments

`A`	an `(n\times p)` partially observed matrix.
`k`	the number of neighbors to use; defaults to `ceiling(nrow(A)/2)`.

Value

a named list containing

X: an `(n\times p)` matrix after completion.

References

Troyanskaya O, Cantor M, Sherlock G, Brown P, Hastie T, Tibshirani R, Botstein D, Altman RB (2001). “Missing value estimation methods for DNA microarrays.” Bioinformatics, 17(6), 520–525. ISSN 1367-4803.

Examples

## load image data of 'lena128'
data(lena128)

## transform 5% of entries into missing
set.seed(5)
A <- aux.rndmissing(lena128, x=0.05)

## apply the method with 3 different neighborhood sizes
fill1 <- fill.KNNimpute(A, k=5)
fill2 <- fill.KNNimpute(A, k=25)
fill3 <- fill.KNNimpute(A, k=50)

## visualize the incomplete image and the three completed ones
opar <- par(no.readonly=TRUE)
par(mfrow=c(2,2), pty="s")
image(A, col=gray((0:100)/100), axes=FALSE, main="5% missing")
image(fill1$X, col=gray((0:100)/100), axes=FALSE, main="5-neighbor")
image(fill2$X, col=gray((0:100)/100), axes=FALSE, main="25-neighbor")
image(fill3$X, col=gray((0:100)/100), axes=FALSE, main="50-neighbor")
par(opar)
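
Because the example starts from the fully observed image, the three runs can also be compared numerically rather than only by eye. The following is a minimal sketch, assuming a relative Frobenius-norm error is an acceptable measure; rel_error is a hypothetical helper and is not provided by the package.

```r
# Hypothetical helper for illustration; not part of the 'filling' package.
# Relative error: Frobenius norm of the difference, scaled by the norm of the truth.
rel_error <- function(truth, estimate) {
  sqrt(sum((estimate - truth)^2)) / sqrt(sum(truth^2))
}

# Small self-contained check on toy matrices.
truth    <- matrix(c(3, 4, 0, 0), nrow = 2)
estimate <- matrix(c(3, 4, 0, 5), nrow = 2)
err <- rel_error(truth, estimate)

# With the objects from the example above, the same helper could be applied as:
# sapply(list(fill1$X, fill2$X, fill3$X), rel_error, truth = lena128)
```

A smaller value indicates a completion closer to the original image, which gives a simple way to compare the three choices of k.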

[Package filling version 0.2.3 Index]