fill.KNNimpute {filling} | R Documentation |
Imputation using Weighted K-nearest Neighbors
Description
One of the simplest ways to guess a missing entry is to use the
portion of the data that has the most similar characteristics across
all covariates. fill.KNNimpute follows this reasoning: it finds the
K nearest neighbors based on the observed variables and uses a
weighted average of those nearest elements to fill in the missing entry.
Note that when there are many missing entries, there may be no
surrogates from which to compute a value. Therefore, if an entire
row or column consists of missing entries, the algorithm stops.
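The idea described above can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the package's actual implementation: the helper name knn_impute_sketch is hypothetical, rows are treated as the units being compared, distance is the root mean squared difference over columns observed in both rows, and the weights are inverse distances. The real function may differ in any of these choices.

```r
# Hypothetical helper for illustration only; NOT part of the 'filling' package.
knn_impute_sketch <- function(A, k = ceiling(nrow(A) / 2)) {
  miss <- is.na(A)
  # Mirror the documented stopping rule: no entirely missing row or column.
  if (any(rowSums(miss) == ncol(A)) || any(colSums(miss) == nrow(A))) {
    stop("a row or column consists entirely of missing entries")
  }
  X <- A
  for (i in which(rowSums(miss) > 0)) {
    for (j in which(miss[i, ])) {
      # Candidate neighbors: rows in which column j is observed.
      cand <- setdiff(which(!miss[, j]), i)
      # Distance over the columns observed in both rows (assumed metric).
      d <- vapply(cand, function(r) {
        shared <- !miss[i, ] & !miss[r, ]
        if (!any(shared)) return(Inf)
        sqrt(mean((A[i, shared] - A[r, shared])^2))
      }, numeric(1))
      keep <- is.finite(d)
      cand <- cand[keep]
      d <- d[keep]
      if (length(cand) == 0) {
        stop("no usable neighbors for entry (", i, ", ", j, ")")
      }
      # Keep the k closest candidates and average them with
      # inverse-distance weights (an assumed weighting scheme).
      nn <- order(d)[seq_len(min(k, length(cand)))]
      w <- 1 / (d[nn] + 1e-8)
      X[i, j] <- sum(w * A[cand[nn], j]) / sum(w)
    }
  }
  list(X = X)
}

# Toy input: row 2 agrees with row 1 on its observed columns,
# so with k = 1 the missing entry is filled from row 1.
A_toy <- matrix(c( 1,  2,  3,
                   1,  2, NA,
                  10, 20, 30), nrow = 3, byrow = TRUE)
res <- knn_impute_sketch(A_toy, k = 1)
res$X
```

Returning a list with an element X follows the shape of the value documented for fill.KNNimpute, which makes the sketch easy to compare side by side with the real function.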
Usage
fill.KNNimpute(A, k = ceiling(nrow(A)/2))
Arguments
A | an (n\times p) partially observed matrix. |
k | the number of neighbors to use. |
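As a small illustration of the default in the Usage line, k is derived from the number of rows of A when it is not supplied. The matrix below is a placeholder invented for this example; only its row count matters.

```r
# Placeholder 7 x 3 matrix (illustrative only).
A_demo <- matrix(NA_real_, nrow = 7, ncol = 3)

# Default from the Usage signature: k = ceiling(nrow(A)/2).
k_default <- ceiling(nrow(A_demo) / 2)
k_default  # 4, since ceiling(3.5) is 4
```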
Value
a named list containing
- X: an (n\times p) matrix after completion.
References
Troyanskaya O, Cantor M, Sherlock G, Brown P, Hastie T, Tibshirani R, Botstein D, Altman RB (2001). “Missing value estimation methods for DNA microarrays.” Bioinformatics, 17(6), 520–525. ISSN 1367-4803.
Examples
## load image data of 'lena128'
data(lena128)
## transform 5% of entries into missing
set.seed(5)
A <- aux.rndmissing(lena128, x=0.05)
## apply the method with 3 different neighborhood sizes
fill1 <- fill.KNNimpute(A, k=5)
fill2 <- fill.KNNimpute(A, k=25)
fill3 <- fill.KNNimpute(A, k=50)
## visualize the incomplete image and the three completed results
opar <- par(no.readonly=TRUE)
par(mfrow=c(2,2), pty="s")
image(A, col=gray((0:100)/100), axes=FALSE, main="5% missing")
image(fill1$X, col=gray((0:100)/100), axes=FALSE, main="5-neighbor")
image(fill2$X, col=gray((0:100)/100), axes=FALSE, main="25-neighbor")
image(fill3$X, col=gray((0:100)/100), axes=FALSE, main="50-neighbor")
par(opar)