R: Sequential KNN imputation method

seqKNNimp {multiUS}

R Documentation

Sequential KNN imputation method

Description

This function estimates missing values sequentially from the units that has least missing rate, using weighted mean of k nearest neighbors.

Usage

seqKNNimp(data, k = 10)

Arguments

`data`	A data frame with the data set.
`k`	The number of nearest neighbours to use (defaults to 10).

Details

The function separates the dataset into an incomplete set with missing values and into a complete set without missing values. The values in an incomplete set are imputed in the order of the number of missing values. A missing value is filled by the weighted mean value of a corresponding column of the nearest neighbour units in the complete set. Once all missing values for a given unit are imputed, the unit is moved into the complete set and used for the imputation of the rest of units in the incomplete set. In this process, all missing values for one unit can be imputed simultaneously from the selected neighbour units in the complete set. This reduces execution time from previously developed KNN method that selects nearest neighbours for each imputation.

Value

A dataframe with imputed values.

Note

This is the function from package SeqKNN by Ki-Yeol Kim and Gwan-Su Yi.

Author(s)

Ki-Yeol Kim and Gwan-Su Yi

References

Ki-Yeol Kim, Byoung-Jin Kim, Gwan-Su Yi (2004.Oct.26) "Reuse of imputed data in microarray analysis increases imputation efficiency", BMC Bioinformatics 5:160.

Examples

mtcars$mpg[sample(1:nrow(mtcars), size = 5, replace = FALSE)] <- NA
seqKNNimp(data = mtcars)

[Package multiUS version 1.2.3 Index]