nearestNeighborImpute {FRESA.CAD} | R Documentation |
nearest neighbor NA imputation
Description
The function replaces every NA in the data frame with the median value of its nearest neighbours.
Usage
nearestNeighborImpute(tobeimputed,
referenceSet=NULL,
catgoricCol=NULL,
distol=1.05,
useorder=TRUE
)
Arguments
tobeimputed |
A data frame with missing values (NA values) |
referenceSet |
An optional data frame with a set of complete observations. This data frame will be added to the search set |
catgoricCol |
An optional list of column names that should be considered categorical |
distol |
The tolerance factor used to decide whether a row's distance is close enough to the minimum distance for that row to count as a nearest neighbour |
useorder |
Impute using the last observation, stratified by categorical data |
Details
This function finds every NA in the data set and, for each row with missing values, searches the set of complete observations for the rows with the smallest IQR-normalized Manhattan distance to that row. If several rows have distances similar to the minimum (distol*(minimum distance) > row distance), the median of their values is used.
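The search described above can be illustrated with a minimal base R sketch. This is not the FRESA.CAD implementation: the helper name impute_row_sketch is hypothetical, the inputs are assumed to be a numeric vector and a numeric matrix of complete rows (not data frames), the IQR is assumed to be taken over the complete rows, and the comparison uses <= so that the minimum-distance row is always kept.

```r
# Illustrative sketch only; impute_row_sketch is a hypothetical helper,
# not part of FRESA.CAD.
impute_row_sketch <- function(row, complete, distol = 1.05) {
  missing <- is.na(row)
  observed <- !missing
  ref <- complete[, observed, drop = FALSE]
  # Assumption: each observed column is scaled by its IQR in the complete rows
  scale <- apply(ref, 2, IQR)
  scale[scale == 0] <- 1  # assumption: guard against a zero IQR
  # IQR-normalized Manhattan distance from the incomplete row to each complete row
  diffs <- abs(sweep(ref, 2, row[observed], "-"))
  dist <- rowSums(sweep(diffs, 2, scale, "/"))
  # Keep every row whose distance is within distol times the minimum distance
  near <- dist <= distol * min(dist)
  # Replace each NA with the median of the selected rows
  row[missing] <- apply(complete[near, missing, drop = FALSE], 2, median)
  row
}

# Toy data (made up for illustration)
complete <- rbind(c(1, 10, 100), c(2, 20, 200), c(3, 30, 300), c(4, 40, 400))
colnames(complete) <- c("a", "b", "c")
row <- c(a = 2.1, b = 21, c = NA)

tight <- impute_row_sketch(row, complete)               # only the closest row is used
loose <- impute_row_sketch(row, complete, distol = 10)  # the two closest rows are used
```

With the default tolerance only the single closest row contributes, so the imputed value is copied from it. A larger distol admits more rows, and the imputed value becomes the median across them.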
Value
A data frame, where each NA has been replaced with the value of the nearest neighbors
Author(s)
Jose G. Tamez-Pena
Examples
## Not run:
# Get the stage C prostate cancer data from the rpart package
library(rpart)
data(stagec)
# Set the options to keep the na
options(na.action='na.pass')
# create a model matrix with all the NA values imputed
stagecImputed <- nearestNeighborImpute(model.matrix(~.,stagec)[,-1])
## End(Not run)