R: nearest neighbor NA imputation

nearestNeighborImpute {FRESA.CAD}

R Documentation

nearest neighbor NA imputation

Description

The function replaces every NA in the data frame with the median value of its nearest neighbors.

Usage

	nearestNeighborImpute(tobeimputed,
	                      referenceSet=NULL,
	                      catgoricCol=NULL,
	                      distol=1.05,
	                      useorder=TRUE
	                     )

Arguments

`tobeimputed`	A data frame with missing values (NA values)
`referenceSet`	An optional data frame with a set of complete observations. This data frame will be added to the search set
`catgoricCol`	An optional list of column names that should be considered categorical
`distol`	The tolerance used to decide whether a row's distance is similar to the minimum distance
`useorder`	If TRUE, impute using the last observation, stratified by the categorical data

Details

This function finds every NA in the data set and, for each row with missing values, searches the set of complete observations for the rows with the smallest IQR-normalized Manhattan distance to that row. If several rows have similar minimum distances (distol*(minimum distance) > row distance), the median of their values is used.
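
The matching rule described above can be illustrated with a minimal sketch. It is not the FRESA.CAD implementation: `imputeRowSketch` is a hypothetical helper, it assumes a numeric matrix of complete observations with non-zero IQR in every column, and it ignores the `catgoricCol` and `useorder` handling. It also uses `<=` rather than a strict comparison so that an exact match (distance 0) is still selected.

```r
# Hypothetical helper, not part of FRESA.CAD: imputes the NAs of a single row.
# 'row' is a numeric vector, 'complete' a numeric matrix without NAs.
imputeRowSketch <- function(row, complete, distol = 1.05) {
  observed <- !is.na(row)
  # Scale of each column: its interquartile range (assumed non-zero)
  iqr <- apply(complete, 2, IQR)
  # Absolute differences on the observed columns only
  diffs <- abs(sweep(complete[, observed, drop = FALSE], 2, row[observed]))
  # IQR-normalized Manhattan distance from 'row' to every complete row
  dist <- rowSums(sweep(diffs, 2, iqr[observed], "/"))
  # Keep every row whose distance is within distol times the minimum
  near <- dist <= distol * min(dist)
  # Fill each missing entry with the column median of the selected rows
  row[!observed] <- apply(complete[near, !observed, drop = FALSE], 2, median)
  row
}

# Toy data: four complete observations and one row with a missing third value
complete <- rbind(c(1, 10, 100),
                  c(2, 20, 200),
                  c(3, 30, 300),
                  c(10, 100, 1000))
imputed <- imputeRowSketch(c(2.5, 25, NA), complete)
```

In this toy case the second and third rows are equally close to the incomplete row, so both are selected and the missing entry receives the median of their third-column values.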

Value

A data frame in which each NA has been replaced with the value of its nearest neighbors

Author(s)

Jose G. Tamez-Pena

Examples

	## Not run: 
	# Get the stage C prostate cancer data from the rpart package
	library(rpart)
	data(stagec)
	# Set the options so that rows with NA values are kept
	options(na.action='na.pass')
	# Create a model matrix (intercept column dropped) and impute all the NA values
	stagecImputed <- nearestNeighborImpute(model.matrix(~.,stagec)[,-1])
	
	## End(Not run)

[Package FRESA.CAD version 3.4.8 Index]