cv.wNNSel {wNNSel}    R Documentation

Cross Validation for wNNSel Imputation

Description

Searches for optimal values of the tuning parameters of the wNNSel imputation method by cross validation.

Usage

cv.wNNSel(x, kernel = "gaussian", x.dist = "euclidean", method = "2",
  m.values = seq(2, 8, by = 2), c.values = seq(0.1, 0.5, by = 0.1),
  lambda.values = seq(0, 0.6, by = 0.01)[-1], times.max = 5,
  testNA.prop = 0.05)

Arguments

x

a matrix containing missing values

kernel

kernel function to be used in nearest neighbors imputation. Default kernel function is "gaussian".

x.dist

distance to compute. The default is x.dist="euclidean", which computes the Euclidean distance. Set x.dist to NULL to use the Manhattan distance.

method

convex function used to perform the selection of variables. If method="1", a linear function is used; if method="2" (the default), a power function is used.

m.values

a vector of integer values, required when method="2".

c.values

a vector of values greater than or equal to 0 and less than 1, required when method="1".

lambda.values

a vector of candidate values for the tuning parameter \lambda

times.max

maximum number of repetitions of the cross validation procedure.

testNA.prop

proportion of values in the matrix x to be deleted artificially for cross validation. The default is 0.05 (5 percent).
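
As a rough guide to the cost of the search, the default grids from Usage can be expanded in base R. This is only a sketch: it assumes that every (\lambda, m) pair is evaluated once per repetition when method="2", and the object name grid is illustrative rather than part of the package. With the defaults there are 60 values of \lambda and 4 values of m, giving 240 pairs.

```r
# Default candidate values, copied from the Usage section
m.values      <- seq(2, 8, by = 2)
c.values      <- seq(0.1, 0.5, by = 0.1)
lambda.values <- seq(0, 0.6, by = 0.01)[-1]   # [-1] drops lambda = 0

# All (lambda, m) pairs that a full grid search would visit
# (assumption: one wNNSel fit per pair and per repetition)
grid <- expand.grid(lambda = lambda.values, m = m.values)
nrow(grid)
```

Passing shorter vectors to lambda.values and m.values reduces the number of pairs accordingly.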

Details

Some observed values are deleted artificially and wNNSel is run multiple times, varying \lambda and m. For each pair of \lambda and m, the MSIE is computed on the subset of the data matrix x for which the values were deleted artificially (see References for more detail).
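
The procedure can be sketched in base R as follows. This is a simplified illustration, not the package's implementation: toy.impute is a hypothetical stand-in imputer (a Gaussian-kernel weighted column mean), toy.cv is a hypothetical helper, only \lambda is tuned, and testNA.prop is assumed to be a proportion of the observed entries.

```r
# Hypothetical stand-in imputer (NOT the wNNSel algorithm): each missing cell
# is replaced by a kernel-weighted mean of the observed values in its column.
toy.impute <- function(x, lambda) {
  out <- x
  for (i in seq_len(nrow(x))) {
    miss <- which(is.na(x[i, ]))
    if (length(miss) == 0) next
    # root mean squared difference to row i over commonly observed columns
    d <- apply(x, 1, function(r) sqrt(mean((r - x[i, ])^2, na.rm = TRUE)))
    w <- exp(-0.5 * (d / lambda)^2)   # Gaussian kernel weights
    w[i] <- 0                         # a row is not its own neighbor
    for (j in miss) {
      ok <- !is.na(x[, j]) & is.finite(w)
      if (sum(w[ok]) > 0) {
        out[i, j] <- sum(w[ok] * x[ok, j]) / sum(w[ok])
      } else {
        out[i, j] <- mean(x[, j], na.rm = TRUE)   # fallback: column mean
      }
    }
  }
  out
}

# Cross validation over a grid of lambda values
toy.cv <- function(x, lambda.values, times.max = 5, testNA.prop = 0.05) {
  obs <- which(!is.na(x))
  n.test <- max(1, round(testNA.prop * length(obs)))
  err <- matrix(NA_real_, times.max, length(lambda.values))
  for (rep.i in seq_len(times.max)) {
    test <- sample(obs, n.test)       # observed cells deleted artificially
    x.cv <- x
    x.cv[test] <- NA
    for (k in seq_along(lambda.values)) {
      x.hat <- toy.impute(x.cv, lambda.values[k])
      # squared imputation error on the artificially deleted cells only
      err[rep.i, k] <- mean((x.hat[test] - x[test])^2)
    }
  }
  cv.err <- colMeans(err)             # average over repetitions
  best <- which.min(cv.err)
  list(lambda.opt = lambda.values[best], MSIE.cv = cv.err[best])
}

set.seed(1)
x <- matrix(rnorm(200), 20, 10)
x[sample(length(x), 20)] <- NA        # 10% missing values
res <- toy.cv(x, lambda.values = c(0.5, 1, 2))
res$lambda.opt
```

Because the error is measured only on cells whose true values are known, the selected \lambda does not depend on the entries that were missing from the start.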

Value

a list containing

lambda.opt

optimal value of \lambda selected by cross validation

m.opt

optimal value of m selected by cross validation

MSIE.cv

cross validation error

Author(s)

Shahla Faisal <shahla_ramzan@yahoo.com>

References

Tutz, G. and Ramzan, S. (2015). Improved methods for the imputation of missing data by nearest neighbor methods. Computational Statistics and Data Analysis, Vol. 90, pp. 84-99.

Faisal, S. and Tutz, G. (2017). Missing value imputation for gene expression data by tailored nearest neighbors. Statistical Applications in Genetics and Molecular Biology, Vol. 16(2), pp. 95-106.

See Also

artifNA.cv, wNNSel

Examples

 set.seed(3)
 x.true = matrix(rnorm(100),10,10)
 ## create 10% missing values in x
 x.miss = artifNA(x.true, 0.10)
 ## use cross validation to find optimal values
 result = cv.wNNSel(x.miss)
 ## optimal values are
 result$lambda.opt
 result$m.opt
 ## Now use these values to get final imputation
 x.impute = wNNSel.impute(x.miss, lambda=result$lambda.opt, m=result$m.opt)
 ## and final MSIE
 computeMSIE(x.miss, x.impute, x.true)

[Package wNNSel version 0.1 Index]