wNNSel {wNNSel} | R Documentation |
Imputation using the wNNSel method.
Description
'wNNSel' is used to impute missing values, particularly in high-dimensional data.
It uses a cross validation procedure to select the best values of the tuning parameters.
It also works when the number of samples is smaller than the number of covariates.
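The cross validation idea can be illustrated with a minimal sketch: hide a small proportion of the observed entries, impute, and score the imputation on the hidden entries only. The helper `mask_observed` is hypothetical and not part of the package, and column-mean imputation is used purely as a stand-in for the imputation step.

```r
# Hypothetical helper (not part of the package): hide a proportion of the
# observed entries so that an imputation can be scored against them.
mask_observed <- function(x, prop = 0.05) {
  observed <- which(!is.na(x))                      # linear indices of observed cells
  n_hide <- max(1, floor(prop * length(observed)))
  hidden <- observed[sample.int(length(observed), n_hide)]
  x_masked <- x
  x_masked[hidden] <- NA
  list(x = x_masked, hidden = hidden)
}

set.seed(1)
x <- matrix(rnorm(200), 20, 10)
x[sample(length(x), 20)] <- NA                      # pre-existing missing values

masked <- mask_observed(x, prop = 0.05)

# Stand-in imputation (column means), used only to show the scoring step.
col_means <- colMeans(masked$x, na.rm = TRUE)
x_filled <- masked$x
na_idx <- which(is.na(x_filled), arr.ind = TRUE)
x_filled[na_idx] <- col_means[na_idx[, "col"]]

# Mean squared error on the artificially hidden cells only.
cv_error <- mean((x_filled[masked$hidden] - x[masked$hidden])^2)
```

Repeating this for each candidate combination of tuning parameters and keeping the combination with the smallest error is the general pattern; how the package organises the search internally is not shown here.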
Usage
wNNSel(x, x.initial = NULL, x.true = NULL, k, useAll = TRUE,
x.dist = "euclidean", kernel = "gaussian", method = "2", impute.fn,
convex = TRUE, m.values = seq(2, 8, by = 2), c.values = seq(0.1, 0.5, by
= 0.1), lambda.values = seq(0, 0.6, by = 0.01)[-1], times.max = 5,
testNA.prop = 0.05, withinFolds = FALSE, folds, verbose = TRUE)
Arguments
x |
a numeric data |
x.initial |
optional. A complete data matrix, e.g. obtained using mean imputation of |
x.true |
a matrix of true or complete data. If provided, |
k |
optional; the number of nearest neighbors to use for imputation. |
useAll |
|
x.dist |
distance to compute. The default is |
kernel |
kernel function to be used in nearest neighbors imputation. Default kernel function is "gaussian". |
method |
convex function, performs selection of variables. If |
impute.fn |
the imputation function to run on the length k vector of values for a missing feature. Defaults to a weighted mean of the neighboring values, weighted by the specified |
convex |
logical. If |
m.values |
a |
c.values |
a |
lambda.values |
a |
times.max |
maximum number of repetitions for the cross validation procedure. |
testNA.prop |
proportion of values to be deleted artificially for
cross validation in the missing matrix |
withinFolds |
|
folds |
a |
verbose |
logical. If |
Details
For each sample, identify missing features. For each missing feature
find the nearest neighbors which have that feature. Impute the missing
value using the imputation function on the selected vector of values
found from the neighbors.
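The steps above can be sketched for a single missing cell as follows. This is an illustrative sketch only, not the package's internal code: the helper `impute_cell` is hypothetical, `lambda` is assumed to act as a kernel bandwidth, and plain Euclidean distance over jointly observed covariates is used without any covariate selection.

```r
# Hypothetical helper: impute x[i, j] from the k nearest rows that observe feature j.
impute_cell <- function(x, i, j, k = 3, lambda = 0.5) {
  donors <- setdiff(which(!is.na(x[, j])), i)       # rows that have feature j
  # Root mean squared difference over covariates observed in both rows.
  d <- vapply(donors, function(r) {
    shared <- !is.na(x[i, ]) & !is.na(x[r, ])
    if (!any(shared)) return(Inf)
    sqrt(mean((x[i, shared] - x[r, shared])^2))
  }, numeric(1))
  nearest <- order(d)[seq_len(min(k, length(donors)))]
  w <- dnorm(d[nearest] / lambda)                   # Gaussian kernel weights
  sum(w * x[donors[nearest], j]) / sum(w)           # weighted mean of neighbor values
}

set.seed(3)
x <- matrix(rnorm(60), 10, 6)
x[2, 4] <- NA
imputed <- impute_cell(x, i = 2, j = 4)
```

Because the result is a weighted mean of the neighbors' values, it always lies between the smallest and largest donor value for that feature.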
By default, the wNNSel method automatically searches for optimal values of the
tuning parameters for a given data matrix.
The default method uses x.dist="euclidean", including selected covariates.
The specific distances are computed using important covariates only.
If method="1", the linear function in the absolute value of r is used, defined by
\frac{|r|}{1-c} - \frac{c}{1-c},
for |r|>c, and 0 otherwise.
By default (method="2"), the power function |r|^m is used.
For a more detailed discussion, see the references.
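As an illustration, the two functions can be written directly from the formulas above. The names `weight_linear`, `weight_power` and `c0` are hypothetical and chosen for this sketch; they are not package functions or arguments.

```r
# Linear function (method = "1"): 0 for |r| <= c0, rising linearly to 1 at |r| = 1.
weight_linear <- function(r, c0) {
  pmax(0, abs(r) / (1 - c0) - c0 / (1 - c0))
}

# Power function (method = "2"): |r|^m.
weight_power <- function(r, m) {
  abs(r)^m
}

r <- c(-0.9, -0.3, 0.1, 0.5, 1)
w_lin <- weight_linear(r, c0 = 0.3)
w_pow <- weight_power(r, m = 2)
```

Both functions depend only on |r|, so strongly negatively correlated covariates receive the same weight as strongly positively correlated ones, while the linear function removes covariates with |r| <= c entirely.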
Value
a list containing the imputed data matrix and the cross validation results
x.impute |
imputed data matrix |
MSIE |
True error. Note it is only available when x.true is provided. |
lambda.opt |
optimal parameter selected by cross validation |
m.opt |
optimal parameter selected by cross validation |
MSIE.cv |
cross validation error |
References
Tutz, G. and Ramzan, S. (2015). Improved methods for the imputation of missing data by nearest neighbor methods. Computational Statistics and Data Analysis, Vol. 90, pp. 84-99.
Faisal, S. and Tutz, G. (2017). Missing value imputation for gene expression data by tailored nearest neighbors. Statistical Applications in Genetics and Molecular Biology, Vol. 16(2), pp. 95-106.
See Also
Examples
set.seed(3)
x.true = matrix(rnorm(100),10,10)
## create 10% missing values in x
x.miss = artifNA(x.true, 0.10)
## imputed matrix
result <- wNNSel(x.miss)
result$x.impute
## cross validation result can be accessed using
result$cross.val