varSelection {yaImpute}    R Documentation
Select variables for imputation models
Description
Computes grmsd (generalized root mean square distance) as variables are added to (method="addVars") or removed from (method="delVars") a k-NN imputation model. When adding variables, the function keeps those that strengthen the imputation the most; when deleting variables, it deletes those whose removal weakens the imputation the least. The measure of model strength is the grmsd between imputed and observed Y-variables among the reference observations.
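To make the grmsd criterion concrete, here is a base-R sketch. It assumes grmsd is the root of the mean squared Mahalanobis distance between each observed and imputed Y-vector, with the covariance estimated from the observed values. `grmsd_sketch()` is a hypothetical helper, not the yaImpute `grmsd()` function, and it ignores any weighting options the real function offers.

```r
# Hypothetical helper, NOT yaImpute::grmsd(). Assumption: grmsd is the root of
# the mean squared Mahalanobis distance between observed and imputed Y-vectors,
# using the covariance of the observed values.
grmsd_sketch <- function(obs, imp) {
  obs <- as.matrix(obs)
  imp <- as.matrix(imp)
  stopifnot(identical(dim(obs), dim(imp)))
  # stats::mahalanobis() returns the squared distance of each row of
  # (obs - imp) from the zero vector
  d2 <- mahalanobis(obs - imp, center = rep(0, ncol(obs)), cov = cov(obs))
  sqrt(mean(d2))
}

y <- as.matrix(iris[, 3:4])               # Petal.Length Petal.Width
grmsd_sketch(y, y)                        # perfect imputation gives 0
grmsd_sketch(y, y[sample(nrow(y)), ])     # shuffled "imputations" give a larger value
```

Smaller values mean the imputed Y-vectors lie closer to the observed ones, which is why the selection favors the smallest grmsd.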
Usage
varSelection(x,y,method="addVars",yaiMethod="msn",imputeMethod="closest",
wts=NULL,nboot=20,trace=FALSE,
useParallel=if (.Platform$OS.type == "windows") FALSE else TRUE,...)
Arguments
x | a set of X-Variables as used in yai |
y | a set of Y-Variables as used in yai |
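method | if "addVars", the model starts with no X-Variables and adds them one at a time; if "delVars", it starts with all X-Variables and deletes them one at a time |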
yaiMethod | passed as argument method to yai |
imputeMethod | passed as argument method to impute.yai |
wts | passed as argument wts to grmsd |
nboot | the number of bootstrap samples used at each variable selection step (see Details). When nboot is zero, no bootstrapping is done. |
trace | if TRUE, information about each selection step is printed |
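useParallel | if TRUE, function mclapply from package parallel is used when it is available; the default is FALSE on Windows |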
... | passed to yai |
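The nboot semantics can be sketched as follows, assuming ordinary bootstrap resampling of the reference observations with replacement. `boot_indices()` is a hypothetical helper; how varSelection draws its samples internally is not shown here.

```r
# Hypothetical helper illustrating nboot: nboot = 0 means one pass over the
# original rows (no resampling); otherwise draw nboot bootstrap samples of row
# indices, each of size n, with replacement.
boot_indices <- function(n, nboot) {
  if (nboot == 0) return(list(seq_len(n)))
  lapply(seq_len(nboot), function(b) sample.int(n, n, replace = TRUE))
}

set.seed(12345)
idx <- boot_indices(nrow(iris), nboot = 5)
length(idx)    # 5 resamples
lengths(idx)   # each holds nrow(iris) = 150 row indices
```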
Details
This function tracks the effect on generalized root mean square distance (see grmsd) when variables are added or deleted one at a time. When adding variables, the function starts with none and keeps the single variable that provides the smallest grmsd. When deleting variables, the function starts with all X-Variables and deletes them one at a time such that those that remain provide the smallest grmsd. The function uses the following steps:
1. Function yai is run for all the Y-variables and the candidate X-variable(s). The result is passed to impute.yai to get imputed values of the Y-variables, and that result is passed to grmsd to compute a mean Mahalanobis distance for the case where the candidate variable is included (or deleted, depending on method). These steps are repeated once for each bootstrap replication, and the resulting values are averaged to give a mean Mahalanobis distance over the bootstraps.
2. Step 1 is done for each candidate X-variable, forming a vector of grmsd values, one for each candidate being added or deleted.
3. When variables are being added (method="addVars"), the variable associated with the smallest grmsd is kept. When variables are being deleted (method="delVars"), the variable whose deletion gives the smallest grmsd (that is, the one whose removal weakens the imputation the least) is deleted.
4. Once a variable has been added or deleted, the function selects another variable for addition or deletion by considering all remaining variables.
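The selection loop in these steps can be sketched in base R. This is not the yaImpute implementation: yai with the msn method followed by impute.yai(method="closest") is replaced by a hypothetical `impute_nn()` that imputes each reference observation from its single nearest other observation in standardized X-space (Euclidean distance), and `grmsd_sketch()` assumes grmsd is the root mean squared Mahalanobis distance. Reusing the same bootstrap samples for every candidate and step is a design assumption of this sketch, chosen so candidates are compared on equal footing.

```r
# Assumed criterion: root mean squared Mahalanobis distance (hypothetical helper).
grmsd_sketch <- function(obs, imp) {
  d2 <- mahalanobis(obs - imp, center = rep(0, ncol(obs)), cov = cov(obs))
  sqrt(mean(d2))
}

# Stand-in for yai + impute.yai: k = 1 nearest neighbour on scaled X.
# ids are the original row numbers, so a reference observation is never
# imputed from itself or from a bootstrap copy of itself.
impute_nn <- function(X, Y, ids) {
  D <- as.matrix(dist(scale(as.matrix(X))))
  D[outer(ids, ids, "==")] <- Inf
  Y[apply(D, 1, which.min), , drop = FALSE]
}

# Average the criterion over the bootstrap samples for one set of X-variables.
score_vars <- function(x, y, vars, boots) {
  mean(vapply(boots, function(i) {
    imp <- impute_nn(x[i, vars, drop = FALSE], y[i, , drop = FALSE], ids = i)
    grmsd_sketch(y[i, , drop = FALSE], imp)
  }, numeric(1)))
}

# Greedy forward ("addVars") or backward ("delVars") selection.
select_vars <- function(x, y, method = c("addVars", "delVars"), nboot = 5) {
  method <- match.arg(method)
  y <- as.matrix(y)
  n <- nrow(y)
  boots <- if (nboot == 0) list(seq_len(n)) else
    lapply(seq_len(nboot), function(b) sample.int(n, n, replace = TRUE))
  current <- if (method == "addVars") character(0) else colnames(x)
  path <- numeric(0)   # averaged criterion after each step, named by variable
  repeat {
    candidates <- if (method == "addVars") setdiff(colnames(x), current) else current
    if (length(candidates) == 0 || (method == "delVars" && length(candidates) < 2)) break
    scores <- vapply(candidates, function(v) {
      vars <- if (method == "addVars") c(current, v) else setdiff(current, v)
      score_vars(x, y, vars, boots)
    }, numeric(1))
    best <- names(which.min(scores))   # smallest averaged criterion wins in both modes
    current <- if (method == "addVars") c(current, best) else setdiff(current, best)
    path[best] <- scores[[best]]
  }
  path
}

set.seed(12345)
select_vars(iris[, 1:2], iris[, 3:4], "addVars", nboot = 5)
select_vars(iris[, 1:2], iris[, 3:4], "delVars", nboot = 5)
```

The returned named vector lists the variables in the order they were added (or deleted) together with the averaged criterion after each step, loosely mirroring the mean column of the grmsd matrix described under Value.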
Value
A list of class varSel with these tags:
call | the call |
grmsd | a 2-column matrix of the means and standard deviations of the mean Mahalanobis distances associated with adding or removing each variable; the variables are stored as the row names. When nboot < 2, the standard deviations are NA. |
allgrmsd | a list of the grmsd values that correspond to each bootstrap replication; the data in grmsd are based on these vectors. |
method | the value of argument method |
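The relationship between allgrmsd and the 2-column grmsd matrix can be sketched as below. The layout is an assumption: one list element per added or removed variable, each holding that step's per-bootstrap values. `summarize_grmsd()` is a hypothetical helper and the `toy` numbers are invented inputs, not output from varSelection.

```r
# Hypothetical helper: collapse per-bootstrap values into a mean/sd matrix whose
# row names are the variables. stats::sd() of a single value is NA, which
# matches the NA standard deviations reported when nboot < 2.
summarize_grmsd <- function(allgrmsd) {
  cbind(mean = vapply(allgrmsd, mean, numeric(1)),
        sd   = vapply(allgrmsd, sd, numeric(1)))
}

toy <- list(Sepal.Length = c(1.10, 1.30, 1.20),   # invented values
            Sepal.Width  = c(0.90, 0.70, 0.80))
summarize_grmsd(toy)
summarize_grmsd(list(Sepal.Length = 1.2, Sepal.Width = 0.8))  # one value each: sd is NA
```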
Author(s)
Nicholas L. Crookston ncrookston.fs@gmail.com
See Also
yai, impute.yai, bestVars and grmsd
Examples
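library(yaImpute)
data(iris)
set.seed(12345)
x <- iris[, 1:2]  # Sepal.Length Sepal.Width
y <- iris[, 3:4]  # Petal.Length Petal.Width
# variable selection with the default method = "addVars" and 5 bootstrap replications
vsel <- varSelection(x = x, y = y, nboot = 5, useParallel = FALSE)
vsel
bestVars(vsel)
plot(vsel)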