varSelection {yaImpute}    R Documentation

Select variables for imputation models

Description

Computes grmsd (generalized root mean square distance) as variables are added to (method="addVars") or removed from (method="delVars") a k-NN imputation model. When adding variables, the function keeps those that strengthen the imputation the most; when deleting variables, it removes those that weaken the imputation the least. The measure of model strength is grmsd between imputed and observed Y-variables among the reference observations.

Usage

varSelection(x,y,method="addVars",yaiMethod="msn",imputeMethod="closest",
  wts=NULL,nboot=20,trace=FALSE,
  useParallel=if (.Platform$OS.type == "windows") FALSE else TRUE,...)

Arguments

x

a set of X-Variables as used in yai.

y

a set of Y-Variables as used in yai.

method

if "addVars", X-variables are added one at a time; if "delVars", they are deleted one at a time (see Details).

yaiMethod

passed as method to yai.

imputeMethod

passed as method to impute.yai.

wts

passed as argument wts to grmsd, which is used to score the alternative variable sets.

nboot

the number of bootstrap samples used at each variable selection step (see Details). When nboot is zero, no bootstrapping is done.

trace

if TRUE, information is output at each step.

useParallel

function mclapply from package parallel is used to run the bootstraps if it is available. If it is not available, lapply is used (which is the only option on Windows).

...

passed to yai.
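The fallback described for useParallel can be sketched in base R. This is an illustration only, not the package's code: the helper name pickApply is hypothetical, and the exact test the package uses to decide whether mclapply is available is an assumption.

```r
# Hypothetical helper (not part of yaImpute): choose the function used to
# run the bootstrap replications.
pickApply <- function(useParallel = .Platform$OS.type != "windows") {
  if (useParallel && .Platform$OS.type != "windows" &&
      requireNamespace("parallel", quietly = TRUE)) {
    parallel::mclapply   # forked workers; forking is not supported on Windows
  } else {
    lapply               # serial fallback
  }
}

# Both functions share the (X, FUN) calling convention, so the caller does
# not need to know which one was chosen.
myapply <- pickApply(useParallel = FALSE)
res <- myapply(1:3, function(i) i^2)
```

Because the two functions are interchangeable here, setting useParallel=FALSE is a simple way to get serial runs on any platform.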

Details

This function tracks the effect on generalized root mean square distance (see grmsd) when variables are added or deleted one at a time. When adding variables, the function starts with none and keeps the single variable that provides the smallest grmsd. When deleting variables, the function starts with all X-variables and deletes them one at a time such that those that remain provide the smallest grmsd. The function uses the following steps:

  1. Function yai is run for all the Y-variables and candidate X-variable(s). The result is passed to impute.yai to get imputed values of Y-variables. That result is passed to grmsd to compute a mean Mahalanobis distance for the case where the candidate variable is included (or deleted depending on method). However, these steps are done once for each bootstrap replication and the resulting values are averaged to provide an average mean Mahalanobis distance over the bootstraps.

  2. Step one is done for each candidate X-variable, forming a vector of grmsd values with one value for each case in which a candidate is added or deleted.

  3. When variables are being added (method="addVars"), the variable that is related to the smallest grmsd is kept. When variables are being deleted (method="delVars"), the variable that is related to the largest grmsd is deleted.

  4. Once a variable has been added or deleted, the function proceeds to select another variable for addition or deletion by considering all remaining variables.
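The steps above can be sketched in base R for the method="addVars" case. This is a simplified illustration, not the yaImpute implementation. The helper names scoreVars and addVarsSketch are hypothetical. The imputation is a plain nearest-neighbor lookup on Euclidean distance rather than yai with method "msn". The score is a root mean square difference on scaled Y-variables that stands in for grmsd. Reusing the same bootstrap samples for every candidate is also an assumption.

```r
# Stand-in for yai + impute.yai + grmsd (assumption: 1-NN on scaled X and an
# RMS difference on scaled Y; the package scores with grmsd instead).
scoreVars <- function(x, y, vars, rows) {
  xs <- scale(as.matrix(x[rows, vars, drop = FALSE]))
  ys <- scale(as.matrix(y[rows, , drop = FALSE]))
  d  <- as.matrix(dist(xs))
  diag(d) <- Inf                      # an observation may not impute itself
  nn <- apply(d, 1, which.min)        # index of the nearest other observation
  sqrt(mean((ys[nn, , drop = FALSE] - ys)^2))
}

addVarsSketch <- function(x, y, nboot = 5) {
  n <- nrow(x)
  # Bootstrap row samples; with nboot = 0 the data are used once, as is.
  samples <- if (nboot > 0) {
    lapply(seq_len(nboot), function(i) sample(n, n, replace = TRUE))
  } else {
    list(seq_len(n))
  }
  kept      <- character(0)
  remaining <- colnames(x)
  path      <- numeric(0)
  while (length(remaining) > 0) {
    # Steps 1 and 2: average the score over the bootstraps for each candidate
    sc <- sapply(remaining, function(v)
      mean(sapply(samples, function(rows) scoreVars(x, y, c(kept, v), rows))))
    # Step 3: keep the candidate with the smallest average score
    best <- names(which.min(sc))
    kept <- c(kept, best)
    path[best] <- sc[[best]]
    # Step 4: continue with the variables not yet selected
    remaining <- setdiff(remaining, best)
  }
  path   # scores in the order the variables were added
}

set.seed(1)
addVarsSketch(iris[, 1:2], iris[, 3:4], nboot = 3)
```

The returned vector is named by variable and ordered by when each variable entered, which mirrors how the rownames of the grmsd matrix identify the variables.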

Value

A list of class varSel with these tags:

call

the call

grmsd

a 2-column matrix of the mean and standard deviation of the mean Mahalanobis distances associated with adding or removing the variables, which are stored as the rownames. When nboot<2, the standard deviations are NA.

allgrmsd

a list of the grmsd values that correspond to each bootstrap replication. The data in grmsd are based on these vectors of information.

method

the value of argument method.
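The relationship between allgrmsd and grmsd can be illustrated in base R. This sketch assumes that allgrmsd is a named list holding one numeric vector of bootstrap values per variable. The helper name summarizeBoot and the column names are hypothetical, and the numbers are made up for illustration.

```r
# Hypothetical helper: collapse per-bootstrap values into a 2-column matrix.
summarizeBoot <- function(allgrmsd) {
  # One row per variable: mean and standard deviation over the bootstraps
  out <- t(sapply(allgrmsd, function(v) c(mean(v), sd(v))))
  colnames(out) <- c("mean", "sd")   # assumed column names
  out
}

# Illustrative values only, not output from varSelection
boot3 <- list(Sepal.Length = c(0.52, 0.48, 0.50),
              Sepal.Width  = c(0.41, 0.39, 0.43))
summarizeBoot(boot3)

# With a single value per variable, sd() returns NA, which is consistent
# with the standard deviations being NA when nboot<2.
boot1 <- list(Sepal.Length = 0.50, Sepal.Width = 0.41)
summarizeBoot(boot1)
```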

Author(s)

Nicholas L. Crookston ncrookston.fs@gmail.com

See Also

yai, impute.yai, bestVars and grmsd

Examples

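library(yaImpute)
data(iris)

set.seed(12345)  # make the bootstrap samples reproducible

x <- iris[,1:2]  # Sepal.Length Sepal.Width
y <- iris[,3:4]  # Petal.Length Petal.Width

# add X-variables one at a time, using 5 bootstrap samples at each step
vsel <- varSelection(x=x,y=y,nboot=5,useParallel=FALSE)
vsel  # print the result

bestVars(vsel)  # report the selected variables

plot(vsel)  # plot the grmsd values from each step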


[Package yaImpute version 1.0-34 Index]