
chooseB {clusterMI}

R Documentation

Diagnostic plot for the number of iterations used in the varselbest function

Description

chooseB plots the proportion of times each explanatory variable is selected as a function of the number of iterations (B).

Usage

chooseB(
  res.varselbest,
  plotvar = NULL,
  linewidth = 1,
  linetype = "dotdash",
  xlab = "B",
  ylab = "Proportion",
  nrow = 2,
  ncol = 2,
  graph = TRUE
)

Arguments

`res.varselbest`	an output from the varselbest function
`plotvar`	index of the variables for which a curve is plotted
`linewidth`	a numerical value setting the width of the lines
`linetype`	the type of line to draw
`xlab`	a title for the x axis
`ylab`	a title for the y axis
`nrow`	number of rows of the plot grid (argument of gtable). Default value is 2.
`ncol`	number of columns of the plot grid (argument of gtable). Default value is 2.
`graph`	a boolean. If FALSE, no graphics are plotted. Default value is TRUE.

Details

varselbest performs variable selection on random subsets of variables and then combines the results to recover which explanatory variables are related to the response, following Bar-Hen and Audigier (2022) <doi:10.1080/00949655.2022.2070621>. More precisely, the outline of the algorithm is as follows: consider a random subset of sizeblock variables among the p variables. Any variable selection scheme can then be applied to this subset. By resampling B times a subset of size sizeblock among the p variables, we can count how many times a variable is considered significantly related to the response and how many times it is not. The number of iterations B should be large enough that the proportion of times a variable is selected becomes stable. chooseB plots these proportions against the number of iterations.
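
The resampling idea described above can be illustrated with a minimal base R sketch. This is not the code used by varselbest or chooseB: the simulated data, the selection rule (a coefficient p-value below 0.05 in a linear model) and all object names (X, y, drawn, selected, prop) are assumptions made for illustration. The proportion is computed here as the number of times a variable is selected divided by the number of times it was drawn, which is also an assumption about how the counts are combined.

```r
# Illustrative sketch only; data, selection rule and names are hypothetical.
set.seed(1)
n <- 200          # number of observations
p <- 10           # number of explanatory variables
sizeblock <- 4    # size of each random subset of variables
B <- 100          # number of iterations

# Simulated data: only x1 and x2 are related to the response
X <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("x", 1:p)))
y <- 2 * X[, 1] - 1.5 * X[, 2] + rnorm(n)

# drawn[b, j] = 1 if variable j is in the subset at iteration b
# selected[b, j] = 1 if it is also judged significant at iteration b
drawn <- matrix(0, B, p, dimnames = list(NULL, colnames(X)))
selected <- drawn

for (b in seq_len(B)) {
  idx <- sample(p, sizeblock)   # random subset of variables
  fit <- lm(y ~ ., data = data.frame(y = y, X[, idx, drop = FALSE]))
  pval <- summary(fit)$coefficients[-1, 4]   # drop the intercept row
  drawn[b, idx] <- 1
  selected[b, idx] <- as.numeric(pval < 0.05)
}

# Running proportion of selection after 1, 2, ..., B iterations
# (pmax avoids dividing by zero before a variable has been drawn)
prop <- apply(selected, 2, cumsum) / pmax(apply(drawn, 2, cumsum), 1)
```

Each row of prop holds the vector of proportions after a given number of iterations, so a call such as matplot(prop, type = "l", xlab = "B", ylab = "Proportion") gives curves of the kind chooseB is meant to display. B can be considered large enough once these curves have flattened.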

Value

a list of matrices where each row corresponds to the vector of proportions (for all explanatory variables) obtained for a given value of B

References

Bar-Hen, A. and Audigier, V., An ensemble learning method for variable selection: application to high dimensional data and missing values, Journal of Statistical Computation and Simulation, <doi:10.1080/00949655.2022.2070621>, 2022.

Examples

data(wine)

require(parallel)
ref <- wine$cult
nb.clust <- 3
wine.na <- wine
wine.na$cult <- NULL
wine.na <- prodna(wine.na)

nnodes <- 2 # Number of CPU cores for parallel computing
B <- 80 # Number of iterations for variable selection

# variable selection

res.varsel <- varselbest(data.na = wine.na,
                        listvar = "alco",
                        B = B,
                        nnodes = nnodes,
                        nb.clust = nb.clust,
                        graph = FALSE)
# convergence
res.chooseB <- chooseB(res.varsel)

[Package clusterMI version 1.2.1 Index]