chooseB {clusterMI}    R Documentation

Diagnostic plot for the number of iterations used in the varselbest function

Description

chooseB plots the proportion of times an explanatory variable is selected as a function of the number of iterations (B).

Usage

chooseB(
  res.varselbest,
  plotvar = NULL,
  linewidth = 1,
  linetype = "dotdash",
  xlab = "B",
  ylab = "Proportion",
  nrow = 2,
  ncol = 2,
  graph = TRUE
)

Arguments

res.varselbest

an output from the varselbest function

plotvar

index of variables for which a curve is plotted

linewidth

a numerical value setting the widths of lines

linetype

the line type used to draw the curves

xlab

a title for the x axis

ylab

a title for the y axis

nrow

number of rows in the grid of plots (argument of gtable). Default value is 2.

ncol

number of columns in the grid of plots (argument of gtable). Default value is 2.

graph

a boolean. If FALSE, no graphics are plotted. Default value is TRUE.

Details

varselbest performs variable selection on random subsets of variables and then combines the results to recover which explanatory variables are related to the response, following Bar-Hen and Audigier (2022) <doi:10.1080/00949655.2022.2070621>. More precisely, the outline of the algorithm is as follows: consider a random subset of sizeblock variables among the p variables. Any variable selection scheme can then be applied to this subset. By drawing B such subsets of size sizeblock among the p variables, we can count how many times a variable is considered significantly related to the response and how many times it is not. The number of iterations B should be large enough that the proportion of times a variable is selected becomes stable. chooseB plots these proportions against the number of iterations.
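
The resampling scheme described above can be sketched in a few lines of base R. This is only an illustration under simplifying assumptions, not the implementation used by varselbest: the data are simulated and complete, the selection rule (keeping the coefficients of a linear model whose p-value is below 0.05) is an arbitrary stand-in for "any variable selection scheme", and the proportion is taken here as the number of times a variable is selected divided by the number of times it is drawn. All object names (X, y, drawn, selected, running_prop) are hypothetical.

```r
set.seed(123)
n <- 100        # number of observations (simulated)
p <- 6          # number of candidate explanatory variables
sizeblock <- 3  # number of variables drawn at each iteration
B <- 200        # number of iterations

# Simulated data: only x1 and x2 are related to the response
X <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("x", 1:p)))
y <- 2 * X[, 1] + 1.5 * X[, 2] + rnorm(n)

# drawn[b, j] = 1 if variable j is in the subset at iteration b
# selected[b, j] = 1 if variable j is drawn and judged significant
drawn <- matrix(0, B, p, dimnames = list(NULL, colnames(X)))
selected <- drawn

for (b in seq_len(B)) {
  idx <- sample(p, sizeblock)                       # random subset of variables
  dat <- data.frame(y = y, X[, idx, drop = FALSE])
  fit <- lm(y ~ ., data = dat)                      # assumed selection scheme
  pval <- summary(fit)$coefficients[-1, "Pr(>|t|)"] # drop the intercept row
  drawn[b, idx] <- 1
  selected[b, idx] <- as.numeric(pval < 0.05)
}

# Proportion of selections after 1, 2, ..., B iterations (one column per variable)
running_prop <- apply(selected, 2, cumsum) / apply(drawn, 2, cumsum)

# One curve per variable; the curves flatten out once B is large enough
matplot(running_prop, type = "l", lty = "dotdash",
        xlab = "B", ylab = "Proportion")
```

Each column of running_prop gives one curve of proportion against B. Entries are NaN until the corresponding variable has been drawn at least once.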

Value

a list of matrices where each row corresponds to the vector of proportions (for all explanatory variables) obtained for a given value of B
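
As a complement to the visual check, stability can also be assessed numerically on a matrix of this shape (one row per value of B, one column per explanatory variable). The sketch below is a hypothetical helper, not part of clusterMI: the function name recent_spread, the window of 20 iterations and the toy matrix prop are all assumptions made for illustration.

```r
# Hypothetical helper: for each variable, the spread (max - min) of the
# proportions over the last `window` values of B
recent_spread <- function(prop, window = 20) {
  stopifnot(is.matrix(prop), nrow(prop) >= window)
  apply(tail(prop, window), 2, function(x) diff(range(x, na.rm = TRUE)))
}

# Toy matrix of proportions: rows are values of B, columns are variables
B <- 100
prop <- cbind(v1 = 0.8 + 0.2 / seq_len(B),
              v2 = 0.1 - 0.1 / seq_len(B))

spread <- recent_spread(prop, window = 20)
spread
```

A small spread for every variable suggests that the proportions have stabilised and that B is large enough. What counts as small is a judgment call that depends on the application.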

References

Bar-Hen, A. and Audigier, V., An ensemble learning method for variable selection: application to high dimensional data and missing values, Journal of Statistical Computation and Simulation, <doi:10.1080/00949655.2022.2070621>, 2022.

See Also

varselbest

Examples

library(clusterMI)
data(wine)

require(parallel)
ref <- wine$cult # reference partition
nb.clust <- 3
wine.na <- wine
wine.na$cult <- NULL
wine.na <- prodna(wine.na) # add missing values

nnodes <- 2 # Number of CPU cores for parallel computing
B <- 80 # Number of iterations for variable selection

# variable selection for the variable "alco"

res.varsel <- varselbest(data.na = wine.na,
                         listvar = "alco",
                         B = B,
                         nnodes = nnodes,
                         nb.clust = nb.clust,
                         graph = FALSE)
# convergence check: selection proportions as a function of B
res.chooseB <- chooseB(res.varsel)


[Package clusterMI version 1.2.1 Index]