chooseB {clusterMI} | R Documentation |
Diagnostic plot for the number of iterations used in the varselbest function
Description
chooseB
plots the proportion of times an explanatory variable is selected according to the number of iterations (B).
Usage
chooseB(
res.varselbest,
plotvar = NULL,
linewidth = 1,
linetype = "dotdash",
xlab = "B",
ylab = "Proportion",
nrow = 2,
ncol = 2,
graph = TRUE
)
Arguments
res.varselbest |
an output from the varselbest function |
plotvar |
indices of the variables for which a curve is plotted |
linewidth |
a numerical value setting the widths of lines |
linetype |
the line type used to draw the curves |
xlab |
a title for the x axis |
ylab |
a title for the y axis |
nrow |
argument of gtable. Default value is 2. |
ncol |
argument of gtable. Default value is 2. |
graph |
a boolean. If FALSE, no graphics are plotted. Default value is TRUE |
Details
varselbest
performs variable selection on random subsets of variables and then combines the results to recover which explanatory variables are related to the response, following Bar-Hen and Audigier (2022) <doi:10.1080/00949655.2022.2070621>.
More precisely, the outline of the algorithm is as follows: consider a random subset of sizeblock
variables among the p variables.
Then, any variable selection scheme can be applied to this subset.
By drawing B
times a sample of size sizeblock
among the p variables, we can count how many times a variable is considered significantly related to the response and how many times it is not.
The number of iterations B
should be large so that the proportion of times a variable is selected becomes stable. chooseB
plots these proportions against the number of iterations.
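The resampling idea described above can be illustrated with a minimal, self-contained sketch. It is not the clusterMI implementation: the simulated data, the use of lm() p-values at the 5% level as the selection scheme, and the object names (selected, drawn, running) are all assumptions made for illustration.

```r
# Illustrative sketch only: NOT the code used by varselbest or chooseB.
set.seed(1)

n <- 200          # number of observations (assumed)
p <- 10           # number of explanatory variables (assumed)
sizeblock <- 4    # size of each random subset of variables
B <- 100          # number of iterations

# Simulated data: only x1 and x2 are related to the response
X <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("x", 1:p)))
y <- 2 * X[, 1] - 1.5 * X[, 2] + rnorm(n)

# Row b records which variables were drawn / selected at iteration b
selected <- matrix(0, B, p, dimnames = list(NULL, colnames(X)))
drawn    <- matrix(0, B, p, dimnames = list(NULL, colnames(X)))

for (b in seq_len(B)) {
  block <- sample(p, sizeblock)                       # random subset of variables
  dat   <- data.frame(y = y, X[, block, drop = FALSE])
  fit   <- lm(y ~ ., data = dat)                      # stand-in selection scheme
  pval  <- summary(fit)$coefficients[-1, 4]           # drop the intercept row
  drawn[b, block]    <- 1
  selected[b, block] <- as.numeric(pval < 0.05)
}

# Proportion of times each variable was selected among the times it was drawn,
# computed after 1, 2, ..., B iterations (one row per value of B)
running <- apply(selected, 2, cumsum) / pmax(apply(drawn, 2, cumsum), 1)
```

Plotting the rows of running against the iteration index, for instance with matplot(running, type = "l", xlab = "B", ylab = "Proportion"), gives curves of the same kind as those described here: they fluctuate for small B and flatten as B grows.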
Value
a list of matrices where each row corresponds to the vector of proportions (for all explanatory variables) obtained for a given value of B
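As a rough way to read such a matrix numerically, the sketch below measures how much the proportions change from one row to the next. The matrix prop is a hypothetical stand-in with made-up values, the rows are assumed to be ordered by increasing B, and the 0.02 tolerance is arbitrary; none of these come from the package.

```r
# Hypothetical stand-in for one matrix of the returned list:
# rows = increasing values of B, columns = explanatory variables.
prop <- rbind(c(0.50, 0.10),
              c(0.80, 0.05),
              c(0.90, 0.06),
              c(0.91, 0.06))
dimnames(prop) <- list(c("10", "20", "30", "40"), c("x1", "x2"))

# Largest absolute change in any proportion between consecutive rows
delta <- apply(abs(diff(prop)), 1, max)

# First value of B at which the change falls below an arbitrary tolerance
tol <- 0.02
stable_from <- rownames(prop)[-1][which(delta < tol)[1]]
```

If no row meets the tolerance, stable_from is NA, which suggests that a larger B may be needed.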
References
Bar-Hen, A. and Audigier, V., An ensemble learning method for variable selection: application to high dimensional data and missing values, Journal of Statistical Computation and Simulation, <doi:10.1080/00949655.2022.2070621>, 2022.
See Also
Examples
data(wine)
require(parallel)
ref <- wine$cult
nb.clust <- 3
wine.na <- wine
wine.na$cult <- NULL
wine.na <- prodna(wine.na)
nnodes <- 2 # Number of CPU cores for parallel computing
B <- 80 # Number of iterations for variable selection
# variable selection
res.varsel <- varselbest(data.na = wine.na,
                         listvar = "alco",
                         B = B,
                         nnodes = nnodes,
                         nb.clust = nb.clust,
                         graph = FALSE)
# convergence
res.chooseB <- chooseB(res.varsel)