backward.explorer {ClustMMDD} | R Documentation |
This function collects a set of the most competitive models using a backward-stepwise strategy. The visited models are written to a file whose name ends with the suffix "_ExploredModels.txt". The algorithm is described in Wilson Toussile and Elisabeth Gassiat (2009).
backward.explorer(x, Kmax, Criterion, ploidy = 1, ForceExclusion = FALSE,
  emOptions = list(epsi = NULL, nberSmallEM = NULL, nberIterations = NULL,
    nberMaxIterations = NULL, typeSmallEM = NULL, typeEM = NULL,
    putThreshold = NULL),
  Kmin = 1, Smin = NULL, project = deparse(substitute(x)))
x |
A matrix of strings that contains the data. |
Kmax |
The maximum number of clusters to be explored. |
Criterion |
The model selection criterion in c("BIC", "AIC", "ICL", "CteDim") used for exploration (see details). |
ploidy |
The number of columns for each variable in the data. For example, ploidy = 2 for genotypic data from diploid individuals. |
ForceExclusion |
A logical value indicating whether to force exclusion. The default value is FALSE. |
emOptions |
A list of EM options (see |
Kmin |
The minimum number of clusters. The default value is set to 1. |
Smin |
A logical vector that indicates the variables to include in the selected set of clustering variables. The default value is NULL, meaning that no variable is preselected. |
project |
The name of the project. The default value is the name of the dataset. |
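As an illustration of how x, ploidy and Smin relate, here is a minimal base R sketch. The toy data, the name n_loci and the helper cols_of are hypothetical, and the sketch assumes that Smin holds one logical entry per variable (not per column) and that the columns of one variable are adjacent; this page does not confirm either point.

```r
# Toy genotypic data: 4 diploid individuals, 3 loci, so 3 * 2 = 6 columns.
ploidy <- 2
n_loci <- 3  # hypothetical number of variables
x <- matrix(
  c("101", "103", "A", "B", "1", "2",
    "101", "101", "B", "B", "2", "2",
    "105", "103", "A", "A", "1", "1",
    "103", "103", "A", "B", "2", "1"),
  nrow = 4, byrow = TRUE
)

# The data must be a character matrix with ploidy columns per variable.
stopifnot(is.character(x), ncol(x) == n_loci * ploidy)

# Assumption: Smin has one logical entry per variable (not per column).
Smin <- rep(FALSE, n_loci)
Smin[1] <- TRUE  # preselect the first locus

# Columns belonging to variable j, assuming adjacent columns are grouped.
cols_of <- function(j) ((j - 1) * ploidy + 1):(j * ploidy)
cols_of(2)
```

With such objects, a call might look like backward.explorer(x, Kmax = 3, Criterion = "BIC", ploidy = 2, Smin = Smin), although that call is not run here.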
If the penalized criterion is CteDim, a sequence of penalty functions of the form pen(K, S) = λ * dim(K, S) is used. In this shape of penalty function, λ lies in [0.5, log(N)], where N is the number of individuals in the sample data. Thus, the AIC and BIC penalties are in the sequence of candidate penalties.
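The penalty family described above can be sketched in base R as follows. The values of N and dim_KS and the size of the grid are illustrative only, and the sketch assumes a criterion of the form -logLik + pen, under which AIC would correspond to λ = 1 and BIC to λ = log(N) / 2; the grid actually used by the package is not shown on this page.

```r
# Hypothetical sample size and model dimension (illustrative values only).
N <- 200
dim_KS <- 35

# Candidate penalty weights on the interval [0.5, log(N)].
lambda <- seq(0.5, log(N), length.out = 8)

# Penalty for each candidate weight: pen(K, S) = lambda * dim(K, S).
pen <- lambda * dim_KS

# Assumption: the criterion has the form -logLik + pen, so that
# AIC corresponds to lambda = 1 and BIC to lambda = log(N) / 2.
lambda_aic <- 1
lambda_bic <- log(N) / 2

# Check that both weights fall inside the candidate interval.
c(aic_in_range = lambda_aic >= 0.5 && lambda_aic <= log(N),
  bic_in_range = lambda_bic >= 0.5 && lambda_bic <= log(N))
```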
A data.frame of the models selected for the chosen criterion.
Wilson Toussile
Dominique Bontemps and Wilson Toussile (2013): Clustering and variable selection for categorical multivariate data. Electronic Journal of Statistics, Volume 7, 2344-2371.
Wilson Toussile and Elisabeth Gassiat (2009): Variable selection in model-based clustering using multilocus genotype data. Adv Data Anal Classif, Volume 3, Number 2, 109-134.
dimJump.R for the data-driven calibration of the penalty function, and model.selection.R for the final model selection.
data(genotype1)
head(genotype1)
genotype2 = cutEachCol(genotype1[, -11], ploidy = 2)
head(genotype2)

# The following command creates a file "genotype2_ExploredModels.txt"
# that contains the most competitive models.
#output = backward.explorer(genotype2, Kmax = 10, ploidy = 2, Kmin = 1, Criterion = "CteDim")

data(genotype2_ExploredModels)
head(genotype2_ExploredModels)