mlcc.bic {varclust}    R Documentation

Multiple Latent Components Clustering - Subspace clustering with automatic estimation of the number of clusters and their dimensions

Description

This function implements the Multiple Latent Components Clustering (MLCC) algorithm, which clusters quantitative variables into groups; the number of groups is chosen using mBIC. For each candidate number of clusters in numb.clusters, the mlcc.reps function is called. It runs a K-means based algorithm (mlcc.kmeans), which finds a local minimum of mBIC, a given number of times (numb.runs) with different initializations. The best partition is chosen with mBIC (see the mlcc.reps function).
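The "several initializations, keep the best" pattern described above can be sketched in base R. This is only an illustration under loud assumptions: stats::kmeans stands in for mlcc.kmeans, and the total within-cluster sum of squares stands in for mBIC. Neither is what varclust actually computes; only the restart loop is the point.

```r
# Toy data: 30 observations of 12 variables (values are made up).
set.seed(42)
X <- matrix(rnorm(30 * 12), nrow = 30, ncol = 12)

numb.runs <- 5   # number of random initializations
k <- 3           # one candidate number of clusters

best <- NULL
for (run in seq_len(numb.runs)) {
  # Variables are the items being clustered, so cluster the rows of t(X).
  fit <- kmeans(t(X), centers = k, nstart = 1)
  # Stand-in criterion (NOT mBIC): lower within-cluster sum of squares is better.
  if (is.null(best) || fit$tot.withinss < best$tot.withinss) {
    best <- fit
  }
}

# One cluster label per variable, taken from the best run.
print(best$cluster)
```

In mlcc.bic the same loop would be repeated for every value in numb.clusters, with the final choice made by mBIC.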

Usage

mlcc.bic(X, numb.clusters = 1:10, numb.runs = 30, stop.criterion = 1,
  max.iter = 30, max.dim = 4, scale = TRUE, numb.cores = NULL,
  greedy = TRUE, estimate.dimensions = TRUE, verbose = FALSE,
  flat.prior = FALSE, show.warnings = FALSE)

Arguments

X

A data frame or a matrix with only continuous variables.

numb.clusters

A vector, numbers of clusters to be checked.

numb.runs

An integer, number of runs (initializations) of mlcc.kmeans.

stop.criterion

An integer. If an iteration of the mlcc.kmeans algorithm makes fewer changes to the partition than stop.criterion, mlcc.kmeans stops.

max.iter

An integer, maximum number of iterations of the loop in the mlcc.kmeans algorithm.

max.dim

An integer. If estimate.dimensions is FALSE, max.dim is the dimension of each subspace. If estimate.dimensions is TRUE, the subspace dimensions are estimated from the range [1, max.dim].

scale

A boolean. If TRUE (the default), the variables in the dataset are scaled to zero mean and unit variance.

numb.cores

An integer, number of cores to be used. By default all cores are used.

greedy

A boolean. If TRUE (the default), the number of clusters is estimated in a greedy way: the first local minimum of mBIC is chosen.

estimate.dimensions

A boolean. If TRUE (the default), the subspace dimensions are estimated.

verbose

A boolean. If TRUE, a plot of mBIC values for the different numbers of clusters is produced, and the mBIC values computed for every number of clusters and every subspace dimension are printed. The default is FALSE.

flat.prior

A boolean. If TRUE, a flat prior is used instead of an informative prior that takes into account the number of models for a given number of clusters.

show.warnings

A boolean. If TRUE, all warnings are displayed. The default is FALSE.
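Three of the arguments above (scale, stop.criterion and numb.cores) can be illustrated with base R alone. The snippet is a sketch of what the descriptions say, not of varclust internals: that mlcc.bic standardizes with scale(), that changes are counted as differing labels, and that leaving one core free is a sensible choice are all assumptions, and the data and label vectors are made up.

```r
# scale: standardize each column to zero mean and unit variance.
set.seed(1)
X <- cbind(a = rnorm(20, mean = 100, sd = 15),
           b = rnorm(20, mean = 0, sd = 0.01),
           c = runif(20, min = -5, max = 5))
X.scaled <- scale(X, center = TRUE, scale = TRUE)
col.means <- colMeans(X.scaled)      # numerically zero for every column
col.sds <- apply(X.scaled, 2, sd)    # one for every column

# stop.criterion: compare two consecutive partitions (hypothetical labels).
old.segmentation <- c(1, 1, 2, 2, 3, 3, 1, 2)
new.segmentation <- c(1, 2, 2, 2, 3, 3, 1, 2)
stop.criterion <- 1
# Number of variables whose cluster label changed in this iteration.
n.changes <- sum(old.segmentation != new.segmentation)
# Stop only when fewer than stop.criterion labels changed.
should.stop <- n.changes < stop.criterion

# numb.cores: pick an explicit value instead of the default (all cores).
available <- parallel::detectCores()
if (is.na(available)) available <- 1L   # detectCores() can return NA
numb.cores <- max(1L, available - 1L)   # leave one core free, never below 1

print(list(n.changes = n.changes, should.stop = should.stop,
           numb.cores = numb.cores))
```

With stop.criterion = 1, as in the default call, the loop would end only when an iteration leaves the partition unchanged.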

Value

An object of class mlcc.fit consisting of

segmentation

a vector containing the partition of the variables

BIC

numeric, value of mBIC

subspacesDimensions

a list containing dimensions of the subspaces

nClusters

an integer, estimated number of clusters

factors

a list of matrices, basis for each subspace

all.fit

a list of segmentation, mBIC, and subspace dimensions for each number of clusters considered, using the estimated subspace dimensions

all.fit.dims

a list of lists of segmentation, mBIC, and subspace dimensions for every number of clusters and every subspace dimension considered
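A common next step is to turn the segmentation vector into groups of variable names. The sketch below uses a hand-made vector in place of a real result and assumes that segmentation holds one cluster label per column of X, in column order; the variable names are hypothetical.

```r
# Hypothetical stand-ins for colnames(X) and the segmentation element.
var.names <- paste0("V", 1:6)
segmentation <- c(1, 2, 1, 3, 2, 3)

# Group variable names by cluster label.
clusters <- split(var.names, segmentation)
print(clusters)

# Number of variables in each cluster.
cluster.sizes <- table(segmentation)
print(cluster.sizes)
```

With a fitted object, the same two calls would take the segmentation element of the result and the column names of the input data.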

Examples


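library(varclust)
# Simulate data whose variables come from K = 3 low-dimensional subspaces
sim.data <- data.simulation(n = 50, SNR = 1, K = 3, numb.vars = 50, max.dim = 3)
# Try 1 to 5 clusters, with 20 initializations each, on a single core
mlcc.res <- mlcc.bic(sim.data$X, numb.clusters = 1:5, numb.runs = 20,
  numb.cores = 1, verbose = TRUE)
show.clusters(sim.data$X, mlcc.res$segmentation)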


[Package varclust version 0.9.4 Index]