unsupMI {doMIsaul} R Documentation

Unsupervised learning for incomplete datasets

Description

Unsupervised clustering for multiply imputed datasets using MultiCons() consensus (procedure of Faucheux et al., 2021).

Usage

unsupMI(
  Impute = FALSE,
  Impute.m = 5,
  cens.data.lod = NULL,
  cens.standards = NULL,
  cens.mice.log = 10,
  censsurv.var.log = NULL,
  censsurv.maxit = 10,
  data,
  log.data = FALSE,
  algo = "km",
  k.crit = "ch",
  comb.cons = FALSE,
  plot.cons = FALSE,
  return.detail = FALSE,
  not.to.use = c("time", "status"),
  cleanup.partition = TRUE,
  min.cluster.size = 10,
  level.order = NULL,
  Unclassified = "Unclassified"
)

Arguments

Impute

Default is FALSE, indicating that the user has already performed the imputation and supplies the imputed data. Otherwise, a string ("MImpute", "MImpute_surv" or "MImpute_lcens") to perform the imputation within the call using the MImpute(), MImpute_surv() or MImpute_lcens() function, respectively.

Impute.m

Used only if Impute is not FALSE: number of imputations to perform.

cens.data.lod

Passed to MImpute_lcens() if Impute == "MImpute_lcens".

cens.standards

Passed to MImpute_lcens() if Impute == "MImpute_lcens".

cens.mice.log

Passed to MImpute_lcens() if Impute == "MImpute_lcens".

censsurv.var.log

For MImpute_lcenssurv imputation: names of the variables to log if mice.log is numeric. If NULL, all variables except those in time.status.names will be logged.

censsurv.maxit

for MImpute_lcenssurv imputation: passed to mice().

data

Data, in the form of a list of data.frame(s). The list should be of length 1 if the data are complete or if Impute is not FALSE; it should be a list of imputed data frames if the data were incomplete and have already been imputed. Columns named in not.to.use are discarded for the clustering.

log.data

logical. Should all columns of the dataset be logged before applying clustering algorithms?

algo

vector of strings: names of the clustering algorithms to use (use "km" for k-means, "kmed" for K-medians, "hc" for hclust() and/or "mclust" for mclust()).

k.crit

string. Criterion to select the optimal number of clusters (for each imputed dataset). Use "ch" for Calinski and Harabasz criterion (not available for mclust), "CritCF" for CritCF or "bic" for BIC (mclust only).

comb.cons

logical. Forced to FALSE if length(algo) < 2. Use TRUE to perform an additional consensus from all the partitions generated, whatever the algorithm.

plot.cons

logical. Use TRUE to print the MultiCons tree. Note that if all partitions are identical across the imputations, no consensus is performed and therefore no plot is produced, even if plot.cons = TRUE.

return.detail

logical. Should the imputation-specific partitions and the imputed data be returned, in addition to the final consensus partition?

not.to.use

vector of strings: names of the columns that should be discarded for the learning step.

cleanup.partition

logical. Should the partition be trimmed of small clusters? (The consensus may generate small clusters of observations for which there is no consensus on the cluster assignment.)

min.cluster.size

if cleanup.partition == TRUE: Minimum cluster size (i.e., smaller clusters will be discarded)

level.order

if cleanup.partition == TRUE: optional. If you supply a variable, the cluster levels will be ordered according to the mean values of that variable.

Unclassified

if cleanup.partition == TRUE: string used as the label for the unclassified observations. Default is "Unclassified".

Value

If length(algo) == 1, a vector of final cluster IDs; if length(algo) > 1, a data.frame in which each column holds the final cluster IDs for the corresponding algorithm. If return.detail == TRUE, a list containing Consensus: the final cluster IDs (vector or data.frame); Detail: the clusters obtained for each imputed dataset; Imputed.data: a list containing the imputed datasets.

Examples

### With imputation included
data(cancer, package = "survival")
cancer$status <- cancer$status - 1
res.0 <- unsupMI(data = list(cancer), Impute = "MImpute_surv",
                 cleanup.partition = FALSE)

### With imputation not included
## 1 imputation
cancer.imp <- MImpute_surv(cancer, 3)
## 2 learning
res <- unsupMI(data = cancer.imp, cleanup.partition = FALSE)
summary(factor(res))
res.1 <- unsupMI(data = cancer.imp)
summary(factor(res.1))
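
## Illustration only (base R, not the package's internal code): a minimal
## sketch of what trimming small clusters may look like, assuming that
## clusters smaller than min.cluster.size are relabelled with the
## Unclassified label. `toy.partition` and `trim_small_clusters` are
## hypothetical names introduced for this sketch.
toy.partition <- factor(c(rep("1", 12), rep("2", 15), rep("3", 4)))
trim_small_clusters <- function(partition, min.cluster.size = 10,
                                Unclassified = "Unclassified") {
  sizes <- table(partition)                          # size of each cluster
  small <- names(sizes)[sizes < min.cluster.size]    # clusters to discard
  out <- as.character(partition)
  out[out %in% small] <- Unclassified                # relabel their members
  factor(out)
}
toy.trimmed <- trim_small_clusters(toy.partition)
table(toy.trimmed)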

## 2.bis learning with several algorithms
res.2 <- unsupMI(data = cancer.imp, algo = c("km", "hc"), comb.cons = TRUE,
                 plot.cons = TRUE)
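
## 3 returning the details (a sketch; the element names Consensus, Detail
## and Imputed.data are taken from the Value section, and the internal
## structure of Detail is not assumed here, hence the use of str())
res.3 <- unsupMI(data = cancer.imp, return.detail = TRUE)
names(res.3)
summary(factor(res.3$Consensus))
str(res.3$Detail, max.level = 1)
length(res.3$Imputed.data)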

[Package doMIsaul version 1.0.1 Index]