seMIsupcox {doMIsaul}    R Documentation

Semisupervised learning for a right-censored endpoint


Description

MultiCons consensus-based method for multiple-imputation (MI) semi-supervised clustering. The final partition is a consensus of the Pareto-optimal solutions.


Usage

seMIsupcox(
  Impute = FALSE,
  Impute.m = 5,
  center.init = TRUE,
  center.init.N = 500,
  center.init.Ks = 2:7,
  X, = "LP"
  nfolds = 10,
  save.path = NULL,
  Unsup.Sup.relImp = list(relImp.55 = c(0.5, 0.5)),
  plot.cons = FALSE,
  cleanup.partition = TRUE,
  min.cluster.size = 10,
  level.order = NULL,
  Unclassified = "Unclassified",
  return.detail = FALSE
)



Arguments

Impute: Logical. The default, FALSE, indicates that the user has already performed the imputation and supplies the imputed data. If TRUE, the imputation is performed within the call using MImpute_surv(). Note that if Impute is TRUE, center.init is also forced to TRUE, because the center coordinates may depend on the imputation.


Impute.m: Used only if Impute is TRUE. Number of imputations to perform.


center.init: Either a user-supplied list of data frames containing the cluster-center coordinates (for example, as obtained with initiate_centers()), or TRUE to initialize the centers within the call (using initiate_centers()). Note that TRUE performs a random initialization; for finer control over the center initialization, generate the list of center coordinates yourself and supply it.


center.init.N: Used only if center.init is TRUE. Number of initializations to produce. Default is 500.


center.init.Ks: Used only if center.init is TRUE. Vector of numbers of clusters to generate for the initialization. Default is 2:7, i.e., 2 to 7 clusters.
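To make center.init.N and center.init.Ks more concrete, here is a minimal base-R sketch of random center initialization. It assumes that each initialization draws a number of clusters k from Ks and uses k randomly chosen observations as starting centers. random_centers() is a hypothetical helper for illustration only; it is not initiate_centers() and not doMIsaul's internal code.

```r
# Hypothetical helper, NOT initiate_centers(): each of the N initializations
# draws k from Ks and uses k randomly chosen rows of `data` as the centers.
random_centers <- function(data, N = 500, Ks = 2:7) {
  data <- as.data.frame(data)
  lapply(seq_len(N), function(i) {
    # index into Ks so that a length-1 Ks is not expanded to 1:Ks by sample()
    k <- Ks[sample.int(length(Ks), 1)]
    rows <- sample.int(nrow(data), k)
    data[rows, , drop = FALSE]
  })
}

set.seed(42)
toy <- data.frame(a = rnorm(50), b = rnorm(50))
inits <- random_centers(toy, N = 20, Ks = 2:7)
length(inits)                 # 20 initializations
table(sapply(inits, nrow))    # how often each k was drawn
```

Larger N explores more starting configurations at a higher computational cost, which is why the Examples use a small N only for speed.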


X: Data, in the form of a list of data frame(s). The list should be of length 1 if the data are complete or if Impute is TRUE, and should be a list of imputed data frames if the data are incomplete. Columns named "time" and "status", if present, are discarded for the clustering.
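The expected shape of X can be illustrated with a small sketch. clustering_columns() is a hypothetical helper (not part of doMIsaul) that rejects a bare data frame and drops the "time" and "status" columns, mirroring the rule stated above.

```r
# Hypothetical helper (not part of doMIsaul): check that X is a list of
# data frames and drop the "time"/"status" columns, which are documented
# as being ignored for clustering.
clustering_columns <- function(X) {
  if (!is.list(X) || is.data.frame(X) || length(X) < 1)
    stop("X must be a list of data frames, e.g. list(df)")
  lapply(X, function(df) {
    df[, !colnames(df) %in% c("time", "status"), drop = FALSE]
  })
}

df <- data.frame(time = c(5, 8, 3), status = c(1, 0, 1),
                 age = c(60, 72, 55), wt = c(70, 80, 65))
Xc <- clustering_columns(list(df))   # complete data: a list of length 1
colnames(Xc[[1]])                    # "age" "wt"
```

Passing df directly instead of list(df) is a common mistake, because a data frame is itself a list in R; the is.data.frame() check catches it.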

String indicating how to calculate the cross-validation error. Only "LP" is available; it stands for the linear predictor approach (using the 'ncvreg' package).


Y: Outcome data, given as a data frame or matrix with 2 columns: "time" and "status".
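A quick sanity check of Y before calling the function can save a failed run. check_outcome() below is a hypothetical helper, not a doMIsaul function. It assumes that status must be coded 0 (censored) / 1 (event), an assumption suggested by the Examples section, which recodes the survival package's 1/2 status by subtracting 1.

```r
# Hypothetical check (not a doMIsaul function). The 0/1 status coding is an
# assumption, suggested by the Examples recoding survival's 1/2 status.
check_outcome <- function(Y) {
  Y <- as.data.frame(Y)
  if (ncol(Y) != 2 || !all(c("time", "status") %in% colnames(Y)))
    stop("Y must have exactly two columns named \"time\" and \"status\"")
  if (!all(Y$status %in% c(0, 1)))
    stop("status is assumed to be coded 0 (censored) / 1 (event)")
  invisible(TRUE)
}

Y <- data.frame(time = c(306, 455, 1010), status = c(2, 2, 1))
Y$status <- Y$status - 1   # recode 1/2 to 0/1, as in the Examples
check_outcome(Y)
```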


nfolds: Number of folds for cross-validation. Default is 10.
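As an illustration of what nfolds controls, the sketch below splits n observations into nfolds folds of (almost) equal size. make_folds() is hypothetical; doMIsaul and ncvreg may assign folds differently (for example, with stratification), so treat this only as a picture of the idea.

```r
# Hypothetical sketch: random assignment of n observations to nfolds folds
# of (almost) equal size. Not the fold assignment used by doMIsaul/ncvreg.
make_folds <- function(n, nfolds = 10) {
  stopifnot(nfolds >= 2, nfolds <= n)
  sample(rep(seq_len(nfolds), length.out = n))
}

set.seed(1)
folds <- make_folds(n = 228, nfolds = 10)   # 228 = number of rows in cancer
table(folds)                                # each fold holds 22 or 23 rows
```

Each fold is held out once while the model is fitted on the remaining folds, so a larger nfolds means more fits per evaluation.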


save.path: Path indicating where the objective values for each iteration should be saved. If NULL, the values are not saved.


Unsup.Sup.relImp: List of weights for the unsupervised and supervised objectives, applied when identifying the Pareto-optimal solution. The default uses a single set of weights that gives both objectives the same weight, c(0.5, 0.5).
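The role of a weight pair such as relImp.55 = c(0.5, 0.5) can be sketched generically. The code below is a hypothetical illustration, not doMIsaul's implementation. It assumes one row per candidate partition holding an unsupervised and a supervised objective score, with "larger is better" for both. It first keeps the Pareto-optimal (non-dominated) candidates, then picks one by a weighted sum.

```r
# Hypothetical sketch, not doMIsaul's implementation. Both objectives are
# ASSUMED to be "larger is better".
pareto_front <- function(obj) {
  obj <- as.matrix(obj)
  # row i is dominated if some row is >= on every objective and > on one
  dominated <- vapply(seq_len(nrow(obj)), function(i) {
    any(apply(obj, 1, function(o) all(o >= obj[i, ]) && any(o > obj[i, ])))
  }, logical(1))
  which(!dominated)
}

# Choose one Pareto-optimal candidate with a weighted sum of the objectives
pick_weighted <- function(obj, front, w = c(0.5, 0.5)) {
  scores <- as.matrix(obj[front, , drop = FALSE]) %*% w
  front[which.max(scores)]
}

obj <- data.frame(unsup = c(0.9, 0.6, 0.4, 0.5, 0.8),
                  sup   = c(0.2, 0.7, 0.9, 0.3, 0.6))
front <- pareto_front(obj)                      # 1 2 3 5 (row 4 is dominated)
best  <- pick_weighted(obj, front, c(0.5, 0.5)) # 5
```

A weighted sum only makes sense when the two objectives are on comparable scales; rescaling them first is a common choice.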


plot.cons: Logical. Should the consensus tree be plotted?


cleanup.partition: Logical. Should the partition be trimmed of small clusters? (The consensus may generate small clusters of observations for which there is no consensus on the cluster assignment.)


min.cluster.size: Used only if cleanup.partition is TRUE. Minimum cluster size; smaller clusters are discarded.
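A minimal sketch of the cleanup step, assuming that observations in clusters smaller than min.cluster.size are relabeled with the label given by Unclassified. cleanup_partition() is hypothetical; doMIsaul's own cleanup may differ in details.

```r
# Hypothetical sketch of the cleanup step (not doMIsaul's code): clusters
# with fewer than min.cluster.size members get the Unclassified label.
cleanup_partition <- function(cl, min.cluster.size = 10,
                              Unclassified = "Unclassified") {
  cl <- as.character(cl)
  sizes <- table(cl)
  small <- names(sizes)[sizes < min.cluster.size]
  cl[cl %in% small] <- Unclassified
  cl
}

cl <- c(rep("A", 12), rep("B", 15), rep("C", 3))
clean <- cleanup_partition(cl, min.cluster.size = 10)
table(clean)   # A: 12, B: 15, Unclassified: 3
```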


level.order: Used only if cleanup.partition is TRUE. Optional. If a variable is supplied, the cluster levels are ordered according to the mean values of that variable.
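Ordering cluster levels by the mean of a variable can be sketched as follows. order_levels() is a hypothetical helper, not doMIsaul's code. It assumes that the means are computed per cluster over classified observations only, and that the unclassified label, when present, is kept as the last level.

```r
# Hypothetical sketch (not doMIsaul's code): order cluster levels by the
# per-cluster mean of v, keeping the unclassified label as the last level.
order_levels <- function(cl, v, Unclassified = "Unclassified") {
  cl <- as.character(cl)
  classified <- !(cl %in% Unclassified)
  means <- tapply(v[classified], cl[classified], mean)
  lev <- names(sort(means))
  extra <- unique(cl[!classified])
  extra <- extra[!is.na(extra)]   # an NA label cannot be a factor level here
  factor(cl, levels = c(lev, extra))
}

cl <- c("A", "A", "B", "B", "C", "Unclassified")
v  <- c(10, 12, 1, 3, 5, 100)
f <- order_levels(cl, v)
levels(f)   # "B" "C" "A" "Unclassified"
```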


Unclassified: Used only if cleanup.partition is TRUE. String used to label the unclassified observations. Default is "Unclassified".


return.detail: Logical. Should the imputation-specific partitions be returned in addition to the final consensus partition?


Value

A vector containing the final cluster IDs. If return.detail is TRUE, a list is returned instead, containing Consensus (the final cluster IDs), Detail (the clusters obtained for each imputed dataset), and a list containing the imputed datasets.


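Examples

library(doMIsaul)
data(cancer, package = "survival")
# Recode status from 1 (censored) / 2 (dead) to 0/1 and drop the
# first column (inst, the institution code)
cancer$status <- cancer$status - 1
cancer <- cancer[, -1]

### With imputation performed within the call
res <- seMIsupcox(X = list(cancer), Y = cancer[, c("time", "status")],
                  Impute = TRUE, Impute.m = 3, center.init = TRUE,
                  nfolds = 10, center.init.N = 20)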

### With imputation and center initialization performed beforehand
## Step 1: multiple imputation
cancer.imp <- MImpute_surv(cancer, 3)

## Step 2: center initialization
# A low N value is used for example purposes. Higher values should be used.
N <- 20
# Number of clusters for each initialization, drawn from 2 to 6
center.number <- sample(2:6, size = N, replace = TRUE)
the.seeds <- runif(N) * 10^9
# Clustering variables: every column except "time" and "status"
sel.col <- which(!colnames(cancer) %in% c("time", "status"))
# One set of N initializations per imputed dataset
inits <- lapply(seq_along(cancer.imp), function(mi.i) {
  initiate_centers(data = cancer.imp[[mi.i]][, sel.col],
                   N = N, t = 1, k = center.number,
                   seeds.N = the.seeds)
})

## Step 3: learning
# Without trimming small clusters
res1 <- seMIsupcox(X = cancer.imp, Y = cancer[, c("time", "status")],
                   Impute = FALSE, center.init = inits, nfolds = 10,
                   cleanup.partition = FALSE)
# With the default cleanup of small clusters
res2 <- seMIsupcox(X = cancer.imp, Y = cancer[, c("time", "status")],
                   center.init = inits, nfolds = 10)

[Package doMIsaul version 1.0.1 Index]