seMIsupcox {doMIsaul}    R Documentation

Semisupervised learning for a right censored endpoint

Description

MultiCons consensus-based method for semisupervised clustering of multiply imputed (MI) data. The final partition is a consensus of the Pareto-optimal solutions.
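A candidate partition is Pareto-optimal when no other candidate is at least as good on every objective and strictly better on at least one. The base R sketch here identifies such candidates for two objectives (one unsupervised, one supervised). The helper pareto_front() and the toy scores are hypothetical, and both objectives are assumed to be oriented so that larger is better; the sketch does not reproduce the objectives or the MultiCons consensus step actually used by seMIsupcox().

```r
# Hypothetical helper (not part of doMIsaul): return the indices of the
# non-dominated (Pareto-optimal) rows, assuming larger is better in every column.
pareto_front <- function(scores) {
  scores <- as.matrix(scores)
  n <- nrow(scores)
  keep <- logical(n)
  for (i in seq_len(n)) {
    # Row j dominates row i if it is at least as good on every objective
    # and strictly better on at least one.
    dominated <- any(vapply(seq_len(n), function(j) {
      j != i && all(scores[j, ] >= scores[i, ]) && any(scores[j, ] > scores[i, ])
    }, logical(1)))
    keep[i] <- !dominated
  }
  which(keep)
}

# Toy scores for five candidate partitions (invented numbers for illustration)
scores <- data.frame(unsup = c(0.9, 0.7, 0.5, 0.6, 0.2),
                     sup   = c(0.1, 0.6, 0.8, 0.5, 0.3))
pareto_front(scores)  # [1] 1 2 3
```

Candidates 4 and 5 are both dominated by candidate 2, so only the first three remain. In seMIsupcox() the final partition is built as a consensus of such non-dominated partitions.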

Usage

seMIsupcox(
  Impute = FALSE,
  Impute.m = 5,
  center.init = TRUE,
  center.init.N = 500,
  center.init.Ks = 2:7,
  X,
  CVE.fun = "LP",
  Y,
  nfolds = 10,
  save.path = NULL,
  Unsup.Sup.relImp = list(relImp.55 = c(0.5, 0.5)),
  plot.cons = FALSE,
  cleanup.partition = TRUE,
  min.cluster.size = 10,
  level.order = NULL,
  Unclassified = "Unclassified",
  return.detail = FALSE
)

Arguments

Impute

Logical. The default, FALSE, indicates that the user has already performed the imputation and supplies the imputed data. If TRUE, the imputation is performed within the call using the MImpute_surv() function. Note that if Impute is TRUE, center.init is also forced to TRUE, as the center coordinates may depend on the imputation.

Impute.m

Used only if Impute is TRUE. Number of imputations to perform. Default is 5.

center.init

Either a user-supplied list of data frames containing the cluster center coordinates (for example, as obtained with initiate_centers()), or TRUE to initialize the centers within the function call (performed with initiate_centers()). Note that if TRUE, a random initialization is performed. For finer tuning of the center initialization, the user should generate and supply the list of center coordinates.

center.init.N

Used only if center.init is TRUE. Number of initializations to produce. Default is 500.

center.init.Ks

Used only if center.init is TRUE. Vector of the numbers of clusters to generate for the initialization. Default is 2:7 (2 to 7 clusters).

X

Data, in the form of a list of data frame(s). The list should be of length 1 if the data are complete or if Impute is TRUE; otherwise, it should be a list of imputed data frames. If columns named "time" and "status" are present, they are discarded for the clustering.

CVE.fun

String indicating how to calculate the cross-validation error. Only "LP" is available; it stands for the linear predictor approach (using the 'ncvreg' package).

Y

Outcome data, passed to CVE.fun. Should be a data frame or matrix with 2 columns: "time" and "status".

nfolds

Number of folds for cross-validation.

save.path

Path indicating where the objective values for each iteration should be saved. If NULL (the default), the values are not saved.

Unsup.Sup.relImp

List of weights for the unsupervised and supervised objectives used to select the Pareto-optimal solutions. The default is a single set of weights giving equal weight (0.5, 0.5) to both objectives.

plot.cons

Logical. Should the consensus tree be plotted?

cleanup.partition

Logical. Should the partition be trimmed of small clusters? (The consensus may generate small clusters of observations for which there is no consensus on the cluster assignment.)

min.cluster.size

Used only if cleanup.partition is TRUE. Minimum cluster size: smaller clusters are discarded.

level.order

Used only if cleanup.partition is TRUE. Optional. If a variable is supplied, the cluster levels are ordered according to the mean values of that variable.

Unclassified

Used only if cleanup.partition is TRUE. String used to label the unclassified observations. Default is "Unclassified".

return.detail

Logical. Should the imputation-specific partitions be returned in addition to the final consensus partition?
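The documented effect of cleanup.partition, min.cluster.size, Unclassified and level.order can be pictured with a small base R sketch. The helper cleanup_sketch() is hypothetical and is not the package's implementation; in particular, ordering the retained clusters by increasing mean of the level.order variable, sorting them alphabetically when no variable is supplied, and placing the unclassified label last are assumptions.

```r
# Hypothetical helper (not part of doMIsaul): approximates the documented
# behaviour of cleanup.partition = TRUE on a vector of cluster labels.
cleanup_sketch <- function(partition, min.cluster.size = 10,
                           Unclassified = "Unclassified", level.order = NULL) {
  partition <- as.character(partition)
  sizes <- table(partition)
  # Clusters below the size threshold are treated as having no consensus
  small <- names(sizes)[sizes < min.cluster.size]
  partition[partition %in% small] <- Unclassified
  kept <- setdiff(unique(partition), Unclassified)
  if (!is.null(level.order)) {
    # Assumption: retained clusters are ordered by increasing mean of level.order
    means <- tapply(level.order, partition, mean)[kept]
    kept <- kept[order(means)]
  } else {
    kept <- sort(kept)
  }
  # An NA label is not added as a factor level; it stays as missing values
  lev <- if (is.na(Unclassified)) kept else c(kept, Unclassified)
  factor(partition, levels = lev)
}

# Toy input: cluster "B" has only 3 members, below min.cluster.size = 10
part <- c(rep("A", 12), rep("B", 3), rep("C", 15))
x <- c(rep(5, 12), rep(0, 3), rep(1, 15))
cleaned <- cleanup_sketch(part, min.cluster.size = 10, level.order = x)
table(cleaned)
```

Here the three members of cluster B are relabelled "Unclassified", and cluster C is listed before A because its mean of x is lower.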

Value

A vector containing the final cluster IDs or, if return.detail is TRUE, a list containing Consensus (the final cluster IDs), Detail (the clusters obtained for each imputed dataset) and Imputed.data (a list containing the imputed datasets).
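Because the return type depends on return.detail, downstream code may need to accept either form. A minimal sketch, relying only on the element name Consensus given in the Value description; consensus_ids() is a hypothetical helper, and the mock objects merely stand in for real seMIsupcox() output.

```r
# Hypothetical helper: return the consensus cluster IDs from a seMIsupcox()
# result obtained with return.detail = FALSE (a vector) or TRUE (a list).
consensus_ids <- function(res) {
  if (is.list(res) && !is.null(res$Consensus)) res$Consensus else res
}

# Mock results standing in for real output (not produced by the package)
mock.vector <- factor(c("1", "2", "1", "Unclassified"))
mock.detail <- list(Consensus = mock.vector, Detail = NULL, Imputed.data = NULL)
identical(consensus_ids(mock.detail), consensus_ids(mock.vector))  # TRUE
```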

Examples

library(doMIsaul)

data(cancer, package = "survival")
# Recode status from 1 = censored / 2 = dead to 0 / 1
cancer$status <- cancer$status - 1
# Drop the first column (institution code)
cancer <- cancer[, -1]

### With imputation included
res <- seMIsupcox(X = list(cancer), Y = cancer[, c("time", "status")],
                  Impute = TRUE, Impute.m = 3, center.init = TRUE,
                  nfolds = 10, center.init.N = 20)

### With imputation and center initialization performed beforehand
## 1 Imputation
cancer.imp <- MImpute_surv(cancer, 3)

## 2 Center initialization
# A low N value is used for example purposes. Higher values should be used.
N <- 20
center.number <- sample(2:6, size = N, replace = TRUE)
the.seeds <- runif(N) * 10^9
# Exclude the outcome columns from the clustering variables
sel.col <- which(!colnames(cancer) %in% c("time", "status"))
# One set of initial centers per imputed dataset
inits <- lapply(seq_along(cancer.imp), function(mi.i) {
  initiate_centers(data = cancer.imp[[mi.i]][, sel.col],
                   N = N, t = 1, k = center.number,
                   seeds.N = the.seeds)
})

## 3 Learning
res1 <- seMIsupcox(X = cancer.imp, Y = cancer[, c("time", "status")],
                   Impute = FALSE, center.init = inits, nfolds = 10,
                   cleanup.partition = FALSE)
res2 <- seMIsupcox(X = cancer.imp, Y = cancer[, c("time", "status")],
                   center.init = inits, nfolds = 10)


[Package doMIsaul version 1.0.1 Index]