longitudinal_consensus_cluster {longmixr}    R Documentation

Longitudinal consensus clustering with flexmix

Description

This function performs longitudinal clustering with flexmix. To get robust results, the data is repeatedly subsampled and the clustering is performed on each subsample. The results are combined in a consensus matrix, and a final hierarchical clustering step is performed on this matrix. In this, it follows the approach of the ConsensusClusterPlus package.
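
The subsample, cluster, and combine steps described above can be sketched in base R. This is only an illustration of the general consensus clustering idea, not the package's implementation: `kmeans` stands in for the flexmix model fit, the toy data has one value per subject, and drawing the subsample at the subject level is an assumption.

```r
# Toy data: 20 subjects, one summary value each, two well-separated groups.
set.seed(1)
n <- 20
x <- c(rnorm(10, mean = -3), rnorm(10, mean = 3))

reps <- 10     # number of subsampling repetitions
p_item <- 0.8  # fraction of subjects drawn in each repetition
k <- 2         # requested number of clusters

together <- matrix(0, n, n)  # how often subjects i and j shared a cluster
sampled  <- matrix(0, n, n)  # how often subjects i and j were drawn together

for (r in seq_len(reps)) {
  idx <- sort(sample(n, size = floor(p_item * n)))
  # Stand-in clustering step (assumption): the package fits flexmix here.
  cl <- kmeans(x[idx], centers = k)$cluster
  sampled[idx, idx]  <- sampled[idx, idx] + 1
  together[idx, idx] <- together[idx, idx] + outer(cl, cl, "==")
}

# Consensus: share of joint draws in which a pair was clustered together.
consensus <- ifelse(sampled > 0, together / sampled, 0)

# Final hierarchical clustering step on the consensus matrix.
tree <- hclust(as.dist(1 - consensus), method = "average")
consensus_class <- cutree(tree, k = k)
```

Pairs of subjects that are almost always clustered together get a consensus value close to 1, so `1 - consensus` acts as a dissimilarity for the final hierarchical clustering.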

Usage

longitudinal_consensus_cluster(
  data = NULL,
  id_column = NULL,
  max_k = 3,
  reps = 10,
  p_item = 0.8,
  model_list = NULL,
  flexmix_formula = as.formula("~s(visit, k = 4) | patient_id"),
  title = "untitled_consensus_cluster",
  final_linkage = c("average", "ward.D", "ward.D2", "single", "complete", "mcquitty",
    "median", "centroid"),
  seed = 3794,
  verbose = FALSE
)

Arguments

data

a data.frame with one or several observations per subject. It needs to contain one column that specifies to which subject each entry (row) belongs. This ID column is specified in id_column. Otherwise, there are no restrictions on the column names, as the model is specified in flexmix_formula.

id_column

name (character vector) of the ID column in data to identify all observations of one subject

max_k

maximum number of clusters, default is 3

reps

number of repetitions, default is 10

p_item

fraction of samples contained in each subsample, default is 0.8

model_list

either one flexmix driver or a list of flexmix drivers of class FLXMR

flexmix_formula

a formula object that describes the flexmix model relative to the formula in the flexmix drivers (the dot in the flexmix drivers is replaced, see the example). That means that you usually only specify the right-hand side of the formula here. However, this is neither enforced nor checked, to give you more flexibility over the flexmix interface
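
The dot replacement can be illustrated with base R's `update()`. This is a sketch of the idea only: that flexmix resolves the dot in the same way is an assumption here, and the grouping part (`| patient_id`) is left out because `update()` does not know about it.

```r
# Right-hand side as it would be given in flexmix_formula (without grouping).
rhs <- ~ s(visit, k = 4)

# Driver formula as used in model_list: the dot stands for the shared part.
driver <- var_1 ~ .

# update() replaces the dot in `driver` with the right-hand side of `rhs`.
expanded <- update(rhs, driver)
print(expanded)
```

Each driver in model_list therefore only needs to name its own outcome variable, while the shared smooth term is written once.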

title

name of the clustering; used if writeTable = TRUE

final_linkage

linkage used for the last hierarchical clustering step on the consensus matrix; has to be average, ward.D, ward.D2, single, complete, mcquitty, median or centroid. The default is average

seed

seed for reproducibility

verbose

logical; whether status messages should be displayed. Default is FALSE

Details

The data types longitudinal_consensus_cluster can handle depend on how the flexmix models are set up. In principle, all data types are supported for which there is a flexmix driver with the desired outcome variable.

If you follow the dimension reduction approach outlined in vignette("Example clustering analysis", package = "longmixr"), the input data types depend on what FAMD from the FactoMineR package can handle. FAMD accepts numeric variables and treats all other variables as factors, which it can handle as well.

Value

An object (list) of class lcc with length max_k. The first entry, general_information, contains the entries:

consensus_matrices: a list of all consensus matrices (for all specified numbers of clusters)
cluster_assignments: a data.frame with an ID column named after id_column and a column for every specified number of clusters, e.g. assignment_num_clus_2
call: the call, i.e. all arguments with which longitudinal_consensus_cluster was called

The other entries correspond to the number of specified clusters (e.g. the second entry corresponds to 2 specified clusters) and each contains a list with the following entries:

consensus_matrix: the consensus matrix
consensus_tree: the result of the hierarchical clustering on the consensus matrix
consensus_class: the resulting class for every observation
found_flexmix_clusters: a vector of the number of clusters actually found by flexmix (which can deviate from the specified number)
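
The layout described above could be navigated as in the following sketch. The object `mock_lcc` is built by hand to mimic the documented structure for max_k = 3 and is not real output of longitudinal_consensus_cluster; the helper functions `get_assignments` and `share_deviating` are hypothetical and not part of longmixr.

```r
# Hand-built mock of the documented layout (illustration only).
mock_lcc <- list(
  general_information = list(
    cluster_assignments = data.frame(
      patient_id = 1:4,
      assignment_num_clus_2 = c(1, 1, 2, 2),
      assignment_num_clus_3 = c(1, 2, 3, 3)
    )
  ),
  # Second entry: solution for 2 specified clusters.
  list(consensus_class = c(1, 1, 2, 2), found_flexmix_clusters = c(2, 2, 1)),
  # Third entry: solution for 3 specified clusters.
  list(consensus_class = c(1, 2, 3, 3), found_flexmix_clusters = c(3, 2, 3))
)

# Hypothetical helper: ID column plus the assignment column for k clusters.
get_assignments <- function(lcc, k, id_column) {
  col <- paste0("assignment_num_clus_", k)
  lcc$general_information$cluster_assignments[, c(id_column, col)]
}

# Hypothetical helper: share of repetitions in which the number of clusters
# found by flexmix differs from the specified number k.
share_deviating <- function(lcc, k) {
  mean(lcc[[k]]$found_flexmix_clusters != k)
}

get_assignments(mock_lcc, k = 2, id_column = "patient_id")
share_deviating(mock_lcc, k = 2)
```

Indexing by position relies on the statement that the k-th entry corresponds to k specified clusters. A high share of deviating repetitions may suggest that the data does not support that many clusters.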

Examples

set.seed(5)
# Simulated data: 10 patients with 4 visits each
test_data <- data.frame(
  patient_id = rep(1:10, each = 4),
  visit = rep(1:4, 10),
  var_1 = c(rnorm(20, -1), rnorm(20, 3)) +
    rep(seq(from = 0, to = 1.5, length.out = 4), 10),
  var_2 = c(rnorm(20, 0.5, 1.5), rnorm(20, -2, 0.3)) +
    rep(seq(from = 1.5, to = 0, length.out = 4), 10)
)
# One flexmix driver per outcome variable; the dot is replaced
# by the right-hand side given in flexmix_formula
model_list <- list(
  flexmix::FLXMRmgcv(as.formula("var_1 ~ .")),
  flexmix::FLXMRmgcv(as.formula("var_2 ~ ."))
)
clustering <- longitudinal_consensus_cluster(
  data = test_data,
  id_column = "patient_id",
  max_k = 2,
  reps = 3,
  model_list = model_list,
  flexmix_formula = as.formula("~s(visit, k = 4) | patient_id")
)
# not run
# plot(clustering)
# end not run

[Package longmixr version 1.0.0 Index]