R: Longitudinal consensus clustering with flexmix

longitudinal_consensus_cluster {longmixr}

R Documentation

Longitudinal consensus clustering with flexmix

Description

This function performs longitudinal clustering with flexmix. To obtain robust results, the data are repeatedly subsampled and the clustering is performed on each subsample. The results are combined into a consensus matrix, and a final hierarchical clustering step is performed on this matrix. In this respect, it follows the approach of the ConsensusClusterPlus package.
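The subsample, cluster and combine loop described above can be sketched in base R. This is a minimal illustration of a ConsensusClusterPlus-style consensus matrix under stated assumptions, not the longmixr implementation: `kmeans()` stands in for the flexmix fit, the input is a matrix with one row of features per subject, and `consensus_sketch` is a hypothetical helper name.

```r
# Hypothetical helper: consensus clustering sketch in which kmeans() stands in
# for the flexmix fit used by longitudinal_consensus_cluster()
consensus_sketch <- function(x, k, reps = 10, p_item = 0.8,
                             final_linkage = "average", seed = 3794) {
  set.seed(seed)
  n <- nrow(x)
  together <- matrix(0, n, n)  # runs in which i and j landed in the same cluster
  sampled <- matrix(0, n, n)   # runs in which i and j were both subsampled
  for (r in seq_len(reps)) {
    # draw a subsample containing a fraction p_item of the rows (subjects)
    idx <- sort(sample(n, size = floor(p_item * n)))
    cl <- kmeans(x[idx, , drop = FALSE], centers = k, nstart = 5)$cluster
    sampled[idx, idx] <- sampled[idx, idx] + 1
    together[idx, idx] <- together[idx, idx] + outer(cl, cl, "==")
  }
  # consensus matrix: share of co-sampled runs in which i and j co-clustered
  consensus <- ifelse(sampled > 0, together / sampled, 0)
  # final step: hierarchical clustering with 1 - consensus as the distance
  tree <- hclust(as.dist(1 - consensus), method = final_linkage)
  list(consensus_matrix = consensus, consensus_tree = tree,
       consensus_class = cutree(tree, k = k))
}

# Two well-separated groups of 10 rows each
set.seed(1)
x <- rbind(matrix(rnorm(20, mean = -3), ncol = 2),
           matrix(rnorm(20, mean = 3), ncol = 2))
res <- consensus_sketch(x, k = 2, reps = 20)
table(res$consensus_class)
```

Each consensus entry is the share of runs, among those in which both subjects were drawn, that placed them in the same cluster. The final `hclust()` call treats `1 - consensus` as a distance, which is the step that `final_linkage` controls.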

Usage

longitudinal_consensus_cluster(
  data = NULL,
  id_column = NULL,
  max_k = 3,
  reps = 10,
  p_item = 0.8,
  model_list = NULL,
  flexmix_formula = as.formula("~s(visit, k = 4) | patient_id"),
  title = "untitled_consensus_cluster",
  final_linkage = c("average", "ward.D", "ward.D2", "single", "complete", "mcquitty",
    "median", "centroid"),
  seed = 3794,
  verbose = FALSE
)

Arguments

`data`	a `data.frame` with one or several observations per subject. It needs to contain one column that specifies which subject each entry (row) belongs to. This ID column is specified in `id_column`. Otherwise, there are no restrictions on the column names, as the model is specified in `flexmix_formula`.
`id_column`	name (character vector) of the ID column in `data` to identify all observations of one subject
`max_k`	maximum number of clusters, default is `3`
`reps`	number of repetitions, default is `10`
`p_item`	fraction of samples included in each subsample, default is `0.8`
`model_list`	either one `flexmix` driver or a list of `flexmix` drivers of class `FLXMR`
`flexmix_formula`	a `formula` object that describes the `flexmix` model relative to the formulas in the flexmix drivers (the dot in the driver formulas is replaced; see the example). This means that you usually only specify the right-hand side of the formula here. However, this is not enforced or checked, to give you more flexibility over the `flexmix` interface.
`title`	name of the clustering; used if `writeTable = TRUE`
`final_linkage`	linkage used for the last hierarchical clustering step on the consensus matrix; has to be one of `average`, `ward.D`, `ward.D2`, `single`, `complete`, `mcquitty`, `median` or `centroid`. The default is `average`
`seed`	seed for reproducibility
`verbose`	`logical`; whether status messages should be displayed. Default is `FALSE`
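To see how the driver formulas and `flexmix_formula` fit together, the dot substitution can be mimicked with base R's `update()`. This only illustrates the idea and is an assumption about the mechanics, not flexmix's internal code. The grouping part after `|` (here `patient_id`) is interpreted by flexmix itself and is left out of the sketch.

```r
# Driver formulas as in the example: one outcome variable per FLXMR driver
driver_formulas <- list(var_1 ~ ., var_2 ~ .)

# Right-hand side of flexmix_formula without the "| patient_id" grouping part
rhs <- ~ s(visit, k = 4)

# update(old, new) fills the "." in `new` with the right-hand side of `old`
combined <- lapply(driver_formulas, function(f) update(rhs, f))
combined
```

Each resulting formula pairs one outcome with the shared smooth term in `visit`, which is why `flexmix_formula` usually only needs a right-hand side.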

Details

The data types that longitudinal_consensus_cluster can handle depend on how the flexmix models are set up. In principle, all data types are supported for which a flexmix driver exists that handles the desired outcome variable.

If you follow the dimension reduction approach outlined in vignette("Example clustering analysis", package = "longmixr"), the input data types depend on what FAMD from the FactoMineR package can handle. FAMD accepts numeric variables and treats all other variables as factors, which it can handle as well.
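The type rule described above (numeric columns stay quantitative, everything else is treated as a factor) can be made explicit before the data are passed to FAMD. The helper `prepare_for_famd` and the toy data below are hypothetical and not part of longmixr or FactoMineR; the helper only mirrors that rule so the column types are visible up front.

```r
# Hypothetical helper: convert every non-numeric column to a factor,
# mirroring how FAMD treats non-numeric variables as qualitative
prepare_for_famd <- function(df) {
  df[] <- lapply(df, function(x) if (is.numeric(x)) x else as.factor(x))
  df
}

# Toy data with a numeric, a logical and a character column
raw <- data.frame(age = c(34, 51, 29),
                  smoker = c(TRUE, FALSE, TRUE),
                  site = c("A", "B", "A"),
                  stringsAsFactors = FALSE)
prepared <- prepare_for_famd(raw)
sapply(prepared, class)
```

The prepared data frame could then be passed to `FactoMineR::FAMD()` (for example with `graph = FALSE`).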

Value

An object (list) of class lcc with length max_k. The first entry, general_information, contains the following entries:

`consensus_matrices`	a list of all consensus matrices (for all specified clusters)

`cluster_assignments`	a `data.frame` with an ID column named after `id_column` and a column for every specified number of clusters, e.g. `assignment_num_clus_2`

`call`	the call, i.e. all arguments with which `longitudinal_consensus_cluster` was called

The other entries correspond to the number of specified clusters (e.g. the second entry corresponds to 2 specified clusters) and each contains a list with the following entries:

`consensus_matrix`	the consensus matrix

`consensus_tree`	the result of the hierarchical clustering on the consensus matrix

`consensus_class`	the resulting class for every observation

`found_flexmix_clusters`	a vector of the numbers of clusters actually found by `flexmix` (which can deviate from the specified number)
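Given the structure described above, the assignments for a particular number of clusters, and how often flexmix found a different number of clusters than requested, can be pulled out with a few lines of base R. The object `mock` below is a hand-built stand-in with the documented layout, not real longmixr output; `get_assignment` and `deviation_share` are hypothetical helper names, and one entry of `found_flexmix_clusters` per repetition is an assumption.

```r
# Hand-built stand-in with the documented layout of an lcc object (max_k = 2);
# the values are made up for illustration and are not real longmixr output
mock <- list(
  general_information = list(
    cluster_assignments = data.frame(patient_id = 1:4,
                                     assignment_num_clus_2 = c(1, 1, 2, 2))
  ),
  list(consensus_class = c(1, 1, 2, 2),
       # assumed: one entry per repetition (here reps = 3)
       found_flexmix_clusters = c(2, 1, 2))
)
class(mock) <- "lcc"

# Hypothetical helper: cluster assignments for k clusters, using the
# documented column naming scheme assignment_num_clus_<k>
get_assignment <- function(result, k) {
  result$general_information$cluster_assignments[[paste0("assignment_num_clus_", k)]]
}

# Hypothetical helper: share of entries in found_flexmix_clusters that
# differ from the requested number of clusters k
deviation_share <- function(result, k) {
  mean(result[[k]]$found_flexmix_clusters != k)
}

get_assignment(mock, 2)   # 1 1 2 2
deviation_share(mock, 2)  # 0.3333333
```

A high deviation share for a given k suggests that flexmix often could not support that many clusters in the subsamples.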

Examples

library(longmixr)

set.seed(5)
test_data <- data.frame(patient_id = rep(1:10, each = 4),
                        visit = rep(1:4, 10),
                        var_1 = c(rnorm(20, -1), rnorm(20, 3)) +
                          rep(seq(from = 0, to = 1.5, length.out = 4), 10),
                        var_2 = c(rnorm(20, 0.5, 1.5), rnorm(20, -2, 0.3)) +
                          rep(seq(from = 1.5, to = 0, length.out = 4), 10))
# one flexmix driver per outcome variable; the dot is filled in as
# described for flexmix_formula
model_list <- list(flexmix::FLXMRmgcv(as.formula("var_1 ~ .")),
                   flexmix::FLXMRmgcv(as.formula("var_2 ~ .")))
clustering <- longitudinal_consensus_cluster(
  data = test_data,
  id_column = "patient_id",
  max_k = 2,
  reps = 3,
  model_list = model_list,
  flexmix_formula = as.formula("~s(visit, k = 4) | patient_id"))
# not run
# plot(clustering)
# end not run

[Package longmixr version 1.0.0 Index]