longitudinal_consensus_cluster {longmixr} | R Documentation |
Longitudinal consensus clustering with flexmix
Description
This function performs longitudinal clustering with flexmix. To get robust
results, the data is repeatedly subsampled and the clustering is performed
on each subsample. The results are combined in a consensus matrix, and a
final hierarchical clustering step is performed on this matrix. In this, it
follows the approach of the ConsensusClusterPlus package.
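The subsample, cluster, and combine steps described above can be illustrated with a minimal base R sketch. This is not the implementation used by longmixr: `kmeans` stands in for the flexmix model fit, the toy data is cross-sectional instead of longitudinal, and the names `together`, `sampled` and `consensus` are purely illustrative.

```r
# Illustrative consensus clustering in base R.
# Assumption: kmeans is a stand-in for the flexmix clustering step.
set.seed(1)

# Toy data: 20 subjects in two well-separated groups (hypothetical values)
n <- 20
x <- rbind(matrix(rnorm(20, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(20, mean = 5, sd = 0.3), ncol = 2))
rownames(x) <- paste0("id_", seq_len(n))

reps   <- 10   # number of subsampling repetitions
p_item <- 0.8  # fraction of subjects in every subsample
k      <- 2    # number of clusters

# together[i, j]: how often i and j were assigned to the same cluster
# sampled[i, j]:  how often i and j were drawn into the same subsample
together <- matrix(0, n, n, dimnames = list(rownames(x), rownames(x)))
sampled  <- together

for (r in seq_len(reps)) {
  idx <- sort(sample(n, size = floor(p_item * n)))
  cl  <- kmeans(x[idx, ], centers = k, nstart = 5)$cluster
  sampled[idx, idx]  <- sampled[idx, idx] + 1
  together[idx, idx] <- together[idx, idx] + outer(cl, cl, "==")
}

# Consensus matrix: fraction of joint subsamples in which a pair co-clustered
consensus <- together / pmax(sampled, 1)
diag(consensus) <- 1

# Final hierarchical clustering on the consensus matrix
tree <- hclust(as.dist(1 - consensus), method = "average")
consensus_class <- cutree(tree, k = k)
```

Pairs that always end up in the same cluster have a consensus value of 1, so `1 - consensus` behaves like a distance for the final hierarchical step.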
Usage
longitudinal_consensus_cluster(
data = NULL,
id_column = NULL,
max_k = 3,
reps = 10,
p_item = 0.8,
model_list = NULL,
flexmix_formula = as.formula("~s(visit, k = 4) | patient_id"),
title = "untitled_consensus_cluster",
final_linkage = c("average", "ward.D", "ward.D2", "single", "complete", "mcquitty",
"median", "centroid"),
seed = 3794,
verbose = FALSE
)
Arguments
data | a data.frame |
id_column | name (character vector) of the ID column in data |
max_k | maximum number of clusters, default is 3 |
reps | number of repetitions, default is 10 |
p_item | fraction of samples contained in subsampled sample, default is 0.8 |
model_list | either one |
flexmix_formula | a formula |
title | name of the clustering; used if |
final_linkage | linkage used for the last hierarchical clustering step on the consensus matrix; has to be one of "average", "ward.D", "ward.D2", "single", "complete", "mcquitty", "median", "centroid" |
seed | seed for reproducibility |
verbose | |
Details
The data types longitudinal_consensus_cluster can handle depend on how the
flexmix models are set up. In principle, all data types are supported for
which there is a flexmix driver with the desired outcome variable.
If you follow the dimension reduction approach outlined in
vignette("Example clustering analysis", package = "longmixr"), the input
data types depend on what FAMD from the FactoMineR package can handle. FAMD
accepts numeric variables and treats all other variables as factor
variables, which it can handle as well.
Value
An object (list) of class lcc with length max_k.
The first entry, general_information, contains the entries:
consensus_matrices | a list of all consensus matrices (for all specified clusters) |
cluster_assignments | a data.frame with an ID column named after id_column and a column for every specified number of clusters, e.g. assignment_num_clus_2 |
call | the call, i.e. all arguments with which longitudinal_consensus_cluster was called |
The other entries correspond to the number of specified clusters (e.g. the second entry corresponds to 2 specified clusters) and each contains a list with the following entries:
consensus_matrix | the consensus matrix |
consensus_tree | the result of the hierarchical clustering on the consensus matrix |
consensus_class | the resulting class for every observation |
found_flexmix_clusters | a vector of the number of clusters flexmix actually found (which can deviate from the specified number) |
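As a hedged illustration of how the returned structure described above might be inspected, the sketch below reruns the small example from this page and reads the documented entries. It assumes that longmixr and its dependencies are installed (the code is skipped otherwise) and that the entries are accessible under exactly the names listed in the tables above.

```r
# Sketch: inspect the documented entries of an "lcc" object.
# Only runs when longmixr is installed; the data mirrors the Examples section.
if (requireNamespace("longmixr", quietly = TRUE)) {
  set.seed(5)
  test_data <- data.frame(
    patient_id = rep(1:10, each = 4),
    visit = rep(1:4, 10),
    var_1 = c(rnorm(20, -1), rnorm(20, 3)) +
      rep(seq(from = 0, to = 1.5, length.out = 4), 10),
    var_2 = c(rnorm(20, 0.5, 1.5), rnorm(20, -2, 0.3)) +
      rep(seq(from = 1.5, to = 0, length.out = 4), 10)
  )
  model_list <- list(flexmix::FLXMRmgcv(as.formula("var_1 ~ .")),
                     flexmix::FLXMRmgcv(as.formula("var_2 ~ .")))
  clustering <- longmixr::longitudinal_consensus_cluster(
    data = test_data,
    id_column = "patient_id",
    max_k = 2,
    reps = 3,
    model_list = model_list,
    flexmix_formula = as.formula("~s(visit, k = 4) | patient_id")
  )

  # First entry: information shared across all numbers of clusters
  print(head(clustering$general_information$cluster_assignments))

  # Second entry: results for 2 specified clusters
  k2 <- clustering[[2]]
  print(table(k2$consensus_class))
  print(k2$found_flexmix_clusters)
}
```

Comparing `found_flexmix_clusters` with the specified number shows in how many repetitions flexmix returned fewer clusters than requested.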
Examples
library(longmixr)

# Simulate 10 patients with 4 visits each; the first 5 and the last 5
# patients are drawn from different distributions
set.seed(5)
test_data <- data.frame(
  patient_id = rep(1:10, each = 4),
  visit = rep(1:4, 10),
  var_1 = c(rnorm(20, -1), rnorm(20, 3)) +
    rep(seq(from = 0, to = 1.5, length.out = 4), 10),
  var_2 = c(rnorm(20, 0.5, 1.5), rnorm(20, -2, 0.3)) +
    rep(seq(from = 1.5, to = 0, length.out = 4), 10)
)

# One flexmix driver per outcome variable
model_list <- list(flexmix::FLXMRmgcv(as.formula("var_1 ~ .")),
                   flexmix::FLXMRmgcv(as.formula("var_2 ~ .")))

clustering <- longitudinal_consensus_cluster(
  data = test_data,
  id_column = "patient_id",
  max_k = 2,
  reps = 3,
  model_list = model_list,
  flexmix_formula = as.formula("~s(visit, k = 4) | patient_id")
)
# not run
# plot(clustering)
# end not run