R: Classify the Longitudinal Data Based on the Selected...

Step3Clusters {traj}

R Documentation

Classify the Longitudinal Data Based on the Selected Measures.

Description

Classifies the trajectories by applying the k-medoids or k-means clustering algorithm to the measures selected by Step2Selection.

Usage

Step3Clusters(
  trajSelection,
  algorithm = "k-medoids",
  metric = "euclidean",
  nstart = 200,
  iter.max = 100,
  nclusters = NULL,
  criterion = "Calinski-Harabasz",
  K.max = min(15, nrow(trajSelection$selection) - 1),
  boot = FALSE,
  R = 100,
  B = 500
)

## S3 method for class 'trajClusters'
print(x, ...)

## S3 method for class 'trajClusters'
summary(object, ...)

Arguments

`trajSelection`	object of class `trajSelection` as returned by `Step2Selection`.
`algorithm`	either `"k-medoids"` or `"k-means"`. Determines the clustering algorithm to use. Defaults to `"k-medoids"`.
`metric`	to be passed to the `metric` argument of `pam` if `"k-medoids"` is the chosen `algorithm`. Defaults to `"euclidean"`.
`nstart`	to be passed to the `nstart` argument of `kmeans` if `"k-means"` is the chosen `algorithm`. Defaults to `200`.
`iter.max`	to be passed to the `iter.max` argument of `kmeans` if `"k-means"` is the chosen `algorithm`. Defaults to `100`.
`nclusters`	either `NULL` or the desired number of clusters. If `NULL`, the number of clusters is determined using the criterion chosen in `criterion`. Defaults to `NULL`.
`criterion`	criterion to determine the optimal number of clusters if `nclusters` is `NULL`. Either `"GAP"` or `"Calinski-Harabasz"`. Defaults to `"Calinski-Harabasz"`.
`K.max`	maximum number of clusters to be considered if `nclusters` is set to `NULL`. Defaults to `min(15, nrow(trajSelection$selection) - 1)`.
`boot`	logical. If `TRUE`, and if `"Calinski-Harabasz"` is the chosen `criterion`, the optimal number of clusters will be the first mode of the sampling distribution of the optimal number of clusters obtained by bootstrap. Defaults to `FALSE`.
`R`	the number of bootstrap replicates if `boot` is set to `TRUE`. Defaults to `100`.
`B`	to be passed to the `B` argument of `clusGap` if `"GAP"` is the chosen `criterion`. Defaults to `500`.
`x`	object of class `trajClusters`.
`...`	further arguments passed to or from other methods.
`object`	object of class `trajClusters`.

Details

If "GAP" is the chosen criterion for determining the optimal number of clusters, the gap statistic method described by Tibshirani et al. is applied through the clusGap function of the cluster package.
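As a rough illustration of the gap statistic, the sketch below calls `clusGap` from the cluster package directly on simulated data. The toy data, the small value of `B`, and the `"Tibs2001SEmax"` selection rule passed to `maxSE` are assumptions made for this example; the rule that Step3Clusters applies internally is not stated here.

```r
library(cluster)

set.seed(1)

# Toy data (hypothetical): three well-separated groups in two dimensions.
X <- rbind(
  matrix(rnorm(60, mean = 0,  sd = 0.3), ncol = 2),
  matrix(rnorm(60, mean = 5,  sd = 0.3), ncol = 2),
  matrix(rnorm(60, mean = 10, sd = 0.3), ncol = 2)
)

# Gap statistic for k = 1, ..., K.max using k-means as the clustering function.
# B is the number of reference data sets; it is kept small here for speed.
gap <- clusGap(X, FUNcluster = kmeans, K.max = 6, B = 50,
               nstart = 20, verbose = FALSE)

# One possible selection rule (an assumption): the rule of Tibshirani et al.
k_gap <- maxSE(gap$Tab[, "gap"], gap$Tab[, "SE.sim"],
               method = "Tibs2001SEmax")
k_gap
```

A larger `B` gives a more stable estimate of the reference distribution at the cost of longer run time.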

If "Calinski-Harabasz" is the chosen criterion, the Calinski-Harabasz index is computed for each candidate number of clusters from 2 to K.max, and the optimal number of clusters is the one that maximizes the index. Moreover, if boot is set to TRUE, then, following the guidelines suggested by Mésidor et al., a sampling distribution of the optimal number of clusters is obtained by bootstrap, and the optimal number of clusters is chosen to be the (first) mode of this sampling distribution.
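The Calinski-Harabasz procedure can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the code used by Step3Clusters: the toy data and the helper `choose_k_ch` are hypothetical, k-means stands in for the clustering step, and "first mode" is read as the smallest number of clusters among the most frequent bootstrap values.

```r
set.seed(1)

# Toy data (hypothetical): three well-separated groups in two dimensions.
X <- rbind(
  matrix(rnorm(60, mean = 0,  sd = 0.3), ncol = 2),
  matrix(rnorm(60, mean = 5,  sd = 0.3), ncol = 2),
  matrix(rnorm(60, mean = 10, sd = 0.3), ncol = 2)
)
K.max <- 6

# Hypothetical helper: returns the k in 2..K.max that maximizes the
# Calinski-Harabasz index, (between SS / (k - 1)) / (within SS / (n - k)).
choose_k_ch <- function(X, K.max, nstart = 20, iter.max = 100) {
  n <- nrow(X)
  ks <- 2:K.max
  ch <- sapply(ks, function(k) {
    fit <- kmeans(X, centers = k, nstart = nstart, iter.max = iter.max)
    (fit$betweenss / (k - 1)) / (fit$tot.withinss / (n - k))
  })
  ks[which.max(ch)]
}

# Without bootstrap: the maximizer of the index on the original data.
k_opt <- choose_k_ch(X, K.max)

# With bootstrap: resample rows with replacement R times, record the
# maximizer each time, then take the first mode of the recorded values.
R <- 25
boot_k <- replicate(R, {
  Xb <- X[sample(nrow(X), replace = TRUE), , drop = FALSE]
  choose_k_ch(Xb, K.max)
})
tab <- table(boot_k)                              # counts, sorted by k
k_boot <- as.integer(names(tab)[which.max(tab)])  # first mode
```

The table of bootstrap values (`tab`) also gives a sense of how much uncertainty surrounds the selected number of clusters.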

Value

An object of class trajClusters; a list containing the result of the clustering, as well as a curated form of the arguments.

References

Mésidor, M., Sirois, C., Simard, M. and Talbot, D. (2023). A Bootstrap Approach for Evaluating Uncertainty in the Number of Groups Identified by Latent Class Growth Models. American Journal of Epidemiology, 192(11), 1896–1903. https://doi.org/10.1093/aje/kwad148

Tibshirani, R., Walther, G. and Hastie, T. (2001). Estimating the number of data clusters via the Gap statistic. Journal of the Royal Statistical Society B, 63, 411–423.

Tibshirani, R., Walther, G. and Hastie, T. (2000). Estimating the number of clusters in a dataset via the Gap statistic. Technical Report. Stanford.

Examples

## Not run: 
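data("trajdata")
# Remove the Group column before computing the measures
trajdata.noGrp <- trajdata[, colnames(trajdata) != "Group"]

# Step 1: compute the measures; Step 2: select among them with default settings
m <- Step1Measures(trajdata.noGrp, ID = TRUE, measures = 1:18)
s <- Step2Selection(m)

# Inspect the loadings from the selection step
s$RC$loadings

# Alternatively, select measures 10, 12, 8 and 4 explicitly
s2 <- Step2Selection(m, select = c(10, 12, 8, 4))

# Step 3: extract the partitions obtained with 3, 4 and 5 clusters
c3.part <- Step3Clusters(s2, nclusters = 3)$partition
c4.part <- Step3Clusters(s2, nclusters = 4)$partition
c5.part <- Step3Clusters(s2, nclusters = 5)$partition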

## End(Not run)

[Package traj version 2.2.0 Index]