Step3Clusters {traj}    R Documentation

Classify the Longitudinal Data Based on the Selected Measures.

Description

Classifies the trajectories by applying the k-medoids or k-means clustering algorithm to the measures selected by Step2Selection.

Usage

Step3Clusters(
  trajSelection,
  algorithm = "k-medoids",
  metric = "euclidean",
  nstart = 200,
  iter.max = 100,
  nclusters = NULL,
  criterion = "Calinski-Harabasz",
  K.max = min(15, nrow(trajSelection$selection) - 1),
  boot = FALSE,
  R = 100,
  B = 500
)

## S3 method for class 'trajClusters'
print(x, ...)

## S3 method for class 'trajClusters'
summary(object, ...)

Arguments

trajSelection

object of class trajSelection as returned by Step2Selection.

algorithm

either "k-medoids" or "k-means". Determines the clustering algorithm to use. Defaults to "k-medoids".

metric

to be passed to the metric argument of pam if "k-medoids" is the chosen algorithm. Defaults to "euclidean".

nstart

to be passed to the nstart argument of kmeans if "k-means" is the chosen algorithm. Defaults to 200.

iter.max

to be passed to the iter.max argument of kmeans if "k-means" is the chosen algorithm. Defaults to 100.

nclusters

either NULL or the desired number of clusters. If NULL, the number of clusters is determined using the criterion chosen in criterion. Defaults to NULL.

criterion

criterion to determine the optimal number of clusters if nclusters is NULL. Either "GAP" or "Calinski-Harabasz". Defaults to "Calinski-Harabasz".

K.max

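maximum number of clusters to be considered if nclusters is set to NULL. Defaults to min(15, nrow(trajSelection$selection) - 1), that is, 15 or one less than the number of rows of trajSelection$selection, whichever is smaller.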

boot

logical. If TRUE, and if "Calinski-Harabasz" is the chosen criterion, the optimal number of clusters is taken to be the first mode of the sampling distribution of the optimal number of clusters obtained by bootstrap. Defaults to FALSE.

R

the number of bootstrap replicates if boot is set to TRUE. Defaults to 100.

B

to be passed to the B argument of clusGap if "GAP" is the chosen criterion. Defaults to 500.

x

object of class trajClusters.

...

further arguments passed to or from other methods.

object

object of class trajClusters.
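
As a rough illustration of where the algorithm-specific arguments end up, the sketch below calls pam (from the cluster package) and kmeans (from stats) directly on synthetic data. The matrix X is a made-up stand-in for the selected measures, and the calls are not taken from the Step3Clusters source; they only show which argument belongs to which function.

```r
library(cluster)  # provides pam(); kmeans() lives in the 'stats' package

set.seed(1)
# Hypothetical stand-in for the selected measures: 60 rows, 2 columns,
# drawn from three well-separated groups of 20 rows each.
X <- rbind(
  matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 3, sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 6, sd = 0.3), ncol = 2)
)

# k-medoids: 'metric' selects the dissimilarity used by pam().
fit.pam <- pam(X, k = 3, metric = "euclidean")

# k-means: 'nstart' random starts, at most 'iter.max' iterations per start.
fit.km <- kmeans(X, centers = 3, nstart = 200, iter.max = 100)

# Cross-tabulate the two partitions (cluster labels are arbitrary).
table(pam = fit.pam$clustering, kmeans = fit.km$cluster)
```

The metric argument has no effect on kmeans, and nstart and iter.max have no effect on pam, which is why each is only used with the matching algorithm.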

Details

If "GAP" is the chosen criterion for determining the optimal number of clusters, the number of clusters is chosen with the Gap statistic of Tibshirani et al., as implemented by the clusGap function.
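
A minimal sketch of a Gap statistic computation with clusGap from the cluster package, on synthetic data, may help to see what the B and K.max arguments control. The data, the use of kmeans as the clustering function and the "Tibs2001SEmax" selection rule are assumptions made for illustration; the rule actually applied inside Step3Clusters is not stated here.

```r
library(cluster)  # provides clusGap() and maxSE()

set.seed(1)
# Hypothetical data: three well-separated groups of 20 rows in 2 dimensions.
X <- rbind(
  matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 3, sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 6, sd = 0.3), ncol = 2)
)

# Gap statistic for k = 1, ..., K.max, using B Monte Carlo reference samples.
# Extra arguments (here nstart) are forwarded to the clustering function.
gap <- clusGap(X, FUNcluster = kmeans, K.max = 6, B = 50,
               verbose = FALSE, nstart = 20)

# One possible selection rule: the criterion proposed by Tibshirani et al. (2001).
k.hat <- maxSE(gap$Tab[, "gap"], gap$Tab[, "SE.sim"], method = "Tibs2001SEmax")
k.hat
```

A larger B gives a more stable estimate of the reference distribution at the cost of computation time.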

If, instead, "Calinski-Harabasz" is the chosen criterion, the Calinski-Harabasz index is computed for each possible number of clusters between 2 and K.max, and the optimal number of clusters is the one that maximizes the index. Moreover, if boot is set to TRUE, then, following the guidelines suggested by Mésidor et al., a sampling distribution of the optimal number of clusters is obtained by bootstrap, and the optimal number of clusters is chosen to be the (first) mode of this sampling distribution.
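
The procedure above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: the data are synthetic, the helpers ch.index and best.k are hypothetical names, k-means is used throughout, and "first mode" is taken to mean the smallest k among the most frequent values.

```r
set.seed(1)
# Hypothetical data: three well-separated groups of 20 rows in 2 dimensions.
X <- rbind(
  matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 3, sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 6, sd = 0.3), ncol = 2)
)

# Calinski-Harabasz index of a k-means partition of the rows of X:
# (between-cluster SS / (k - 1)) / (within-cluster SS / (n - k)).
ch.index <- function(X, k) {
  fit <- kmeans(X, centers = k, nstart = 20, iter.max = 100)
  n <- nrow(X)
  (fit$betweenss / (k - 1)) / (fit$tot.withinss / (n - k))
}

# Number of clusters maximizing the index over k = 2, ..., K.max.
best.k <- function(X, K.max) {
  ks <- 2:K.max
  ks[which.max(vapply(ks, function(k) ch.index(X, k), numeric(1)))]
}

K.max <- 6
k.hat <- best.k(X, K.max)

# Bootstrap: resample the rows with replacement R times and recompute
# the maximizer on each resample.
R <- 50
k.boot <- replicate(R, {
  idx <- sample(nrow(X), replace = TRUE)
  best.k(X[idx, , drop = FALSE], K.max)
})

# First mode of the bootstrap distribution; which.max() returns the first
# (here, smallest) k in case of ties.
freq <- table(factor(k.boot, levels = 2:K.max))
k.mode <- as.integer(names(freq)[which.max(freq)])
```

Comparing k.hat with the spread of k.boot gives a sense of how sensitive the chosen number of clusters is to sampling variation.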

Value

An object of class trajClusters; a list containing the result of the clustering, as well as a curated form of the arguments.

References

Mésidor, M., Sirois, C., Simard, M. and Talbot, D. (2023). A bootstrap approach for evaluating uncertainty in the number of groups identified by latent class growth models. American Journal of Epidemiology, 192(11), 1896–1903. https://doi.org/10.1093/aje/kwad148

Tibshirani, R., Walther, G. and Hastie, T. (2001). Estimating the number of data clusters via the Gap statistic. Journal of the Royal Statistical Society B, 63, 411–423.

Tibshirani, R., Walther, G. and Hastie, T. (2000). Estimating the number of clusters in a dataset via the Gap statistic. Technical Report. Stanford.

See Also

Step2Selection

Examples

## Not run: 
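data("trajdata")
# Remove the Group column
trajdata.noGrp <- trajdata[, -which(colnames(trajdata) == "Group")]

# Step 1: compute measures 1 to 18; Step 2: select among them
m <- Step1Measures(trajdata.noGrp, ID = TRUE, measures = 1:18)
s <- Step2Selection(m)

# Inspect the loadings stored in the selection object
s$RC$loadings

# Redo the selection, specifying the measures explicitly
s2 <- Step2Selection(m, select = c(10, 12, 8, 4))

# Step 3: extract the partitions obtained with 3, 4 and 5 clusters
c3.part <- Step3Clusters(s2, nclusters = 3)$partition
c4.part <- Step3Clusters(s2, nclusters = 4)$partition
c5.part <- Step3Clusters(s2, nclusters = 5)$partition
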
## End(Not run)


[Package traj version 2.2.0 Index]