clustra {clustra} | R Documentation |
Cluster longitudinal trajectories over time
Description
The usual top-level function for clustering longitudinal trajectories. After
initial setup, it calls trajectories to perform k-means clustering on a
continuous response measured over time, where each mean is defined by a thin
plate spline fit to all points in a cluster. See clustra_vignette.Rmd for
examples of use.
Usage
clustra(
data,
k,
starts = "random",
maxdf = 30,
conv = c(10, 0),
mccores = 1,
verbose = FALSE,
...
)
Arguments
data |
A data frame or, preferably, a data.table with response measurements, one response per observation. Required variables are (id, time, response); other variables are ignored. |
k |
Number of clusters |
starts |
One of c("random", "distant") or an integer vector with values 1:k corresponding to unique ids of starting cluster assignments. For "random", starting clusters are assigned at random. For "distant", a FastMap-like algorithm selects k distant ids, fits a thin plate spline (TPS) model to each, and uses these fits as the starting cluster centers to which ids are classified. Only ids with more than the median number of time points are used. The distance from an id to a TPS model is the median absolute difference at the id's time points. Starting with a random id, distant ids are selected sequentially, each being the id with the largest minimum absolute distance to the previous selections (a maximin concept). The first random selection is discarded and the next k selected ids are kept. Their TPS fits become the first cluster centers to which all ids are classified. See comments in code and DOI: 10.1109/TPAMI.2005.164 for the FastMap analogy. |
maxdf |
Fitting parameters. See |
conv |
Fitting parameters. See |
mccores |
See |
verbose |
Logical to turn on more output during fit iterations. |
... |
Additional parameters of optional plotting under |
Value
A list returned by trajectories plus one more element, ido, giving the
original id numbers. The list is returned invisibly, which is useful for
repeated runs that explore verbose clustra output.
Examples
set.seed(13)
data = gen_traj_data(n_id = c(50, 100), types = c(1, 2),
intercepts = c(100, 80), m_obs = 20,
s_range = c(-365, -14), e_range = c(0.5*365, 2*365))
cl = clustra(data, k = 2, maxdf = 20, conv = c(5, 0), verbose = TRUE)
tabulate(data$group)
tabulate(data$true_group)
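
The maximin selection described for starts = "distant" can be illustrated with a minimal base R sketch. This is not the package's implementation: maximin_select is a hypothetical helper, and it works on an ordinary Euclidean distance matrix between points rather than on distances from ids to TPS models. It also assumes that the discarded seed still counts among the "previous selections" when minimum distances are computed, which the description above does not state explicitly.

```r
# Hypothetical helper (not part of clustra): maximin selection of k items
# from a square distance matrix d, starting from a seed item `first`.
maximin_select <- function(d, k, first = 1L) {
  stopifnot(is.matrix(d), nrow(d) == ncol(d), k >= 1, k < nrow(d))
  # Distance from every item to its nearest selection so far (seed only).
  min_d <- d[, first]
  selected <- integer(k)
  for (j in seq_len(k)) {
    nxt <- which.max(min_d)          # item farthest from all selections
    selected[j] <- nxt
    min_d <- pmin(min_d, d[, nxt])   # update nearest-selection distances
  }
  selected                           # the seed itself is not returned
}

# Assumed toy data: five points in the plane, seed is point 2.
pts <- cbind(x = c(0, 1, 10, 11, 5), y = c(0, 0, 0, 0, 8))
d <- as.matrix(dist(pts))
picked <- maximin_select(d, k = 2, first = 2L)
picked
```

Each step picks the item whose nearest already-selected item is as far away as possible, so the chosen items spread out across the data. In clustra the analogous selections are ids whose TPS fits serve as the starting cluster centers.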