clustra {clustra}    R Documentation

Cluster longitudinal trajectories over time

Description

The usual top-level function for clustering longitudinal trajectories. After initial setup, it calls trajectories to perform k-means clustering on a continuous response measured over time, where each cluster mean is defined by a thin plate spline (TPS) fit to all points in the cluster. See clustra_vignette.Rmd for usage examples.

Usage

clustra(
  data,
  k,
  starts = "random",
  maxdf = 30,
  conv = c(10, 0),
  mccores = 1,
  verbose = FALSE,
  ...
)

Arguments

data

A data frame or, preferably, a data.table of response measurements, one response per observation. The required variables are id, time, and response; other variables are ignored.

k

Number of clusters

starts

One of c("random", "distant"), or an integer vector with values in 1:k giving a starting cluster assignment for each unique id. For "random", starting clusters are assigned at random. For "distant", a FastMap-like algorithm selects k mutually distant ids; a TPS model is fit to each, and these fits serve as the starting cluster centers to which all ids are classified. Only ids with more than the median number of time points are candidates. The distance from an id to a TPS model is the median absolute difference at that id's time points. Starting from a random id, distant ids are selected sequentially, each being the id with the largest minimum absolute distance to the previous selections (a maximin criterion). The initial random selection is discarded and the next k selected ids are kept. See the comments in the code and DOI: 10.1109/TPAMI.2005.164 for the FastMap analogy.

maxdf

Fitting parameters. See trajectories.

conv

Fitting parameters. See trajectories.

mccores

See trajectories.

verbose

Logical to turn on more output during fit iterations.

...

Additional parameters for the optional plotting done under verbose = 2. At this time, only xlim and ylim are allowed.
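The maximin selection described for starts = "distant" can be illustrated with a minimal base R sketch. This is not clustra's implementation: the helper maximin_select is hypothetical, and it assumes a precomputed distance matrix between items, whereas clustra measures the distance from an id to a TPS fit. It only shows the selection rule: start from a seed item, repeatedly add the item whose nearest already-selected item is farthest away, then discard the seed.

```r
# Hypothetical helper (not part of clustra): maximin selection of k items
# from a symmetric distance matrix d.
maximin_select <- function(d, k, first = sample(nrow(d), 1)) {
  stopifnot(is.matrix(d), nrow(d) == ncol(d), k >= 1, k < nrow(d))
  selected <- first                    # seed item, discarded at the end
  for (i in seq_len(k)) {
    # distance from every item to its nearest already-selected item
    min_d <- apply(d[, selected, drop = FALSE], 1, min)
    min_d[selected] <- -Inf            # never reselect an item
    selected <- c(selected, which.max(min_d))
  }
  selected[-1]                         # keep the k items chosen after the seed
}

# Toy data: five points on a line, absolute difference as the distance
x <- c(0, 1, 5, 9, 10)
d <- abs(outer(x, x, "-"))
picked <- maximin_select(d, k = 2, first = 3)
x[picked]                              # the two extreme points, 0 and 10
```

Fixing the seed item through the first argument makes the toy run reproducible; with the default, the seed is drawn at random.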

Value

The list returned by trajectories plus one more element, ido, giving the original id numbers. The list is returned invisibly, which is useful for repeated runs that explore verbose clustra output.
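For readers unfamiliar with invisible returns, the following base R sketch shows the general behavior. The function fit_quietly is a hypothetical stand-in, not clustra code: an invisibly returned value is not auto-printed at top level but is still available when assigned.

```r
# Hypothetical stand-in for a fitting function that returns its result invisibly
fit_quietly <- function(x) {
  res <- list(mean = mean(x), n = length(x))
  invisible(res)
}

fit_quietly(1:10)                      # nothing is auto-printed at top level
out <- fit_quietly(1:10)               # the value is still captured on assignment
out$mean
is_visible <- withVisible(fit_quietly(1:10))$visible   # FALSE for invisible results
```

Wrapping a call in parentheses or print() forces the result to be displayed when that is wanted.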

Examples

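set.seed(13)
# Simulate two groups of trajectories (50 and 100 ids)
data = gen_traj_data(n_id = c(50, 100), types = c(1, 2),
                     intercepts = c(100, 80), m_obs = 20,
                     s_range = c(-365, -14), e_range = c(0.5*365, 2*365))
# Cluster into k = 2 groups, printing progress during fit iterations
cl = clustra(data, k = 2, maxdf = 20, conv = c(5, 0), verbose = TRUE)
# Compare cluster sizes with the simulated group sizes
tabulate(data$group)
tabulate(data$true_group)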
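Besides simulated data, input can be assembled by hand. The sketch below builds a small data frame with the three required variables using base R only. The values, the extra site column, and the assumption that each observation occupies one row are illustrative choices, not taken from the package.

```r
# Hypothetical toy data in a long layout: one row per observation (assumed)
set.seed(1)
n_obs <- c(4, 6, 5)                    # number of observations for each of 3 ids
toy <- data.frame(
  id       = rep(1:3, times = n_obs),
  time     = unlist(lapply(n_obs, function(n) sort(sample(0:365, n)))),
  response = rnorm(sum(n_obs), mean = 100, sd = 10),
  site     = "A"                       # extra variable; documented as ignored
)
str(toy)
```

Since a data.table is the preferred input, such a data frame could be converted with data.table::as.data.table(toy) before calling clustra, assuming the data.table package is installed.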


[Package clustra version 0.2.1 Index]