start_groups {clustra}    R Documentation

Function to assign starting groups.

Description

Either a random assignment of ids to k clusters of approximately equal size, or a FastMap-like algorithm that sequentially selects k mutually distant ids from among those with more than the median number of observations. Thin plate spline (TPS) fits to these selected ids serve as cluster centers for the starting group assignment. A user-supplied starting assignment is also possible.
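A minimal sketch of the random start in base R, assuming "approximately equal size" means group sizes differ by at most one. The name random_start() is hypothetical and is not part of clustra.

```r
# Hypothetical helper (not a clustra function): random start groups.
random_start <- function(n_ids, k) {
  # rep_len() cycles 1..k up to length n_ids, so group sizes differ by at
  # most one; sample() then permutes those labels across the ids.
  sample(rep_len(seq_len(k), n_ids))
}

set.seed(1)
groups <- random_start(n_ids = 10, k = 3)
table(groups)  # sizes 4, 3, 3
```

Cycling the labels before shuffling guarantees balanced sizes, which independent draws from 1..k would not.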

Usage

start_groups(k, data, starts, maxdf, conv, mccores = 1, verbose = FALSE)

Arguments

k

Number of clusters (groups).

data

A data.table of response measurements, one row per observation, with columns id, time, response, and group. Ids are assumed to be sequential integers starting from 1; this matters when per-id group numbers are expanded to the observations of each id.
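If the ids are arbitrary labels, they can be recoded to 1..n beforehand. A sketch using base R's match() on a small made-up data.frame (used here only to keep the example self-contained; the same column assignment works on a data.table). This is a suggested preprocessing step, not something start_groups() is documented to do.

```r
# Made-up observations whose ids are not sequential.
obs <- data.frame(
  id       = c(101, 101, 205, 205, 205, 340),
  time     = c(1, 2, 1, 2, 3, 1),
  response = c(5.0, 5.5, 3.1, 3.3, 3.0, 7.2)
)
orig_ids <- unique(obs$id)          # keep the mapping to recover labels later
obs$id <- match(obs$id, orig_ids)   # 101 -> 1, 205 -> 2, 340 -> 3
obs$id                              # 1 1 2 2 2 3
```

Indexing orig_ids by the new ids recovers the original labels.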

starts

Type of start groups generated. See clustra.

maxdf

Fitting parameter. See trajectories.

conv

Fitting parameter. See trajectories.

mccores

See trajectories.

verbose

Turns on additional output for debugging. Values 0, 1, 2, and 3 produce progressively more output; 2 and 3 also produce graphs during iterations, so use them carefully.

Value

An integer vector with one element per unique id, giving that id's group number assignment.
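Because ids run from 1 to n, a per-id group vector like this one can be expanded to observations by indexing it with the id column. A sketch with a hypothetical assignment for three ids:

```r
obs <- data.frame(id = c(1L, 1L, 2L, 2L, 2L, 3L))
groups <- c(2L, 1L, 2L)        # hypothetical groups for ids 1, 2, 3
obs$group <- groups[obs$id]    # id k picks up element k of groups
obs$group                      # 2 2 1 1 1 2
```

With non-sequential ids the same indexing would pick the wrong elements or produce NA, which is why the data argument requires ids starting from 1.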

For the distant start, each sequential selection takes the id with the largest minimum distance from the smooth TPS fits (at most 5 degrees of freedom) of the previously selected ids. The distance of an id to a single TPS is the median absolute error across that id's time points. The distance of an id to a set of TPS fits is the minimum of its distances to the individual fits. We pick the id with the maximum of these minimum median distances.
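The selection rule above, followed by the use of the selected fits as cluster centers, can be sketched in base R. This is not clustra's implementation. Several details are assumptions: stats::smooth.spline() with df = 5 stands in for the TPS fit, the first id is drawn at random from the eligible ids, and each id is assigned to the center with the smallest median absolute error. The functions select_distant(), assign_to_centers(), id_dist(), and fit_id() are hypothetical names, and the simulated data are made up.

```r
# Median absolute error of one id's observations against a fitted spline.
id_dist <- function(fit, time, response) {
  median(abs(response - predict(fit, time)$y))
}

# Stand-in for a TPS fit to one id (assumption: smoothing spline, df = 5).
fit_id <- function(o, df) smooth.spline(o$time, o$response, df = df)

select_distant <- function(data, k, df = 5) {
  by_id <- split(data[c("time", "response")], data$id)
  n_obs <- vapply(by_id, nrow, integer(1))
  # Candidates: ids with more than the median number of observations.
  eligible <- names(by_id)[n_obs > median(n_obs)]
  stopifnot(length(eligible) >= k)

  selected <- sample(eligible, 1)  # assumption: random first id
  min_dist <- setNames(rep(Inf, length(eligible)), eligible)
  while (length(selected) < k) {
    # Distance to a set of fits is the minimum over fits, so fold in the newest.
    fit <- fit_id(by_id[[selected[length(selected)]]], df)
    d <- vapply(by_id[eligible],
                function(o) id_dist(fit, o$time, o$response), numeric(1))
    min_dist <- pmin(min_dist, d)
    cand <- setdiff(eligible, selected)
    # Next id: the maximum of the minimum median distances.
    selected <- c(selected, cand[which.max(min_dist[cand])])
  }
  as.integer(selected)
}

assign_to_centers <- function(data, centers, df = 5) {
  by_id <- split(data[c("time", "response")], data$id)
  fits <- lapply(by_id[as.character(centers)], fit_id, df = df)
  # One row per id, one column per center; pick the closest center.
  d <- do.call(rbind, lapply(by_id, function(o)
    vapply(fits, function(f) id_dist(f, o$time, o$response), numeric(1))))
  max.col(-d, ties.method = "first")
}

# Made-up data: odd ids stay flat, even ids rise; half the ids have 10 obs.
set.seed(42)
sim <- do.call(rbind, lapply(1:20, function(i) {
  n <- if (((i - 1) %/% 2) %% 2 == 1) 10 else 6
  time <- sort(runif(n, 0, 10))
  level <- if (i %% 2 == 0) time else rep(1, n)
  data.frame(id = i, time = time, response = level + rnorm(n, sd = 0.1))
}))
centers <- select_distant(sim, k = 2)
groups <- assign_to_centers(sim, centers)
```

Tracking a running minimum means each new center costs one fit plus one distance pass over the candidates, rather than recomputing distances to every earlier center. The median absolute error keeps a few outlying observations from dominating an id's distance.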


[Package clustra version 0.2.1 Index]