clusterTimeseries {segmenTier}

R Documentation

Cluster a processed time-series with k-means.

Description

Performs k-means clustering of a time-series object tset, as returned by processTimeseries, and calculates the cluster-cluster and cluster-position similarity matrices required by segmentClusters.

Usage

clusterTimeseries(tset, K = 16, iter.max = 1e+05, nstart = 100,
  nui.thresh = -Inf, verb = 1)

Arguments

`tset`	a "timeseries" object returned by `processTimeseries`
`K`	the number of clusters to be calculated, i.e. the argument `centers` of `kmeans`. Unlike in `kmeans`, `K` can be an integer vector, in which case one clustering is calculated per value. Note that a smaller cluster number is chosen automatically if the data contain fewer than K distinct values.
`iter.max`	the maximum number of iterations allowed in `kmeans`
`nstart`	number of random initializations of `kmeans`, i.e. its `nstart` argument: "how many random sets should be chosen?"
`nui.thresh`	threshold correlation of a data point to the cluster centers; data points with no correlation to any cluster center above this threshold are assigned to the nuisance cluster 0
`verb`	level of verbosity, 0: no output, 1: progress messages
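K, iter.max and nstart are passed on to kmeans. The following minimal base-R sketch shows how an integer vector K can produce one clustering per value, with the label vectors collected as the columns of a matrix, which mirrors the "matrix of clusterings" described under Value. The helper cluster_multiK and the toy data are hypothetical. This is not segmenTier's implementation.

```r
## Hypothetical helper (not segmenTier's code): one kmeans() run per value
## of K; iter.max and nstart are passed straight through to kmeans().
cluster_multiK <- function(x, K = c(2, 3), iter.max = 1e5, nstart = 100) {
  cls <- sapply(K, function(k)
    kmeans(x, centers = k, iter.max = iter.max, nstart = nstart)$cluster)
  colnames(cls) <- paste0("K", K)   # one column per requested K
  cls
}

## toy data: three well-separated groups of 10 points in 2 dimensions
set.seed(1)
x <- rbind(matrix(rnorm(20, mean = 0),  ncol = 2),
           matrix(rnorm(20, mean = 20), ncol = 2),
           matrix(rnorm(20, mean = 40), ncol = 2))
cls <- cluster_multiK(x, K = c(2, 3), nstart = 10)
```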

Details

This function performs one or more clusterings of a time series with kmeans, taking the output of processTimeseries as input. It also calculates cluster centers and the cluster-cluster and cluster-position similarity matrices (Pearson correlation). segmentClusters, the main function of this package, uses these matrices to split the cluster association sequence into segments and to assign each segment to the "winning" input cluster.
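A minimal base-R sketch of the two similarity matrices, under the assumption that both are plain Pearson correlations computed with cor(). Ccc is the correlation between cluster centers (clusters x clusters). Pci is the correlation of every position to every center (positions x clusters). The variable names are hypothetical and are not the field names of the "clustering" object.

```r
## Sketch only: similarity matrices from a single kmeans() clustering.
set.seed(2)
x <- matrix(rnorm(60 * 5), ncol = 5)      # 60 positions, 5 features each
km <- kmeans(x, centers = 3, nstart = 10)
centers <- km$centers                     # K x features

## cluster-cluster similarity: correlation between center profiles
Ccc <- cor(t(centers))                    # K x K

## cluster-position similarity: correlation of each position to each center
Pci <- cor(t(x), t(centers))              # positions x K
```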

The argument K is an integer vector that sets the requested cluster numbers (argument centers in kmeans). To avoid errors in batch use, a smaller K is chosen if the data contain fewer than K distinct values.
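kmeans() stops with the error "more cluster centers than distinct data points." when centers exceeds the number of distinct rows. One simple way to implement the K reduction is to cap each requested K at that count, as sketched below. The helper safe_K is hypothetical, and segmenTier may reduce K differently.

```r
## Hypothetical helper: cap each requested K at the number of distinct rows,
## so that kmeans(x, centers = K) cannot fail for lack of distinct points.
safe_K <- function(x, K) {
  n_distinct <- nrow(unique(as.matrix(x)))
  pmin(K, n_distinct)
}

## toy data with only 3 distinct rows: (1,1), (2,2), (3,3), (1,1)
x <- matrix(c(1, 1, 2, 2, 3, 3, 1, 1), ncol = 2, byrow = TRUE)
safe_K(x, c(2, 5))   # K = 5 is reduced to 3
```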

Nuisance Cluster: values that were removed during time-series processing, such as rows that contain only 0 or NA values, are assigned to the "nuisance cluster" with cluster label "0". In addition, a minimal correlation to any cluster center can be specified with the argument nui.thresh. Positions without any correlation higher than this threshold are also assigned to the nuisance cluster. The resulting "nuisance segments" are not shown in the results.
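The nuisance rule can be sketched in base R as follows. This is an illustration of the rule as described, not segmenTier's implementation. The helper assign_nuisance is hypothetical. It assumes cluster labels 1..K that match the rows of a centers matrix. Positions that are all 0/NA, or whose best Pearson correlation to any center is below nui.thresh, receive label 0.

```r
## Hypothetical helper illustrating the nuisance rule (not segmenTier's code).
## x: positions x features; cls: integer labels 1..K; centers: K x features.
assign_nuisance <- function(x, cls, centers, nui.thresh = -Inf) {
  ## positions containing only 0 or NA values go to the nuisance cluster
  empty <- apply(x, 1, function(r) all(is.na(r) | r == 0))
  ## highest Pearson correlation of each remaining position to any center
  best <- rep(-Inf, nrow(x))
  ok <- !empty
  P <- cor(t(x[ok, , drop = FALSE]), t(centers))   # positions x K
  best[ok] <- apply(P, 1, max)
  cls[which(empty | best < nui.thresh)] <- 0L
  cls
}

x <- rbind(c(1, 2, 3, 4), c(4, 3, 2, 1), c(0, 0, 0, 0),
           c(NA, 0, NA, 0), c(1, 3, 2, 4))   # last row: r = 0.8 to center 1
centers <- rbind(c(1, 2, 3, 4), c(4, 3, 2, 1))
cls <- c(1L, 2L, 1L, 1L, 1L)
assign_nuisance(x, cls, centers)                    # rows 3, 4 become 0
assign_nuisance(x, cls, centers, nui.thresh = 0.9)  # row 5 also becomes 0
```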

Cluster Sorting and Coloring: the cluster labels in the result object are additionally sorted by cluster-cluster similarity (see sortClusters), and cluster colors are assigned (see colorClusters). Together, these allow convenient data inspection with the plot methods available for each data processing step (see examples).
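The following hypothetical sketch shows one way to sort clusters by similarity and color them. It is not necessarily what sortClusters and colorClusters do. The clusters are ordered by a hierarchical clustering (hclust) of 1 minus the center-center correlation, so that similar clusters receive adjacent labels. The sketch then relabels the clusters in that order and assigns one rainbow() color per cluster. Nuisance label 0 is left unchanged.

```r
## Hypothetical helper (not sortClusters/colorClusters): relabel clusters
## so that similar centers get adjacent labels, and assign colors.
sort_and_color <- function(cls, centers) {
  Ccc <- cor(t(centers))                        # cluster-cluster similarity
  ord <- hclust(as.dist(1 - Ccc))$order         # similar clusters adjacent
  new_label <- integer(length(ord))
  new_label[ord] <- seq_along(ord)              # old label -> new label
  out <- cls
  out[cls > 0] <- new_label[cls[cls > 0]]       # nuisance label 0 is kept
  list(clusters = out, colors = rainbow(length(ord)))
}

## clusters 1 and 3 have near-identical center profiles; 2 is anti-correlated
centers <- rbind(c(1, 2, 3, 4), c(4, 3, 2, 1), c(1, 2, 3, 5))
cls <- c(1L, 2L, 3L, 0L, 1L)
res <- sort_and_color(cls, centers)
```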

Note that this function, in conjunction with processTimeseries, can also be used as a stand-alone tool for time-series clustering. It specifically implements the strategy of clustering the Discrete Fourier Transform of periodic time series, such as transcriptome data from circadian or yeast respiratory oscillation systems, which was developed by Machne & Murray (2012) <doi:10.1371/journal.pone.0037906> and further analyzed in Lehmann et al. (2013) <doi:10.1186/1471-2105-14-133>.
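The idea of clustering periodic time series by their DFT can be sketched with base R's fft() and kmeans(). This is a simplified illustration that omits the optional transformations of processTimeseries (e.g. dc.trafo, use.snr). It assumes that rows are time series sampled at equally spaced time points. The sketch drops the DC term (index 1 of fft()), keeps the next ncomp components, and scales each row to unit amplitude, so that clusters reflect phase and shape rather than magnitude. The helper dft_features is hypothetical.

```r
## Hypothetical helper: real-valued DFT features for kmeans().
dft_features <- function(tsm, ncomp = 3) {
  ## DFT of each row; index 1 is the DC (mean) term and is dropped here
  f <- t(apply(tsm, 1, fft))[, 2:(ncomp + 1), drop = FALSE]
  f <- f / sqrt(rowSums(Mod(f)^2))   # scale each row to unit amplitude
  cbind(Re(f), Im(f))                # real and imaginary parts as features
}

## toy data: 10 noisy sine and 10 noisy cosine series (phase differs by 90 deg)
set.seed(3)
tm <- seq(0, 2 * pi, length.out = 25)[-25]   # 24 samples over one period
tsm <- rbind(t(replicate(10, sin(tm) + rnorm(24, sd = 0.1))),
             t(replicate(10, cos(tm) + rnorm(24, sd = 0.1))))
cl <- kmeans(dft_features(tsm), centers = 2, nstart = 10)$cluster
```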

Value

Returns a list of class "clustering" comprising a matrix of clusterings; lists of cluster centers and of cluster-cluster and cluster-position similarity matrices (Pearson correlation), which are used by segmentClusters; and additional information, such as a cluster sorting by similarity and cluster colors that allow clusters to be tracked across plots. A plot method is available that plots clusters aligned to "timeseries" and "segment" plots.

References

Machne & Murray (2012) <doi:10.1371/journal.pone.0037906>
Lehmann et al. (2013) <doi:10.1186/1471-2105-14-133>

Examples

library(segmenTier)
data(primseg436)  # provides the example time-series matrix tsd used below
## Discrete Fourier Transform of the time-series, 
## see ?processTimeseries for details
tset <- processTimeseries(ts=tsd, na2zero=TRUE, use.fft=TRUE,
                          dft.range=1:7,  dc.trafo="ash", use.snr=TRUE)
## ... and cluster the transformed time-series
cset <- clusterTimeseries(tset)
## plot methods for both returned objects allow aligned plots
par(mfcol=c(3,1))
plot(tset)
plot(cset)

[Package segmenTier version 0.1.2 Index]