segmentClusters {segmenTier} | R Documentation |
Run the segmenTier algorithm.
Description
segmenTier's main wrapper interface; it calculates segments from a
clustering sequence. This runs the segmentation algorithm once
for the indicated parameters. The function segmentCluster.batch
allows for multiple runs over different parameters or input clusterings.
Usage
segmentClusters(seq, k = 1, csim, E = 1, S = "ccor", M = 175,
Mn = 20, a = -2, nui = 1, nextmax = TRUE, multi = "max",
multib = "max", rm.nui = TRUE, save.matrix = FALSE, verb = 1)
Arguments
seq |
Either an integer vector of cluster labels, or a
structure of class 'clustering' as returned by
|
k |
if argument |
csim |
The cluster-cluster or position-cluster similarity
matrix for scoring functions "ccor" and "icor" (option
|
E |
exponent to scale similarity matrices |
S |
the scoring function to be used: "ccor", "icor" or "ccls" |
M |
segment length penalty. Note that this is not a strict cut-off but a penalty that must be "overcome" by a good score. |
Mn |
segment length penalty for nuisance cluster. Mn<M will allow shorter distances between "real" segments; only used in scoring functions "ccor" and "icor" |
a |
a cluster "dissimilarity", only used for pure cluster-based scoring without cluster similarity measures in scoring function "ccls". |
nui |
the similarity score to be used for nuisance clusters in the cluster similarity matrices |
nextmax |
go backwards while score is increasing before opening a new segment, default is TRUE |
multi |
handling of multiple k with maximal score in the forward phase, either "min" or "max" (default, see Usage) |
multib |
handling of multiple k with maximal score in the back-trace phase, either "min", "max" (default, see Usage) or "skip" |
rm.nui |
remove nuisance cluster segments from final results |
save.matrix |
store the total score matrix |
verb |
level of verbosity, 0: no output, 1: progress messages |
Details
This is the main R wrapper function for the ‘segmenTier’
segmentation algorithm. It takes an ordered sequence of cluster
labels and returns segments of consistent clusterings, where
cluster-cluster or cluster-position similarities are
maximal. Its main input (argument seq) is either a
"clustering" object returned by clusterTimeseries
(scenario I), or an integer vector of cluster labels (scenario
II). The function then runs the dynamic programming algorithm
(calculateScore) for a selected scoring function
and the corresponding cluster similarity matrix, followed by the
back-tracing step (backtrace) to find segment borders.
The main result, list item "segments" of the returned
object, is a 3-column matrix, where column 1 is the cluster
assignment and columns 2 and 3 are start and end indices of the
segments. For the batch function segmentCluster.batch,
the "segments" item is a data.frame
containing additional information, see ?segmentCluster.batch.
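The layout of the "segments" matrix can be illustrated with a small hand-made example. The matrix below is hypothetical toy data, not output of segmentClusters, and the column names are assumptions chosen for readability. Only the three-column layout (cluster, start, end) is taken from the description above, and the length calculation assumes that both borders are inclusive.

```r
# Hypothetical toy "segments" matrix: column 1 is the cluster label,
# columns 2 and 3 are start and end indices (column names are assumed).
sgs <- matrix(c(3,  10, 120,
                1, 150, 420,
                3, 500, 730),
              ncol = 3, byrow = TRUE,
              dimnames = list(NULL, c("cluster", "start", "end")))

# Segment lengths in positions, assuming both borders are inclusive.
seg_len <- sgs[, 3] - sgs[, 2] + 1

# Number of positions between consecutive segments that no segment covers.
gaps <- sgs[-1, 2] - sgs[-nrow(sgs), 3] - 1

# Collect everything in a data.frame for further processing.
sgtab <- data.frame(sgs, length = seg_len)
print(sgtab)
print(gaps)
```

Indexing by column position rather than by name keeps the sketch independent of whatever column names the package actually assigns.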
As shown in the publication, the parameters M, E and nui
have the strongest impact on resulting
segment borders. Other parameters can be fine-tuned but had
little impact on our test data set.
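Since M, E and nui matter most, a scan over these three parameters is a natural first step when tuning. The following base R sketch only sets up such a parameter grid. The values are examples (175, 1 and 1 are the defaults from Usage), and the commented loop assumes a clustering object named cset as in the Examples section; it is not run here.

```r
# Hypothetical parameter grid; the values are illustrative only.
params <- expand.grid(M = c(100, 175), E = c(1, 2), nui = c(1, 3))
print(params)   # 2 x 2 x 2 = 8 parameter combinations

# Each row could then be passed to segmentClusters, e.g. (not run;
# requires segmenTier and a "clustering" object `cset`):
# results <- lapply(seq_len(nrow(params)), function(i)
#   segmentClusters(seq = cset, S = "icor", M = params$M[i],
#                   E = params$E[i], nui = params$nui[i]))
```

For larger scans, segmentCluster.batch is the function the package provides for multiple runs; see ?segmentCluster.batch for its interface.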
In the default and tested scenario I, when the input is an object
of class "clustering" produced by clusterTimeseries,
the cluster-cluster and cluster-position similarity matrices are
already provided by this object.
In scenario II, for custom use, argument seq can
be a simple clustering vector, where a nuisance cluster must be
indicated by cluster label "0" (zero). The cluster-cluster or
cluster-position similarities MUST be provided (argument
csim) for scoring functions "ccor" and "icor",
respectively. For the simplest scoring function "ccls", a uniform
cluster similarity matrix is constructed from arguments a
and nui, with cluster self-similarities of 1,
"dissimilarities" between different clusters using argument
a<0, and nuisance cluster self-similarity of -a.
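As a rough illustration of the uniform matrix described for "ccls", the following base R sketch builds such a matrix by hand. It does not call segmenTier. The helper name uniform_csim, the placement of the nuisance cluster in the first row and column, and the use of a for entries between the nuisance cluster and real clusters are assumptions made for this example; the package's internal construction may differ.

```r
# Sketch only: hypothetical helper, not part of segmenTier.
uniform_csim <- function(K, a = -2) {
  stopifnot(K >= 1, a < 0)
  labels <- as.character(0:K)       # "0" is the nuisance cluster
  csim <- matrix(a, nrow = K + 1, ncol = K + 1,
                 dimnames = list(labels, labels))
  diag(csim) <- 1                   # self-similarity of real clusters
  csim["0", "0"] <- -a              # nuisance self-similarity, as described
  csim
}

csim <- uniform_csim(K = 3, a = -2)
print(csim)
```

With a = -2 every pair of different clusters is penalized equally, so segment borders depend only on label changes and the length penalty M.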
The function returns a list (class "segments") comprising the
main result (list item "segments"); "warnings" from the dynamic
programming and backtracing phases; the similarity matrix
csim that was used, extended by the nuisance cluster; and optionally (see
option save.matrix) the scoring vectors S1(i,c), the
total score matrix S(i,c) and the backtracing matrix
K(i,c) for analysis of algorithm performance on novel data
sets. Additional convenience data are reported, such as cluster
colors and sortings, if argument seq was of class
'clustering'. These allow for convenient inspection of all data
processing steps with the plot methods. A plot method exists that
plots segments aligned to "timeseries" and "clustering"
plots.
Value
Returns a list (class "segments") containing the main result (list item "segments") and additional information (see ‘Details’). A plot method exists that plots clusters aligned to time-series and segmentation plots.
References
Machne, Murray & Stadler (2017) <doi:10.1038/s41598-017-12401-8>
Examples
# load example data: RNA-seq time-series data from a short genomic region
# of budding yeast
data(primseg436)
# 1) Fourier-transform time series:
## NOTE: reducing official example data set to stay within
## CRAN example timing restrictions with segmentation below
tset <- processTimeseries(ts=tsd[2500:6500,], na2zero=TRUE, use.fft=TRUE,
                          dft.range=1:7, dc.trafo="ash", use.snr=TRUE)
# 2) cluster time-series into K=12 clusters:
cset <- clusterTimeseries(tset, K=12)
# 3) ... segment it; this takes a few seconds:
segments <- segmentClusters(seq=cset, M=100, E=2, nui=3, S="icor")
# 4) inspect results:
print(segments)
plotSegmentation(tset, cset, segments, cex=.5, lwd=3)
# 5) and get segment border table for further processing:
sgtable <- segments$segments