dpseg {dpseg} | R Documentation |
dpseg : linear segmentation by dynamic programming
Description
dpseg
splits a curve (x,y data) into linear segments by a
straight forward dynamic programming recursion:
S_j =
max(S_{i-jumps} + score(i,j) -P)
where score is a measure of the
goodnes of the fit of a linear regression (equiv. to
lm(y~x)
) between data points i<j
. The default scoring
function is simply the negative variance of residuals of the linear
regression (see arguments type
and scoref
). P
is a break-point penality that implicitly regulates the number of
segments (higher P: longer segments), and jumps==1
allows
for disjoint segments. The arguments minl
and maxl
specify minimal (i\le j-minl
) and maximal (i\ge j-maxl
)
segment lengths, which allows to significantly decrease memory
usage when expected segment lengths are known.
Usage
dpseg(x, y, maxl, jumps = FALSE, P = 0, minl = 3, S0 = 1,
type = "var", scoref, verb = 1, move, store.values = TRUE,
store.matrix = FALSE, add.lm = FALSE, recursion, backtrace, ...)
Arguments
x |
x-values, not used if |
y |
y-values, or a pre-calculated scoring function matrix
|
maxl |
maximal segment length, |
jumps |
allow for jumps between segments, if |
P |
break-point penalty, increase to get longer segments with lower scores (eg. higher residual variance) |
minl |
minimal segment length, |
S0 |
initialization of |
type |
type of scoring function: available are "var" for
"variance of residuals", "cor" for Pearson correlation, or "r2"
for r-squared; see the package |
scoref |
alternative scoring function |
verb |
print progress messages |
move |
logical indicating whether move is required in
backtracing, required for the alternative recursion |
store.values |
store scoring values (linear regression results) |
store.matrix |
store the fitscore matrix |
add.lm |
add a linear fit using R base |
recursion |
internal recursion function to be used for
segmentation; used for debugging, benchmarking and development,
and required for putative novel scoring functions |
backtrace |
internal function to be used for back-tracing;
used for debugging, benchmarking and development, and may be
required to test novel scoring functions |
... |
further arguments to |
Details
See the vignette("dpseg")
for the theory and details on the
choice of scoring functions and selection of the penalty parameter
P.
Value
Returns a list object of class dpseg
(with
print.dpseg
plot.dpseg
and predict.dpseg
methods). The main result of the algorithm is a table
(data.frame
) of predicted segments in list object
segments
. The original data, run parameters and
(optionally) additional data calculated and used by the
algorithm are also returned.
- segments:
main result table: a
data.frame
that lists the start and end x-values of the segments, the start and end indices (i,j) in the data vectors, the linear regression coefficients and goodness-of-fit measures for the segments (intercept, slope, r-squared, variance of residuals). Ifdpseg
was called with a pre-calculated scoring matrix, the table only contains start and end indices i,j. If optionadd.lm=TRUE
or the result object was sent through functionaddLm
the table additionally contains results from R'slm
, indicated by an ".lm" suffix.- S:
results of the recursion, ie.
S_j
in above equation.- imax:
vector
j=1,\dots,n
, storing thei_{max}
that yieldedS_j
, ie., the sole input for the backtracing function.- values:
linear regression coefficients and measures for the segment ending at
j
and starting ati_{max}(j)
. Only present ifstore.valus=TRUE
.- SCR:
scoring function matrix
SCR_{i,j} = score(i,j)
where positions j are the columns and i the rows; a banded matrix with non-NA values betweeni\le j-minl
andi\ge j-maxl
. Note, that this matrix can be re-used in subsequent calls asdpseg(y=previous$SCR)
which runs much faster and allows to efficiently scan for alternative parameters. Only present ifstore.matrix=TRUE
.- fits:
result objects from
lm
. Only present ifadd.lm=TRUE
.- traceback:
result of the call to the backtracing function: ends of the segments.
- xy:
original x/y data (xy.coords).
- removed:
index of NA/Inf values that were removed before running the alorithm.
- parameters:
used parameters
P
,jumps
,maxl
andminl
.
Dependencies
The package strictly depends only on RcppEigen
.
All other dependencies are usually present in a
basic installation (stats
, graphics
, grDevices
).
Author(s)
Rainer Machne machne@hhu.de, Peter F. Stadler studla@bioinf.uni-leipzig.de
Examples
## calculate linear segments in semi-log bacterial growth data
## NOTE: library loads bacterial growth curve data as data.frame oddata
segs <- dpseg(x=oddata$Time, y=log(oddata$A3), minl=5, P=0.0001, verb=1)
## inspect resulting segments
print(segs)
## plot results (also see the movie method)
plot(segs, delog=TRUE, log="y")
## predict method
plot(predict(segs), type="l")