edecob {edecob} | R Documentation |
Event DEtection using COnfidence Bounds
Description
Calculate a smoother of longitudinal data of the same measure and bootstrap the errors of the autoregressive model fitted on the smoother to form simultaneous confidence bounds of a certain level (mathematical details below). Define an event if the simultaneous confidence bound is within a chosen interval for a predefined amount of time. When data from multiple sources is provided, the calculation will be done separately for each source.
Usage
edecob(
data,
smoother = "mov_med",
resample_method = "all",
min_change_dur = 84,
conf_band_lvl = 0.95,
bt_tot_rep = 100,
time_unit = "day",
detect = "below",
detect_factor = 1,
bline_period = 14,
...
)
Arguments
data |
A data frame in long format containing the data for which events is to be detected. This means that each measurement corresponds to a row and the columns are (in order): source (the device or person from which the data was collected), point in time, and measurement value. If custom detection bounds are chosen, the folloing two columns must be added: lower detection bound, and upper detection bound. The source is expected to be a string; the time point are integers; measurements, and detection bounds are expected to be numerical. The detection bounds are in absolute value in the same unit as the values and each is expected to be identical for the same source. In case detection is wanted for a one sided change (e.g. give an event if the confidence bounds drop below a threshold) then the upper or lower detection bound can be chosen to be Inf or -Inf respectively. |
smoother |
A string specifying which smoother is to be used. Use |
resample_method |
A string that determines how to resample the errors of the
autoregression for the bootstrap. Default is |
min_change_dur |
The minimal number of time units that the confidence bounds need to stay inside the detection bounds in order for an event to be detected. Defaults to 84, i.e. 12 weeks. |
conf_band_lvl |
The confidence level for the simultaneous confidence
bands. Defaults to 0.95. When detection of events using only the smoother
is desired, |
bt_tot_rep |
The number of iterations for the bootstrap computation. Because of run time, it is recommended to keep this number below 500. Defaults to 100. |
time_unit |
A string containing the unit of time used, in singular form. Defaults to day. |
detect |
A string specifying how the detection bounds are to be chosen.
|
detect_factor |
A number specifying the factor by which the median of
the fist |
bline_period |
A number specifying the number of time units from which data should be taken to calculate the median to obtain the detection bounds. |
... |
Additional parameters to be given to the function. Possible
parameters for the model are The parameter If the parameter When the moving
median is used as the smoother, When resampling from window, one can choose the window size for the
resampling window with |
Details
For the moving median, the med_win is the total size of the window, meaning
that for the value corresponding to day x, the data points from day
x + med_win[1]
to x + med_win[2]
will be used for the calculation
of the median.
If there is no data for two times med_win[2]-med_win[1]
consecutive time units, there
will be time points at which no confidence bound can be calculated. In this
case, it will be assumed that the confidence bound is outside of the
detection interval when detecting sustained change.
In case there are multiple instances where the algorithm would detect a
sustained change (i.e. if after the first sustained change the confidence
bounds leave the detection interval and then return into it
for longer than min_change_dur
time units) then only the first
sustained change would be detected.
Please note that the event onset could be on a date where there are no actual
measurements. This can happen when there is a gap in the data. In this case, the
confidence bounds will extend into the gap.
If the confidence bounds in this period are outside the detection interval and
remain outside for the next min_change_duration
time units,
the event onset will be in this gap.
The censoring date is based on the last date where the confidence bounds can be calculated. We do not extend the confidence bounds to the last data point so that the confidence bounds don't change in case we obtain new measurements with time points later than the latest time point at which we have a measurement.
The edecob
function runs the functions mov_med
, smoother_resid
,
bt_smoother
, conf_band
, and detect_event
in this order
for all subjects given. If desired, the functions can also manually be
applied for the data to obtain e.g. the confidence bands. Note that in order
to run one of these functions, the output of the previous functions are needed.
Value
The output data
is a list containing as many elements as
the number of sources in data
plus one. Every element in this list
will again be a list named after the corresponding sources. Each of
these lists contains the following elements:
event
gives a list with four values:
event_detected
,event_onset
,event_duration
, andevent_stop
.event_detected
gives whether an event was detected
event_onset
gives the first time point at which the upper or lower bound of the confidence band is inside the detection bounds, and after which it stays inside the detection bounds for at least
min_change_dur
consecutive time unitsevent_duration
gives the number of time units the upper or lower bound of the confidence band stays inside the detection bounds after
event_onset
event_stop
gives whether the confidence bounds stay inside the detection bounds until the last time point at which we can calculate the confidence bound or not.
conf_band
gives a data frame containing the confidence bands. The columns are source, time point, lower bound, and upper bound of the confidence band.
smoother_pts
gives a data frame containing the smoother. The columns are source, time point, and the smoother
data
gives the data but with four additional columns:
event_detected
,event_onset
,event_duration
, andevent_stop
. They contain the same values as inevent
.detec_lower
gives the lower detection bound.
detec_upper
gives the upper detection bound.
smoother
gives the smoother used.
resample_method
gives the resampling method used for the bootstrap.
min_change_dur
gives the smallest consecutive number of time units the confidence bounds must stay within the detection bounds in order for an event to be detected.
conf_band_lvl
gives the level of the simultaneous confidence band.
bt_tot_rep
gives the total amount of bootstrap repetitions performed.
call
gives the function call.
col_names
gives the original column names of the data.
time_unit
gives the unit of time used.
The last element in the output data
is called event_info
and
is a data frame containing the information from event
from each
patient. event_info
will thus have the following columns:
source
, event_detected
, event_onset
,
event_duration
, and event_stop
.
Mathematical background
The mathematical background will be explained in the following sections.
Moving Median
Consider a sample X_1,\dots, X_n
of size n
and the
reordering X_{(1)},\dots, X_{(n)}
such
that X_{(1)} \le X_{(2)} \le \dots \le X_{(n)}
, commonly
called the order statistic. Then for n
even the median usually
defined as
median(X_1,\dots, X_n) = X_{(k)}, \mathrm{where} \; k = n/2.
In the
case where n
is odd the median is
defined as
median(X_1,\dots, X_n) = 1/2(X_{(k)} + X_{(k+1)}), \mathrm{where} \; k = n/2.
Let the
study days at which the measurements X_1, \dots, X_n
were taken
be t_1, \dots, t_n
.
Let T
a fixed positive amount of time. Then the moving median at time
point t
with window size T
is defined as
S(t) = median({X_j | t - T/2 \le t_j \le t + T/2}).
The Model
An autoregressive (AR) model is used to model the residuals of the smoother \eta
:
Y(t) = S(t) + \eta(t)
\eta(t) = \sum^p_{j = 1} \phi_j \eta(t - j) + \epsilon
where
variable t
is the study day, Y(t)
the data point at study day t
,
S(t)
a smoother, \eta(t)
the difference between the smoother
and the measurement at study day t
, p
the order of the AR model, \phi_j
the coefficients of the AR model, and
\epsilon
the error of the AR model. The order is calculated using the
Akaike information criterion (AIC) if it was not given in the function call.
Bootstrap
In the following, the star * denotes a bootstrapped value. The bootstrap procedure is as follows:
Compute the smoother
S(t)
.Compute the residuals
\eta(t_i) = Y(t_i) - S(t_i)
.Fit an AR(p) model to
\eta(t_i)
to obtain the coefficients\phi_1,\dots, \phi_p
and\epsilon(t_i) = \eta(t_i) - \sum^p_{j = 1} \phi_j \eta(t_i - t_{i-j})
the error of the AR model.Randomly choose a
\epsilon(t_i)^*
with replacement from\epsilon(t_{p+1}),\dots, \epsilon(t_n)
to obtainY(t_i)^* = S(t_i) + \eta(t_i)^*,
where
\eta(t_i)^* = \sum^p_{j = 1} \phi_j \eta(t_{i-j})^*+ \epsilon(t_{i-j})^*
the bootstrapped residuals of the smoother.
Compute
S(.)^* = g(Y(t_1),\dots, Y(t_n))
whereg
is the function with which the smoother is calculated.Repeat steps 4 and 5
bt_tot_rep
times to obtainS(t_i)^*_b
for\beta = 1,\dots,
bt_tot_rep
.
Calculation of the Confidence Bounds
The confidence bounds are calculated as follows:
We compute the quantiles
q_x(t_i), q_{1-x}(t_i) i = 1,\dots, N
where
q_x(t_i) = inf\left\{u; P^*[S(t_i)^*_b - S(t_i) \le u] \ge x\right\}
is a pointwise bootstrap quantile,
S(t_i)^*_b
the bootstrapped smoother, andN
the number of measurements or rows indata
, in our case the number of rows.We vary the pointwise error
x
untilP^*[q_x(t_i) \le S(t_i)^*_b - S(t_i) \le q_{1-x}(t_i) \forall i = 1,\dots, N] \approx 1-\alpha.
In other words, until the ratio of bootstrap curves that have all their points within
[q_x(t_i), q_{1-x}(t_i)]
is approximately1-\alpha
.We define
I_n(t_i) = [S(t_i) + q_x(t_i), S(t_i) + q_{1-x}(t_i)] \forall i = 1,\dots, N
the confidence bounds. Then
{I_n(t_i); i = 1,\dots, N}
is a consistent simultaneous confidence band of level1-\alpha
.
References
Bühlmann, P. (1998). Sieve Bootstrap for Smoothing in Nonstationary Time Series. The Annals of Statistics, 26(1), 48-83.
Hogg, R., McKean, J. and Craig, A. (2014). Introduction to mathematical statistics. Harlow: Pearson Education.
See Also
Examples
library(edecob)
# Let us examine the example_data dataset
head(example_data, 3)
#> subject study_day jump_height
#> 1 Subject 1 1 58.13024
#> 2 Subject 1 5 59.48988
#> 3 Subject 1 9 54.14774
# We apply the main fuction of the package onto our example_data
example_event <- edecob(example_data, med_win = c(-21,21), bt_tot_rep = 10,
min_change_dur = 70)
names(example_event)
#> [1] "Subject 1" "Subject 2" "event_info"
# example_event contains the event data for each source
plot(example_event$`Subject 1`)
plot(example_event$`Subject 2`)
# example_event also contains a data frame containing the event information for all patients
example_event$event_info
#> event_detected event_onset event_duration event_stop
#> Subject 1 TRUE 119 134 TRUE
#> Subject 2 FALSE 306 60 FALSE
# Using this data frame, we can draw a survival plot
library("survival")
plot(survfit(Surv(time = event_onset, event = event_detected) ~ 1,
data = example_event$event_info),
conf.int = FALSE, xlim = c(0,350), ylim = c(0,1), mark.time = TRUE,
xlab = "Time point", ylab = "Survival probability", main = "Survival plot")