onlineCPD {ocp} | R Documentation |
Bayesian Online Changepoint Detection
Description
Runs the main algorithm, Bayesian Online Changepoint Detection. The input is a matrix of data and, optionally, an existing ocp object to build on. The output is the list of changepoints, along with other values calculated while running the model.
Usage
onlineCPD(datapts, oCPD = NULL, missPts = "none",
  hazard_func = function(x, lambda) {
    const_hazard(x, lambda = 100)
  },
  probModel = list("g"),
  init_params = list(list(m = 0, k = 0.01, a = 0.01, b = 1e-04)),
  multivariate = FALSE, cpthreshold = 0.5,
  truncRlim = .Machine$double.xmin, minRlength = 1,
  maxRlength = 10^4, minsep = 1, maxsep = 10^4, timing = FALSE,
  getR = FALSE, optionalOutputs = FALSE, printupdates = FALSE)
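The hazard_func default in the usage above is a wrapper that fixes lambda at 100 when it calls const_hazard. As a minimal sketch, a different constant can be supplied by passing a wrapper of the same shape. The name hazard_250 and the value 250 are illustrative assumptions, and the call is guarded so the snippet does nothing when the ocp package is not installed.

```r
# Wrapper with the same signature as the documented default hazard function.
# hazard_250 and lambda = 250 are illustrative choices, not package defaults.
hazard_250 <- function(x, lambda) {
  ocp::const_hazard(x, lambda = 250)
}

if (requireNamespace("ocp", quietly = TRUE)) {
  set.seed(1)
  # Same simulated series as in the Examples section: a mean shift after point 50.
  simdatapts <- c(rnorm(n = 50), rnorm(n = 50, 100))
  ocpd_h <- ocp::onlineCPD(simdatapts, hazard_func = hazard_250)
  print(ocpd_h$changepoint_lists)
}
```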
Arguments
datapts |
the input data in form of a matrix, where the rows correspond to each data point, and the columns correspond to each dimension. |
oCPD |
An ocp object computed in a previous run of the algorithm. It can be built upon with the new input data points, as long as the settings for both runs are the same. |
missPts |
This setting indicates how to deal with missing points (e.g. NA). The options are: "mean", "prev", "none", or a numeric value. If the data is multivariate, the numeric replacement value can be either a single value, which applies to all dimensions, or a vector whose length equals the number of dimensions of the data. |
hazard_func |
This setting selects a hazard function and sets the constants within that function. For example, the default hazard function is function(x, lambda) const_hazard(x, lambda = 100), and lambda can be set as appropriate. |
probModel |
The probability model used to calculate the predictive probabilities and update the parameters of the model. The default setting uses a Gaussian underlying distribution, "gaussian" (written as list("g") in Usage). |
init_params |
The parameters used to initialize the probability model. The default settings correspond to the default Gaussian model. |
multivariate |
This setting indicates if the incoming data is multivariate or univariate. |
cpthreshold |
Probability threshold for the method of extracting a list of all changepoints that have a run length probability higher than a specified value. The default is set to 0.5. |
truncRlim |
The probability threshold at which to begin truncating the R vector, which is the vector of run-length probabilities. To prevent truncation, set this to 0. The paper suggests 10^(-4); the default shown in Usage is .Machine$double.xmin. |
minRlength |
The minimum size the run length probabilities vector must be before beginning to check for the truncation threshold. |
maxRlength |
The maximum size the R vector is allowed to reach before truncation is enforced. |
minsep |
This setting constrains the possible changepoint locations considered in determining the optimal set of changepoints. It excludes candidate changepoints that are closer together than the value of minsep. The default shown in Usage is 1. |
maxsep |
This setting constrains the possible changepoint locations considered in determining the optimal set of changepoints. It excludes candidate changepoints that are farther apart than the value of maxsep. The default shown in Usage is 10^4. |
timing |
To print out times while the algorithm is running, in order to track its progress, set this to TRUE. |
getR |
To output the full R matrix, set this to TRUE. Outputting this matrix slows the algorithm down considerably. |
optionalOutputs |
Output additional values calculated during running the algorithm, including a matrix containing all the input data, the predictive probability vectors at each step of the algorithm, and the vector of means at each step of the algorithm. |
printupdates |
If set to TRUE, prints updates on the progress of the algorithm. |
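The three truncation settings (truncRlim, minRlength, maxRlength) act together on the vector of run-length probabilities. The base R sketch below shows one plausible reading of that interaction. The helper truncate_R is hypothetical and is not the package's implementation; its truncRlim default of 1e-4 is the value the argument description attributes to the paper, and the renormalization step is an assumption.

```r
# Hypothetical helper, not part of ocp: one plausible reading of how
# truncRlim, minRlength and maxRlength could interact.
truncate_R <- function(R, truncRlim = 1e-4, minRlength = 1, maxRlength = 10^4) {
  if (length(R) >= minRlength) {
    above <- which(R >= truncRlim)
    # Keep everything up to the last entry still at or above the limit,
    # but never fewer than minRlength entries.
    last <- if (length(above) > 0) max(above) else minRlength
    R <- R[seq_len(max(last, minRlength))]
  }
  if (length(R) > maxRlength) {
    R <- R[seq_len(maxRlength)]  # forced truncation at the size cap
  }
  R / sum(R)  # assumed step: renormalize so the probabilities sum to 1
}

R <- c(0.6, 0.3, 0.0999, 0.00005, 0.00005)
length(truncate_R(R))                  # tail below 1e-4 is dropped
length(truncate_R(R, truncRlim = 0))   # a limit of 0 drops nothing
length(truncate_R(R, maxRlength = 2))  # size cap forces truncation
```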
Value
An ocp object containing the main output, a list of changepoints from each time point, together with additional outputs: the number of time points, the initial settings of the algorithm, the current model parameters, the means from each time point, the most recently processed point, the most recently calculated vector of run length probabilities, and a vector of probabilities of changepoints at each time point.
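The cpthreshold setting is described as selecting changepoints whose probability exceeds a given value. A minimal base R sketch of that kind of thresholding follows. The vector cp_probs holds made-up values, since this page does not name the element of the ocp object that stores the probabilities.

```r
# Hypothetical changepoint probabilities, one per time point (made-up values).
cp_probs <- c(0.01, 0.02, 0.93, 0.04, 0.03, 0.61, 0.02)
cpthreshold <- 0.5

# Indices of the time points whose probability exceeds the threshold.
flagged <- which(cp_probs > cpthreshold)
flagged
```

Raising cpthreshold keeps only the more certain changepoints; lowering it admits more candidates.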
Examples
# simulate 100 points whose mean shifts from 0 to 100 after point 50
simdatapts <- c(rnorm(n = 50), rnorm(n = 50, 100))
ocpd1 <- onlineCPD(simdatapts)
ocpd1$changepoint_lists # view the changepoint lists
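The oCPD argument lets a run continue from an earlier result. The sketch below splits a simulated series into two batches and passes the first result into the second call, leaving all other settings at their defaults so that both runs match. The batch sizes and object names are illustrative assumptions, and the calls are guarded so the snippet does nothing when the ocp package is not installed.

```r
set.seed(42)
simdatapts <- c(rnorm(n = 50), rnorm(n = 50, 100))
batch1 <- simdatapts[1:60]    # points available at first
batch2 <- simdatapts[61:100]  # points that arrive later

if (requireNamespace("ocp", quietly = TRUE)) {
  ocpd_a <- ocp::onlineCPD(batch1)                 # first run
  ocpd_b <- ocp::onlineCPD(batch2, oCPD = ocpd_a)  # continue from the first run
  print(ocpd_b$changepoint_lists)
}
```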