SKFCPD-package {SKFCPD}    R Documentation

Dynamic Linear Model for Online Changepoint Detection

Description

The 'SKFCPD' package estimates changepoint locations using a Dynamic Linear Model (DLM) within the Bayesian Online Changepoint Detection (BOCPD) framework. Computation is made efficient by an implementation of the Sequential Kalman filter. The range parameter and noise-to-signal ratio are estimated from training samples via a Gaussian process model. The package handles multidimensional data with temporal correlation and randomly missing values.
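
As a rough illustration of the BOCPD recursion mentioned above, the following base R sketch tracks the run-length posterior of Adams and MacKay (2007) for a deliberately simplified model: independent Gaussian observations with known variance and a conjugate normal prior on the segment mean. It is not the package's DLM. The function name bocpd_known_var, its arguments, the hazard rate, and the toy data are hypothetical choices made for this example.

```r
# Hypothetical helper (not part of SKFCPD): run-length recursion for
# i.i.d. Gaussian data with known variance sigma2 and prior
# mean ~ N(mu0, sigma2 / kappa0) within each segment.
bocpd_known_var <- function(y, hazard = 1 / 50, mu0 = 0, kappa0 = 1, sigma2 = 1) {
  n <- length(y)
  run_prob <- 1        # P(run length = 0) before any data
  mu <- mu0            # posterior mean of the segment mean, one per run length
  kappa <- kappa0      # pseudo-count, one per run length
  map_run <- integer(n)
  for (t in seq_len(n)) {
    # One-step-ahead predictive density of y[t] under each run length
    pred <- dnorm(y[t], mean = mu, sd = sqrt(sigma2 * (1 + 1 / kappa)))
    growth <- run_prob * pred * (1 - hazard)   # run continues
    change <- sum(run_prob * pred * hazard)    # run resets to length 0
    run_prob <- c(change, growth)
    run_prob <- run_prob / sum(run_prob)       # normalise to a posterior
    # Conjugate update; a fresh run restarts from the prior
    mu <- c(mu0, (kappa * mu + y[t]) / (kappa + 1))
    kappa <- c(kappa0, kappa + 1)
    map_run[t] <- which.max(run_prob) - 1L     # most probable run length
  }
  map_run
}

# Toy input (assumed): the mean shifts from 0 to 5 at observation 61
y_toy <- c(rep(0, 60), rep(5, 40))
map_run <- bocpd_known_var(y_toy)
```

On this toy input the most probable run length grows while the mean is stable and restarts after the shift, which is the signal an online detector reads off. Replacing the i.i.d. predictive density with one that accounts for temporal correlation is the role the DLM plays in this package.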

Details

The DESCRIPTION file:

Package: SKFCPD
Type: Package
Title: Fast Online Changepoint Detection for Temporally Correlated Data
Version: 0.2.4
Date: 2024-02-15
Authors@R: c(person(given="Hanmo",family="Li",role=c("aut", "cre"), email="hanmo@pstat.ucsb.edu"), person(given="Yuedong",family="Wang", role=c("aut"), email="yuedong@pstat.ucsb.edu"), person(given="Mengyang",family="Gu", role=c("aut"), email="mengyang@pstat.ucsb.edu"))
Maintainer: Hanmo Li <hanmo@pstat.ucsb.edu>
Author: Hanmo Li [aut, cre], Yuedong Wang [aut], Mengyang Gu [aut]
Description: Sequential Kalman filter for scalable online changepoint detection in temporally correlated data. It enables fast detection of single and multiple changepoints in data with missing values. See the reference: Hanmo Li, Yuedong Wang, Mengyang Gu (2023), <arXiv:2310.18611>.
License: GPL (>= 3)
Depends: R (>= 3.5.0), methods (>= 4.2.2), rlang (>= 1.0.6), ggplot2 (>= 3.4.0), ggpubr (>= 0.5.0), reshape2 (>= 1.4.4), FastGaSP (>= 0.5.2)
Imports: Rcpp (>= 1.0.9)
LinkingTo: Rcpp, RcppEigen
NeedsCompilation: yes
Encoding: UTF-8
Packaged: 2024-02-15 11:15:56 UTC; lihan
Archs: x64

Index of help topics:

Estimate_GP_params      Estimate parameters from fast computation of
                        GaSP model
SKFCPD                  Getting the results of the SKFCPD model
SKFCPD-class            Class '"SKFCPD"'
SKFCPD-package          Dynamic Linear Model for Online Changepoint
                        Detection
plot_SKFCPD             Plot for SKFCPD model

Implements a fast online changepoint detection algorithm using a dynamic linear model based on the Sequential Kalman filter. It is designed for temporally correlated data and accepts multidimensional datasets with missing values.
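
To illustrate why a Kalman filter makes the likelihood cheap and how missing values can simply be skipped, the sketch below uses a deliberately simple case: a unit-variance Gaussian process with an exponential kernel on an equally spaced grid, observed with noise. This kernel has a one-dimensional state-space form, so the log-likelihood can be accumulated in a single pass and checked against the dense Cholesky computation. The helper names kf_loglik and dense_loglik, the choice of the exponential kernel, and the parameter values are assumptions made for this example; the package's own filter is not reproduced here.

```r
# Hypothetical helper: O(n) log-likelihood via a scalar Kalman filter.
# Assumed model: x_t = rho * x_{t-1} + w_t,  y_t = x_t + e_t,
# rho = exp(-1 / gamma), Var(w_t) = 1 - rho^2, Var(e_t) = eta,
# where eta plays the role of a noise-to-signal ratio.
kf_loglik <- function(y, gamma, eta) {
  rho <- exp(-1 / gamma)
  m <- 0   # predicted state mean (stationary prior)
  P <- 1   # predicted state variance (stationary prior)
  ll <- 0
  for (t in seq_along(y)) {
    if (t > 1) {               # propagate the state one step ahead
      m <- rho * m
      P <- rho^2 * P + (1 - rho^2)
    }
    if (!is.na(y[t])) {        # missing values skip the update step
      S <- P + eta             # predictive variance of y[t]
      ll <- ll + dnorm(y[t], mean = m, sd = sqrt(S), log = TRUE)
      K <- P / S               # Kalman gain
      m <- m + K * (y[t] - m)
      P <- (1 - K) * P
    }
  }
  ll
}

# Reference: dense O(n^3) Gaussian log-likelihood on the observed entries.
dense_loglik <- function(y, gamma, eta) {
  obs <- which(!is.na(y))
  Sigma <- exp(-abs(outer(obs, obs, "-")) / gamma) + eta * diag(length(obs))
  U <- chol(Sigma)                                  # Sigma = t(U) %*% U
  z <- backsolve(U, y[obs], transpose = TRUE)       # solves t(U) z = y
  -0.5 * (length(obs) * log(2 * pi) + 2 * sum(log(diag(U))) + sum(z^2))
}

set.seed(1)
y_toy <- rnorm(30)
y_toy[c(5, 12, 13)] <- NA      # randomly placed missing values
ll_kf <- kf_loglik(y_toy, gamma = 5, eta = 0.1)
ll_dense <- dense_loglik(y_toy, gamma = 5, eta = 0.1)
```

The two values agree up to floating-point error, while the filter needs only a constant amount of work per time point. Smoother kernels such as the Matern family require a higher-dimensional state but follow the same predict-and-update pattern.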

Author(s)

Hanmo Li [aut, cre], Yuedong Wang [aut], Mengyang Gu [aut]

Maintainer: Hanmo Li <hanmo@pstat.ucsb.edu>

References

Li, Hanmo, Yuedong Wang, and Mengyang Gu. Sequential Kalman filter for fast online changepoint detection in longitudinal health records. arXiv preprint arXiv:2310.18611 (2023).

Fearnhead, Paul, and Zhen Liu. On-line inference for multiple changepoint problems. Journal of the Royal Statistical Society Series B: Statistical Methodology 69, no. 4 (2007): 589-605.

Adams, Ryan Prescott, and David JC MacKay. Bayesian online changepoint detection. arXiv preprint arXiv:0710.3742 (2007).

Hartikainen, Jouni, and Simo Särkkä. Kalman filtering and smoothing solutions to temporal Gaussian process regression models. In 2010 IEEE international workshop on machine learning for signal processing, pp. 379-384. IEEE, 2010.

Gu, Mengyang, and Yanxun Xu. Fast nonseparable Gaussian stochastic process with application to methylation level interpolation. Journal of Computational and Graphical Statistics 29, no. 2 (2020): 250-260.

Gu, Mengyang, and Weining Shen. Generalized probabilistic principal component analysis of correlated data. The Journal of Machine Learning Research 21, no. 1 (2020): 428-468.

Gu, Mengyang, Xiaojing Wang, and James O. Berger. Robust Gaussian stochastic process emulation. The Annals of Statistics 46, no. 6A (2018): 3038-3066.

See Also

SKFCPD

Examples

  library(SKFCPD)
  
  #------------------------------------------------------------------------------
  # Example: fast online changepoint detection with DEPENDENT data.
  # 
  # Data generation: Data follows a multidimensional Gaussian process with Matern 2.5 kernel.
  #------------------------------------------------------------------------------
  # Data Generation
  set.seed(1)
  
  n_obs = 150
  n_dim = 2
  seg_len = c(70, 30, 20,30)
  mean_each_seg = c(0,1,-1,0)
  
  x_mat=matrix(1:n_obs)
  y_mat=matrix(NA, nrow=n_obs, ncol=n_dim)
  
  gamma = rep(5, n_dim) # range parameter of the covariance matrix
  
  # compute the Matern 2.5 correlation matrix for the given inputs
  construct_cor_matrix = function(input, gamma){
    n = length(input)
    R0=abs(outer(input,(input),'-'))
    matrix_one = matrix(1, n, n)
    const = sqrt(5) * R0 / gamma
    Sigma = (matrix_one + const + const^2/3) * (exp(-const))
    return(Sigma)
  }
  
  for(j in 1:n_dim){
    y_each_dim = c()
    for(i in 1:length(seg_len)){
      nobs_per_seg = seg_len[i]
      Sigma = construct_cor_matrix(1:nobs_per_seg, gamma[j])
      L=t(chol(Sigma))
      theta=rep(mean_each_seg[i],nobs_per_seg)+L%*%rnorm(nobs_per_seg)
      y_each_dim = c(y_each_dim, theta+0.1*rnorm(nobs_per_seg))
    }
    y_mat[,j] = y_each_dim
  }
  
  ## Detect changepoints by SKFCPD
  Online_CPD_1 = SKFCPD(design = x_mat,
                        response = y_mat,
                        train_prop = 1/3)
  
  ## visualize the results
  plot_SKFCPD(Online_CPD_1)
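
For reference when reading the plot, the true changepoint locations implied by the simulation settings can be computed directly from seg_len. The snippet below only restates the segment lengths used in the example; the variable name true_cp is introduced here for illustration.

```r
# Segment lengths copied from the data-generation step of the example
seg_len <- c(70, 30, 20, 30)

# A new segment starts one observation after each of the first three
# segments ends; the end of the last segment is not a changepoint.
true_cp <- cumsum(seg_len)[-length(seg_len)] + 1
print(true_cp)
```

These are the time indices at which the segment mean changes in the simulated data, so detected changepoints in the plot can be compared against them.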

[Package SKFCPD version 0.2.4 Index]