capa {anomaly}    R Documentation

A technique for detecting anomalous segments and points based on CAPA.

Description

A technique for detecting anomalous segments and points based on CAPA (Collective And Point Anomalies) by Fisch et al. (2022). This is a generic method that can be used for both univariate and multivariate data. capa deduces the specific method used for the analysis from the dimensions of the data. The input data is either a vector (for a univariate time series) or an array with p columns (for a p-dimensional time series). The CAPA procedure assumes that each component of the time series has been standardised so that its non-anomalous segments have mean 0 and variance 1, which may require pre-processing. For example, one can subtract the median of each component as a robust estimate of its mean, and divide by the mad (median absolute deviation from the median) as a robust estimate of its standard deviation.
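The robust standardisation described above can be sketched in base R as follows. The helper name robust_standardise is hypothetical and not part of the anomaly package; it centres each column on its median and scales it by stats::mad(), which by default includes the 1.4826 consistency constant and so estimates the standard deviation for Gaussian data. Columns with a MAD of zero would produce non-finite values and need separate handling.

```r
# Hypothetical helper (not part of the anomaly package): robustly standardise
# each column of a numeric matrix so that its median is 0 and its MAD is 1.
robust_standardise <- function(x) {
  x <- as.matrix(x)
  apply(x, 2, function(col) {
    # mad() is scaled by 1.4826 by default, so it estimates the
    # standard deviation under Gaussian data
    (col - median(col)) / mad(col)
  })
}

# Illustrative data: two components with different location and scale
set.seed(1)
x <- cbind(rnorm(200, mean = 5, sd = 2), rnorm(200, mean = -1, sd = 0.5))
z <- robust_standardise(x)
```

The standardised matrix z can then be passed to capa in place of the raw data.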

Usage

capa(
  x,
  beta,
  beta_tilde,
  type = "meanvar",
  min_seg_len = 10,
  max_seg_len = Inf,
  max_lag = 0
)

Arguments

x

A numeric matrix with n rows and p columns containing the data which is to be inspected. The time series data classes ts, xts, and zoo are also supported.

beta

A numeric vector of length p giving the marginal penalties. If beta is missing and p == 1, then beta = 3 log(n) when type is "mean" or "robustmean", and beta = 4 log(n) otherwise. If beta is missing, p > 1, type is "meanvar" or "mean", and max_lag > 0, it defaults to penalty regime 2' described in Fisch, Eckley and Fearnhead (2022). If beta is missing, p > 1, type is "meanvar" or "mean", and max_lag = 0, it defaults to the pointwise minimum of penalty regimes 1, 2, and 3 in Fisch, Eckley and Fearnhead (2022).

beta_tilde

A numeric constant giving the penalty for adding an additional point anomaly. If beta_tilde is missing, it defaults to 3 log(np), where n and p are the data dimensions.

type

A string indicating which type of deviation from the baseline is considered. Can be "meanvar" (the default) for collective anomalies characterised by joint changes in mean and variance, "mean" for collective anomalies characterised by changes in mean only, or "robustmean" (only allowed when p = 1) for collective anomalies characterised by changes in mean only, in data that may be contaminated by outliers.

min_seg_len

An integer indicating the minimum length of epidemic changes. It must be at least 2 and defaults to 10.

max_seg_len

An integer indicating the maximum length of epidemic changes. It must be at least min_seg_len and defaults to Inf.

max_lag

A non-negative integer indicating the maximum start or end lag. Only useful for multivariate data. Default value is 0.
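The documented univariate defaults for beta and beta_tilde can be reproduced with a small helper, which is useful when one wants to scale the penalties up or down explicitly. This is a sketch covering only the p == 1 case; the function name default_univariate_penalties is hypothetical, and the multivariate penalty regimes from Fisch, Eckley and Fearnhead (2022) are not reproduced.

```r
# Hypothetical helper: documented default penalties for univariate data
# (p == 1) only. The multivariate penalty regimes are NOT reproduced here.
default_univariate_penalties <- function(n, type = "meanvar") {
  stopifnot(type %in% c("meanvar", "mean", "robustmean"))
  p <- 1
  # beta: 3 log(n) for "mean"/"robustmean", 4 log(n) otherwise
  beta <- if (type %in% c("mean", "robustmean")) 3 * log(n) else 4 * log(n)
  # beta_tilde: 3 log(n p), which reduces to 3 log(n) when p == 1
  beta_tilde <- 3 * log(n * p)
  list(beta = beta, beta_tilde = beta_tilde)
}

pen <- default_univariate_penalties(1000, type = "mean")
```

Assuming the defaults are as documented, passing beta = pen$beta and beta_tilde = pen$beta_tilde to capa with type = "mean" on a series of length 1000 should be equivalent to leaving both arguments missing; multiplying them by a constant greater than 1 makes detection more conservative.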

Value

An instance of an S4 class of type capa.class.

References

Fisch ATM, Eckley IA, Fearnhead P (2022). “A linear time method for the detection of collective and point anomalies.” Statistical Analysis and Data Mining: The ASA Data Science Journal, 15(4), 494-508. doi:10.1002/sam.11586.

Examples

library(anomaly)
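# load some simulated multivariate data
data(simulated)
# detect mean anomalies, allowing start/end lags of up to 5 observations
res <- capa(sim.data, type = "mean", min_seg_len = 2, max_lag = 5)
# list the detected collective anomalies
collective_anomalies(res)
plot(res)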
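A univariate workflow might look like the sketch below: simulate a series with one mean shift and one outlier, standardise it robustly as recommended in the Description, and run capa with type = "mean" and the default penalties. The data are purely illustrative. The sketch assumes that capa accepts a plain numeric vector for univariate data, as stated in the Description, and that the package exports point_anomalies() as a companion accessor to collective_anomalies(); check the package index if either call fails.

```r
library(anomaly)

# Illustrative univariate series: standard normal noise with a mean shift
# in observations 201-250 and a single large outlier at observation 400
set.seed(2)
x <- rnorm(500)
x[201:250] <- x[201:250] + 3
x[400] <- 8

# Robust standardisation: median as location, mad as scale
x_std <- (x - median(x)) / mad(x)

# Analyse changes in mean only, using the default penalties
res <- capa(x_std, type = "mean")

collective_anomalies(res)  # detected anomalous segments
point_anomalies(res)       # detected point anomalies (assumed accessor)
plot(res)
```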


[Package anomaly version 4.3.2 Index]