pass {anomaly}    R Documentation

Detection of multivariate anomalous segments using PASS.

Description

Implements the PASS (Proportion Adaptive Segment Selection) procedure of Jeng et al. (2012). PASS uses a higher criticism statistic to pool the information about the presence or absence of a collective anomaly across the components. It uses Circular Binary Segmentation to detect multiple collective anomalies.

Usage

pass(
  x,
  alpha = 2,
  lambda = NULL,
  max_seg_len = 10,
  min_seg_len = 1,
  transform = robustscale
)

Arguments

x

An n x p real matrix representing n observations of p variates.

alpha

A positive integer. This value stabilises the higher criticism based test statistic used by PASS, which improves the finite-sample familywise error rate. As a consequence, anomalies affecting fewer than alpha components are unlikely to be detected. The default value is 2.
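As a rough illustration of the role of alpha, the sketch below computes a generic higher criticism statistic from a vector of per-component p-values and ignores the alpha - 1 smallest ones. The helper name hc_statistic, and the assumption that the restriction works by skipping the smallest ordered p-values, are illustrative only; this is not the internal code of the anomaly package.

```r
# Hypothetical helper, NOT part of the anomaly package.
# Generic higher criticism statistic restricted to the ordered
# p-values with rank >= alpha (an assumption made for illustration).
hc_statistic <- function(pvals, alpha = 2) {
  p <- length(pvals)
  stopifnot(alpha >= 1, alpha <= p, all(pvals > 0 & pvals < 1))
  sorted <- sort(pvals)   # ordered p-values p_(1) <= ... <= p_(p)
  i <- seq_len(p)
  hc <- sqrt(p) * (i / p - sorted) / sqrt(sorted * (1 - sorted))
  max(hc[alpha:p])        # ranks below alpha do not contribute
}

# One extreme p-value among otherwise unremarkable ones
pvals <- c(1e-8, seq(0.1, 0.9, by = 0.1))
hc_alpha1 <- hc_statistic(pvals, alpha = 1)  # driven by the single tiny p-value
hc_alpha2 <- hc_statistic(pvals, alpha = 2)  # the tiny p-value is ignored
```

In this toy input a single affected component dominates the statistic when alpha = 1 but has no influence when alpha = 2, which is the trade-off described above.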

lambda

A positive real value setting the threshold value for the familywise Type 1 error. When lambda is NULL (the default), the value $\left(1.1 \log(n \times \mathrm{max\_seg\_len}) + 2 \log(\log(p))\right) / \sqrt{\log(\log(p))}$ is used.
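The default threshold can be reproduced by hand, which is useful when choosing a custom lambda relative to it. The following is a minimal sketch in base R; default_lambda is a hypothetical helper that mirrors the documented formula, not a function exported by the anomaly package.

```r
# Hypothetical helper mirroring the documented default for lambda;
# NOT a function exported by the anomaly package.
default_lambda <- function(n, p, max_seg_len = 10) {
  # log(log(p)) must be positive for the square root to be defined
  stopifnot(n > 0, max_seg_len > 0, p > exp(1))
  (1.1 * log(n * max_seg_len) + 2 * log(log(p))) / sqrt(log(log(p)))
}

# Threshold for 500 observations of 100 variates with max_seg_len = 10
lambda_500_100 <- default_lambda(n = 500, p = 100)
```

Note that the formula is only defined when log(log(p)) is positive, that is, for p of at least 3.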

max_seg_len

A positive integer (max_seg_len > 0) corresponding to the maximum segment length. This parameter corresponds to Lmax in Jeng et al. (2012). The default value is 10.

min_seg_len

A positive integer (max_seg_len >= min_seg_len > 0) corresponding to the minimum segment length. This parameter corresponds to Lmin in Jeng et al. (2012). The default value is 1.
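The two length parameters bound the set of candidate segments, and therefore the amount of work and the number of tests involved. The sketch below counts the candidates under the assumption that every contiguous window with a length between min_seg_len and max_seg_len is considered; count_candidate_segments is a hypothetical helper, not part of the package.

```r
# Hypothetical helper, NOT part of the anomaly package.
# Counts contiguous windows of n observations whose length lies
# between min_seg_len and max_seg_len (inclusive).
count_candidate_segments <- function(n, min_seg_len = 1, max_seg_len = 10) {
  stopifnot(min_seg_len > 0, max_seg_len >= min_seg_len, max_seg_len <= n)
  seg_lengths <- min_seg_len:max_seg_len
  # a window of length L can start at positions 1, ..., n - L + 1
  sum(n - seg_lengths + 1)
}

n_candidates <- count_candidate_segments(500)
```

Under this assumption the count never exceeds n times max_seg_len, the quantity that appears in the default value of lambda.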

transform

A function used to transform the data prior to analysis. The default, robustscale, scales the data using the median and the median absolute deviation.
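For intuition, median and MAD scaling can be sketched in base R as follows. The function robust_scale_sketch is a hypothetical stand-in that treats each column separately; it is an assumption that this matches the behaviour of robustscale, so it should be read as an illustration of the idea only.

```r
# Hypothetical stand-in for median/MAD scaling, NOT the package's robustscale.
# Each column is centred on its median and divided by its MAD
# (stats::mad, which uses the usual 1.4826 consistency constant).
robust_scale_sketch <- function(x) {
  x <- as.matrix(x)
  apply(x, 2, function(column) (column - median(column)) / mad(column))
}

# Toy data: two variates, each containing one gross outlier
toy <- cbind(c(1, 2, 3, 4, 100), c(10, 20, 30, 40, -500))
scaled <- robust_scale_sketch(toy)
```

After scaling, every column has median 0 and MAD 1, and the outliers have little effect on the location and scale estimates.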

Value

An S4 object of class .pass.class containing the data x, the procedure parameter values, and the results.

References

Jeng XJ, Cai TT, Li H (2012). “Simultaneous discovery of rare and common segment variants.” Biometrika, 100(1), 157–172. ISSN 0006-3444, doi: 10.1093/biomet/ass059, https://academic.oup.com/biomet/article/100/1/157/193108.

Examples

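library(anomaly)

# Generate some multivariate data: 500 observations of 100 variates
set.seed(0)
sim.data <- simulate(n = 500, p = 100, mu = 2, locations = c(100, 200, 300),
                     duration = 6, proportions = c(0.04, 0.06, 0.08))

# Run PASS with the default settings and inspect the result
res <- pass(sim.data)
summary(res)
plot(res, variate_names = TRUE)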


[Package anomaly version 4.0.1 Index]