R: Divide Steps into "Long" and "Short" to Compute Auer-Gervini...

agDimFunction {PCDimension}

R Documentation

Divide Steps into "Long" and "Short" to Compute Auer-Gervini Dimension

Description

Auer and Gervini developed a Bayesian graphical method to determine the number d of significant principal components; a brief overview is included in the help for the AuerGervini class. The output of their method is a step function that displays the maximum a posteriori (MAP) choice of d as a function of the parameter indexing a one-parameter family of prior distributions, and they recommend choosing the highest "long" step. The functions described here help automate the process of dividing the step lengths into "long" and "short" classes.

Usage

  agDimTwiceMean(stepLength)
  agDimKmeans(stepLength)
  agDimKmeans3(stepLength)
  agDimSpectral(stepLength)
  agDimTtest(stepLength, extra=0)
  agDimTtest2(stepLength)
  agDimCPT(stepLength)
  makeAgCpmFun(method)

Arguments

`stepLength`	A numeric vector of step lengths.
`method`	A character string describing a method supported by the `detectChangePointBatch` function in the `cpm` package.
`extra`	Just ignore this. Don't use it. It's a hack to avoid having to maintain two different versions of the same code.

Details

The agDimTwiceMean function implements a simple and naive rule: a step is considered long if it is at least twice the mean length.
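The rule above can be sketched in a single line of base R. This is only an illustration of the stated rule, not the package source; the data are made up, and reading "at least twice" as `>=` is an assumption.

```r
# Hypothetical step lengths, for illustration only.
stepLength <- c(0.05, 0.1, 0.05, 0.6, 0.2)

# A step counts as "long" when it is at least twice the mean step length.
# (Assumption: "at least" is read as >=.)
isLong <- stepLength >= 2 * mean(stepLength)
isLong
```

Here the mean is 0.2, so the threshold is 0.4 and only the fourth step is flagged as long.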

The agDimKmeans function uses the kmeans algorithm with k=2 to divide the step lengths into two classes. Starting centers for the groups are taken to be the minimum and maximum values.
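A minimal sketch of this idea with the base R kmeans function is shown here. It is not the package implementation: the data are hypothetical, and labeling the cluster with the larger center as "long" is an assumption about how the two classes are named.

```r
# Hypothetical example data: four short steps and two long ones.
stepLength <- c(0.05, 0.08, 0.10, 0.12, 0.90, 1.10)

# Seed the two clusters at the smallest and largest step lengths.
# kmeans() needs distinct starting centers, so this assumes min != max.
fit <- kmeans(stepLength, centers = c(min(stepLength), max(stepLength)))

# Assumption: the cluster with the larger center is the "long" class.
isLong <- fit$cluster == which.max(fit$centers)
isLong
```

Because the starting centers are fixed rather than random, the result is reproducible without setting a seed.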

The agDimKmeans3 function uses kmeans with k=3, using the median as the third center. Only one of the three groups is considered "short".
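The three-cluster variant might look like the following base R sketch. The data are invented and chosen to be well separated, and treating the cluster with the smallest center as the single "short" group is an assumption drawn from the sentence above, not from the package source.

```r
# Hypothetical data with three well-separated groups of step lengths.
stepLength <- c(0.10, 0.11, 0.12, 0.50, 0.51, 0.52, 1.00, 1.01, 1.02)

# Starting centers: minimum, median, and maximum (these must be distinct).
starts <- c(min(stepLength), median(stepLength), max(stepLength))
fit <- kmeans(stepLength, centers = starts)

# Assumption: only the cluster with the smallest center is "short";
# the other two clusters are both treated as "long".
isLong <- fit$cluster != which.min(fit$centers)
isLong
```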

The agDimSpectral function applies spectral clustering (as implemented by the specc function from the kernlab package) to divide the step lengths into two groups.

The agDimTtest and agDimTtest2 functions implement two variants of a novel algorithm specialized for this particular task. The idea is to start by sorting the step lengths so that

L_1 \le L_2 \le \dots \le L_n.

Then, for each i \in \{3,\dots,n-1\}, we compute the mean and standard deviation of the first i step lengths. Finally, we compute the likelihood that L_{i+1} comes from the normal distribution defined by the first i lengths. If the probability of a value at least as large as L_{i+1} under that distribution is less than 0.01, then L_{i+1} is chosen as the "smallest long step".
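The procedure just described can be sketched in base R as follows. This is a hedged reading of the text, not the code behind agDimTtest or agDimTtest2: the function name sketchTtestRule is hypothetical, and stopping at the first i that passes the test, returning all FALSE when no i passes, and the handling of fewer than four steps are all assumptions.

```r
# Hypothetical helper illustrating the description above.
sketchTtestRule <- function(stepLength, alpha = 0.01) {
  n <- length(stepLength)
  # Assumption: with fewer than 4 steps there is nothing to test,
  # so every step is called short.
  if (n < 4) return(rep(FALSE, n))
  sorted <- sort(stepLength)
  for (i in 3:(n - 1)) {
    mu <- mean(sorted[1:i])
    sigma <- sd(sorted[1:i])
    # Upper-tail probability of a value as large as L_{i+1} under
    # the normal distribution fitted to the first i lengths.
    p <- pnorm(sorted[i + 1], mean = mu, sd = sigma, lower.tail = FALSE)
    if (p < alpha) {
      # Assumption: the first i that passes gives the smallest long step.
      return(stepLength >= sorted[i + 1])
    }
  }
  # Assumption: if no step passes the test, no step is long.
  rep(FALSE, n)
}

# Made-up data: five short steps and two clearly longer ones.
stepLength <- c(0.10, 0.12, 0.11, 0.13, 0.09, 0.95, 1.20)
isLong <- sketchTtestRule(stepLength)
isLong
```

In this example the tests at i = 3 and i = 4 give upper-tail probabilities above 0.01, while at i = 5 the next length (0.95) lies far in the tail, so 0.95 becomes the smallest long step.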

The novel method just described can also be viewed as a way to detect a particular kind of change point. So, we also provide the agDimCPT function, which uses the changepoint detection algorithm implemented by the cpt.mean function in the changepoint package. More generally, the makeAgCpmFun function allows you to use any of the changepoint models implemented as part of the detectChangePointBatch function in the cpm package.

Value

Each of the functions agDimTwiceMean, agDimKmeans, agDimKmeans3, agDimSpectral, agDimTtest, agDimTtest2, and agDimCPT returns a logical vector whose length is equal to the length of the input stepLength. TRUE values identify "long" steps and FALSE values identify "short" steps.

The makeAgCpmFun function returns a function that takes one argument (a numeric stepLength vector) and returns a logical vector of the same length.
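The same contract (one numeric argument in, a logical vector of equal length out) can be illustrated with a small function factory in base R. The name makeMeanMultipleRule and its multiplier parameter are hypothetical and are not part of the package; the sketch only shows the shape of the returned function.

```r
# Hypothetical factory: returns a rule that flags steps at least
# `multiplier` times the mean step length.
makeMeanMultipleRule <- function(multiplier) {
  function(stepLength) {
    stepLength >= multiplier * mean(stepLength)
  }
}

# Build a rule and apply it to made-up step lengths.
rule <- makeMeanMultipleRule(2.5)
stepLength <- c(0.05, 0.1, 0.05, 0.6, 0.2)
isLong <- rule(stepLength)
isLong
```

A function with this shape is the kind of object passed as the second argument to agDimension in the Examples.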

Note: Our simulations suggest that "TwiceMean" and "CPM" give the best results.

Author(s)

Kevin R. Coombes <krc@silicovore.com>, Min Wang <wang.1807@osu.edu>.

References

P Auer, D Gervini. Choosing principal components: a new graphical method based on Bayesian model selection. Communications in Statistics-Simulation and Computation 37 (5), 962-977

Examples

# load the package that provides AuerGervini and the agDim* functions
library(PCDimension)
# simulate variances
lambda <- rev(sort(diff(sort(c(0, 1, runif(9))))))
# apply the Auer-Gervini method
ag <- AuerGervini(lambda, dd=c(3,10))
# Review the results
summary(ag)
agDimension(ag)
agDimension(ag, agDimKmeans)
agDimension(ag, agDimSpectral)
f <- makeAgCpmFun("Exponential")
agDimension(ag, f)

[Package PCDimension version 1.1.13 Index]