agDimFunction {PCDimension} | R Documentation |
Divide Steps into "Long" and "Short" to Compute Auer-Gervini Dimension
Description
Auer and Gervini developed a Bayesian graphical method to determine
the number d
of significant principal components; a brief
overview is included in the help for the AuerGervini
class. The output of their method is a step function that displays
the maximum a posteriori (MAP) choice of d
as a step function of
a one-parameter family of prior distributions, and they recommend
choosing the highest "long" step. The functions described here help
automate the process of dividing the step lengths into "long" and
"short" classes.
Usage
agDimTwiceMean(stepLength)
agDimKmeans(stepLength)
agDimKmeans3(stepLength)
agDimSpectral(stepLength)
agDimTtest(stepLength, extra=0)
agDimTtest2(stepLength)
agDimCPT(stepLength)
makeAgCpmFun(method)
Arguments
stepLength |
A numeric vector |
method |
A character string describing a method supported by the
|
extra |
Just ignore this. Don't use it. It's a hack to avoid having to maintain two different versions of the same code. |
Details
The agDimTwiceMean
function implements a simple and naive rule:
a step is considered long if it as least twice the mean length.
The agDimKmeans
uses the kmeans
algorithm with
k=2
to divide the step lengths into two classes. Starting
centers for the groups are taken to be the minimum and maximum
values.
The agDimKmeans3
function uses kmeans
with k=3
,
using the median as the third center. Only one of the three groups is
considered "short".
The agDimSpectral
applies spectral clustering (as implemented
by the specc
function from the kernlab
package)
to divide the steps lengths into two groups.
The agDimTtest
and agDimTtest2
functions implement two
variants of a novel algorithm specialized for this particular task.
The idea is to start by sorting the step lengths so that
L_1 \le
L_2 \le \dots \le L_n.
Then, for each i \in 3,\dots, N-1
, we
compute the mean and standard deviation of the first i
step
lengths. Finally, one computes the likelhood that L_{i+1}
comes
from the normal distribution defined by the first i
lengths. If
the probability that L_{i+1}
is larger is less than 0.01
,
then it is chosen as the "smallest long step".
The novel method just described can also be viewed as a way to detect
a particular kind of change point. So, we also provide the
agDimCPT
function that uses the changepoint detection
algorithm implement by the cpt.mean
function in the
changepoint
package. More generally, the makeAgCpmFun
allows you to use any of the changepoint models implemented as part
of the detectChangePointBatch
function in the cpm
package.
Value
Each of the functions agDimTwiceMean
, agDimKmeans
,
agDimKmeans3
, agDimSpectral
, agDimTtest
,
agDimTtest2
, and agDimCPT
returns a logical vector whose
length is equal to the input stepLength
. TRUE
values
identify "long" steps and FALSE
values identify "short"
steps.
The makeAgCpmFun
returns a function that takes one argument (a
numeric stepLength
vector) and returns a logical vector of the
same length.
Note: Our simulations suggest that "TwiceMean" and "CPM" give the best results.
Author(s)
Kevin R. Coombes <krc@silicovore.com>, Min Wang <wang.1807@osu.edu>.
References
P Auer, D Gervini. Choosing principal components: a new graphical method based on Bayesian model selection. Communications in Statistics-Simulation and Computation 37 (5), 962-977
See Also
The functions described here implerment different algorithms that can
be used by the agDimension
function to automatically
compute the number of significant principal components based on the
AuerGervini
approach. Several of these functions are
wrappers around functions defined in other packages, including
specc
in the kernlab
package,
cpt.mean
in the changepoint
package, and
detectChangePointBatch
in the cpm
package.
Examples
# simulate variances
lambda <- rev(sort(diff(sort(c(0, 1, runif(9))))))
# apply the Auer-Gervini method
ag <- AuerGervini(lambda, dd=c(3,10))
# Review the results
summary(ag)
agDimension(ag)
agDimension(ag, agDimKmeans)
agDimension(ag, agDimSpectral)
f <- makeAgCpmFun("Exponential")
agDimension(ag, f)