critVal {stepR} | R Documentation |
Critical values
Description
Computes the vector of critical values or the global quantile. This function offers two ways of computation, either at significance level alpha
from a Monte-Carlo simulation, see also section 3.2 in the vignette for more details, or from the global quantile / critical values given in the argument q
. For more details on these two options see Section Computation of critical values / global quantile.
Since a Monte-Carlo simulation lasts potentially much longer (up to several hours or days if the number of observations is in the millions) than the main calculations, this package saves them by default in the workspace and on the file system such that a second call requiring the same Monte-Carlo simulation will be much faster. For more details, in particular to which arguments the Monte-Carlo simulations are specific, see Section Storing of Monte-Carlo simulations below. Progress of a Monte-Carlo simulation can be reported by the argument messages
in ...
, explained in monteCarloSimulation
, and the saving can be controlled by the argument option
.
Usage
critVal(n, q = NULL, alpha = NULL, nq = 2L^(as.integer(log2(n) + 1e-12) + 1L) - 1L,
family = NULL, intervalSystem = NULL, lengths = NULL, penalty = NULL,
weights = NULL, stat = NULL, r = 1e4, output = c("vector", "value"),
options = NULL, ...)
Arguments
n |
a positive integer giving the number of observations |
q |
either |
alpha |
a probability, i.e. a single numeric between 0 and 1, giving the significance level. Its choice is a trade-off between data fit and parsimony of the estimator. In other words, this argument balances the risks of missing change-points and detecting additional artefacts. For more details on this choice see (Frick et al., 2014, section 4) and (Pein et al., 2017, section 3.4). Either |
nq |
a positive integer larger than or equal to |
family |
a string specifying the assumed parametric family, for more details see parametricFamily, currently |
intervalSystem |
a string giving the used interval system, either |
lengths |
an integer vector giving the set of lengths, i.e. only intervals of these lengths will be considered. Note that not all lengths are possible for all interval systems and for all parametric families, see intervalSystem and parametricFamily, respectively, to see which ones are allowed. By default ( |
penalty |
a string specifying how different scales will be balanced, either |
weights |
a numeric vector of length weights == rep(1 / length(lengths), length(lengths)) |
stat |
an object of class |
r |
a positive integer giving the required number of Monte-Carlo simulations if they will be simulated or loaded from the workspace or the file system |
output |
a string specifying the return value, if |
options |
a |
... |
there are two groups of further arguments:
|
Value
If output == "vector"
a numeric vector giving the vector of critical values, i.e. a vector of length length(lengths)
, giving for each length the corresponding critical value. If output == "value"
a single numeric giving the global quantile. In both cases, additionally, an attribute
"n"
gives the number of observations for which the Monte-Carlo simulation was performed.
Computation of critical values / global quantile
This function offers two ways to compute the resulting value:
If
q == NULL
it will be computed at significance levelalpha
from a Monte-Carlo simulation. For penalties"sqrt"
,"log"
and"none"
the global quantile will be the (1-alpha
)-quantile of the penalised multiscale statistic, see section 3.2.1 in the vignette. And if required the vector of critical values will be derived from it. For penalty"weights"
the vector of critical values will be calculated accordingly to the givenweights
. The Monte-Carlo simulation can either be given instat
or will be attempted to load or will be simulated. How Monte-Carlo simulations are simulated, saved and loaded can be controlled by the argumentoption
, for more details see the Section Simulating, saving and loading of Monte-Carlo simulations.If
q
is given it will be derived from it. For the argumentq
either a single finite numeric giving the global quantile or a vector of finite numerics giving the vector of critical values (not allowed foroutput == "value"
) is possible:A single numeric giving the global quantile. If
output == "vector"
the vector of critical values will be computed from it for the givenlengths
andpenalty
(penalty"weights"
is not allowed). Note that the global quantile is specific to the argumentsfamily
,intervalSystem
,lengths
andpenalty
.A vector of length
length(lengths)
, giving for each length the corresponding critical value. This vector is identical to the vector of critical values.A vector of length
n
giving for each length1:n
the corresponding critical value.A vector of length equal to the number of all possible lengths for the given interval system and the given parametric family giving for each possible length the corresponding critical value.
Additionally, an
attribute
"n"
giving the number of observations for whichq
was computed is allowed. Thisattribute
must be a single integer and equal to or larger than the argumentn
which means thatq
must have been computed for at leastn
observations. This allows additionally:A vector of length
attr(q, "n")
giving for each length1:attr(q, "n")
the corresponding critical value.A vector of length equal to the number of all possible lengths for the given interval system and the given parametric family if the number of observations is
attr(q, "n")
giving for each possible length the corresponding critical value.
The
attribute
"n"
will be kept or set ton
if missing.
Simulating, saving and loading of Monte-Carlo simulations
Since a Monte-Carlo simulation lasts potentially much longer (up to several hours or days if the number of observations is in the millions) than the main calculations, this function offers multiple possibilities for saving and loading the simulations. The simulation, saving and loading can be controlled by the argument option
. This argument has to be a list
or NULL
and the following named entries are allowed: "simulation"
, "save"
, "load"
, "envir"
and "dirs"
. All missing entries will be set to their default option.
Objects of class "MCSimulationVector"
, containing simulations of the multiscale vector of statistics, and objects of class "MCSimulationMaximum"
, containing simulations of the penalised multiscale statistic (for penalties "sqrt"
, "log"
and "none"
), can be simulated, saved and loaded. Each Monte-Carlo simulation is specific to the number of observations, the parametric family and the interval system, for "MCSimulationMaximum"
additionally to the set of lengths and the used penalty. Both types will lead to the same result, however, an object of class "MCSimulationVector"
is more flexible, since critical values for all penalties and all set of lengths can be derived from it, but requires much more storage space and has slightly larger saving and loading times. Note that Monte-Carlo simulations can only be saved and loaded if they are generated with the default function for generating random observations, i.e. when rand.gen
(in ...
) is NULL
. For a given simulation this is signalled by the attribute
"save"
which is TRUE
if a simulation can be saved and FALSE
otherwise.
Monte-Carlo simulations can also be performed for a (slightly) larger number of observations n_q
given in the argument nq
, which avoids extensive resimulations for only a little bit varying number of observations. The overestimation control is still satisfied but the detection power is (slightly) smaller. But note that the default lengths
might change when the number of observations is increased and, hence, for type "vectorIncreased"
still a different simulation might be required.
We refer to the different types as follow:
-
"vector"
: an object of class"MCSimulationMaximum"
, i.e. simulations of the penalized multiscale statistic, forn
observations -
"vectorIncreased"
: an object of class"MCSimulationMaximum"
, i.e. simulations of the penalized multiscale statistic, fornq
observations -
"matrix"
: an object of class"MCSimulationVector"
, i.e. simulations of the multiscale vector of statistics, forn
observations -
"matrixIncreased"
: an object of class"MCSimulationVector"
, i.e. simulations of the multiscale vector of statistics, fornq
observations
The simulations can either be saved in the workspace in the variable critValStepRTab
or persistently on the file system for which the package R.cache
is used. Loading from the workspace is faster, but either the user has to store the workspace manually or in a new session simulations have to be performed again. Moreover, storing in and loading from variables and RDS files is supported. Finally, a pre-computed collection of simulations of type "matrixIncreased"
for parametric families "gauss"
and "hsmuce"
can be accessed by installing the package stepRdata
available from http://www.stochastik.math.uni-goettingen.de/stepRdata_1.0-0.tar.gz.
options$envir and options$dirs
For loading from / saving in the workspace the variable critValStepRTab
in the environment
options$envir
will be looked for and if missing in case of saving also created there. Moreover, the variable(s) specified in options$save$variable
(explained in the Subsection Saving: options$save) will be assigned to this environment
. options$envir
will be passed to the arguments pos
and where
in the functions assign
, get
, and exists
, respectively. By default, a local enviroment in the package is used.
For loading from / saving on the file system loadCache(key = keyList, dirs = options$dirs)
and saveCache(stat, key = attr(stat, "keyList"), dirs = options$dirs)
are called, respectively. In other words, options$dirs
has to be a character
vector
constituting the path to the cache subdirectory relative to the cache root directory as returned by getCacheRootPath
(). If options$dirs == ""
the path will be the cache root path. By default the subdirectory "stepR"
is used, i.e. options$dirs == "stepR"
. Missing directories will be created.
Simulation: options$simulation
Whenever Monte-Carlo simulations have to be performed, i.e. when stat == NULL
and the required Monte-Carlo simulation could not be loaded, the type specified in options$simulation
will be simulated by monteCarloSimulation
. In other words, options$simulation
must be a single string of the following: "vector"
, "vectorIncreased"
, "matrix"
or "matrixIncreased"
. By default (options$simulation == NULL
), an object of class "MCSimulationVector"
for nq
observations will be simulated, i.e. options$simulation
== "matrixIncreased"
. For this choice please recall the explanations regarding computation time and flexibility at the beginning of this section.
Loading: options$load
Loading of the simulations can be controlled by the entry options$load
which itself has to be a list
with possible entries: "RDSfile"
, "workspace"
, "package"
and "fileSystem"
. Missing entries disable the loading from this option.
Whenever a Monte-Carlo simulation is required, i.e. when the variable q
is not given, it will be searched for at the following places in the given order until found:
in the variable
stat
,in
options$load$RDSfile
as an RDS file, i.e. the simulation will be loaded byreadRDS(options$load$RDSfile).
In other words,
options$load$RDSfile
has to be aconnection
or the name of the file where the R object is read from,in the workspace or on the file system in the following order:
"vector"
,"matrix"
,"vectorIncreased"
and finally of"matrixIncreased"
. Forpenalty == "weights"
it will only be looked for"matrix"
and"matrixIncreased"
. For each options it will first be looked in the workspace and then on the file system. All searches can be disabled by not specifying the corresponding string inoptions$load$workspace
andoptions$load$fileSystem
. In other words,options$load$workspace
andoptions$load$fileSystem
have to be vectors of strings containing none, some or all of"vector"
,"matrix"
,"vectorIncreased"
and"matrixIncreased"
,in the package
stepRdata
(if installed) and ifoptions$load$package == TRUE
. In other words,options$load$package
must be a single logical orNULL
,if all other options fail a Monte-Carlo simulation will be performed.
By default (if options$load
is missing / NULL
) no RDS file is specified and all other options are enabled, i.e.
options$load <- list(workspace = c("vector", "vectorIncreased", "matrix", "matrixIncreased"), fileSystem = c("vector", "vectorIncreased", "matrix", "matrixIncreased"), package = TRUE, RDSfile = NULL).
Saving: options$save
Saving of the simulations can be controlled by the entry options$save
which itself has to be a list
with possible entries: "workspace"
, "fileSystem"
, "RDSfile"
and "variable"
. Missing entries disable the saving in this option.
All available simulations, no matter whether they are given by stat
, loaded, simulated or in case of "vector"
and "vectorIncreased"
computed from "matrix"
and "matrixIncreased"
, respectively, will be saved in all options for which the corresponding type is specified. Here we say a simulation is of type "vectorIncreased"
or "matrixIncreased"
if the simulation is not performed for n
observations. More specifically, a simulation will be saved:
in the workspace or on the file system if the corresponding string is contained in
options$save$workspace
andoptions$save$fileSystem
, respectively. In other words,options$save$workspace
andoptions$save$fileSystem
have to be vectors of strings containing none, some or all of"vector"
,"matrix"
,"vectorIncreased"
and"matrixIncreased"
,in an RDS file specified by
options$save$RDSfile
which has to be a vector of one or twoconnections
or names of files where the R object is saved to. Ifoptions$save$RDSfile
is of length two a simulation of type"vector"
or"vectorIncreased"
(only one can occur at one function call) will be saved inoptions$save$RDSfile[1]
bysaveRDS(stat, file = options$save$RDSfile[1])
and
"matrix"
or"matrixIncreased"
(only one can occur at one function call) will be saved inoptions$save$RDSfile[2]
. Ifoptions$save$RDSfile
is of length one both will be saved inoptions$save$RDSfile
which means if both occur at the same call only"vector"
or"vectorIncreased"
will be saved. Each saving can be disabled by not specifyingoptions$save$RDSfile
or by passing an empty string to the corresponding entry ofoptions$save$RDSfile
.in a variable named by
options$save$variable
in theenvironment
options$envir
. Hence,options$save$variable
has to be a vector of one or two containing variable names (character vectors). Ifoptions$save$variable
is of length two a simulation of type"vector"
or"vectorIncreased"
(only one can occur at one function call) will be saved inoptions$save$variable[1]
and"matrix"
or"matrixIncreased"
(only one can occur at one function call) will be saved inoptions$save$variable[2]
. Ifoptions$save$variable
is of length one both will be saved inoptions$save$variable
which means if both occur at the same call only"vector"
or"vectorIncreased"
will be saved. Each saving can be disabled by not specifyingoptions$save$variable
or by passing""
to the corresponding entry ofoptions$save$variable
.
By default (if options$save
is missing) "vector"
and "vectorIncreased"
will be saved in the workspace and "matrix"
and "matrixIncreased"
on the file system, i.e.
options$save <- list(workspace = c("vector", "vectorIncreased"), fileSystem = c("matrix", "matrixIncreased"), RDSfile = NULL, variable = NULL).
Simulations can be removed from the workspace by removing the variable critValStepRTab
, i.e. by calling remove(critValStepRTab, envir = envir)
, with envir
the used environment, and from the file system by deleting the corresponding subfolder, i.e. by calling
unlink(file.path(R.cache::getCacheRootPath(), dirs), recursive = TRUE),
with dirs
the corresponding subdirectory.
References
Frick, K., Munk, A., Sieling, H. (2014) Multiscale change-point inference. With discussion and rejoinder by the authors. Journal of the Royal Statistical Society, Series B 76(3), 495–580.
Pein, F., Sieling, H., Munk, A. (2017) Heterogeneous change point inference. Journal of the Royal Statistical Society, Series B, 79(4), 1207–1227.
See Also
monteCarloSimulation
, penalty
, parametricFamily
, intervalSystem
, stepFit
, computeBounds
Examples
# vector of critical values
qVector <- critVal(100L, alpha = 0.5)
# global quantile
qValue <- critVal(100L, alpha = 0.5, output = "value")
# vector can be computed from the global quantile
identical(critVal(100L, q = qValue), qVector)
# for a conservative significance level, stronger confidence statements
critVal(100L, alpha = 0.05)
critVal(100L, alpha = 0.05, output = "value")
# higher significance level for larger detection power, but less confidence
critVal(100L, alpha = 0.99)
critVal(100L, alpha = 0.99, output = "value")
# different parametric family, different intervalSystem, a subset of lengths,
# different penalty and given weights
q <- critVal(100L, alpha = 0.05, family = "hsmuce", intervalSystem = "dyaLen",
lengths = c(2L, 4L, 16L, 32L), penalty = "weights",
weights = c(0.4, 0.3, 0.2, 0.1))
# vector of critical values can be given by a vector of length n
vec <- 1:100
vec[c(2L, 4L, 16L, 32L)] <- q
attr(vec, "n") <- 128L
identical(critVal(100L, q = vec, family = "hsmuce", intervalSystem = "dyaLen",
lengths = c(2L, 4L, 16L, 32L)), q)
# with a given monte-Carlo simulation for nq = 128 observations
stat <- monteCarloSimulation(128)
critVal(n = 100L, alpha = 0.05, stat = stat)
# the above calls saved and (attempted to) load Monte-Carlo simulations and
# simulated them for nq = 128 observations
# in the following call no saving, no loading and simulation for n = 100
# observations is required, progress of the simulation will be reported
critVal(n = 100L, alpha = 0.05, messages = 1000L,
options = list(simulation = "vector", load = list(), save = list()))
# only type "vector" will be saved and loaded in the workspace
critVal(n = 100L, alpha = 0.05, messages = 1000L,
options = list(simulation = "vector", load = list(workspace = "vector"),
save = list(workspace = "vector")))
# simulation of type "matrix" will be saved in a RDS file
# saving of type "vector" is disabled by passing "",
# different seed is set and number of simulations is reduced to r = 1e3
# to allow faster computation at the price of a less precise result
file <- tempfile(pattern = "file", tmpdir = tempdir(), fileext = ".RDS")
critVal(n = 100L, alpha = 0.05, seed = 1, r = 1e3,
options = list(simulation = "matrix", load = list(),
save = list(RDSfile = c("", file))))
identical(readRDS(file), monteCarloSimulation(100L, seed = 1, r = 1e3))