getCritVal {clampSeg} | R Documentation |
Computes critical values for the functions jsmurf
, jules
, hilde
, stepDetection
and improveSmallScales
. getCritVal
is usually automatically called, but can be called explicitly, for instance outside of a for loop to save run time. Computation requires Monte-Carlo simulations. Since a Monte-Carlo simulation lasts potentially much longer (up to several hours or days if the number of observations is in the millions) than the main calculations, the simulations are by default saved in the workspace and on the file system such that a second call that require the same Monte-Carlo simulation will be much faster. For more details, in particular to which arguments the Monte-Carlo simulations are specific, see Section Simulating, saving and loading of Monte-Carlo simulations below. Progress of a Monte-Carlo simulation can be reported by the argument messages
and the saving can be controlled by the argument option
.
getCritVal(n, filter, family = c("jules", "jsmurf", "jsmurfPS", "jsmurfLR",
"hjsmurf", "hjsmurfSPS", "hjsmurfLR", "LR", "2Param"),
alpha = 0.05, r = NULL, nq = n, options = NULL,
stat = NULL, messages = NULL, ...)
n |
a positive integer giving the number of observations |
filter |
an object of class |
family |
the parametric family for which critical values should be computed, select |
alpha |
a probability, i.e. a single numeric between 0 and 1, giving the significance level. Its choice is a trade-off between data fit and parsimony of the estimator. In other words, this argument balances the risks of missing conductance changes and detecting additional artefacts. For more details on this choice see the accompanying vignette or (Frick et al., 2014, Section 4) and (Pein et al., 2017, Section 3.4) |
r |
a positive integer giving the required number of Monte-Carlo simulations if they will be simulated or loaded from the workspace or the file system, a larger number improves accuracy but simulations last longer; by default |
nq |
a positive integer larger than or equal to |
options |
a |
stat |
an object of class |
messages |
a positive integer or |
... |
additional arguments of the parametric families |
For families "jules"
, "jsmurf"
, "jsmurfPS"
, "jsmurfLR"
a single numeric giving the critical value and for families "hjsmurf"
, "hjsmurfSPS"
, "hjsmurfLR"
, "LR"
and "2Param"
a numeric vector giving scale dependent critical values. Additionally, an attribute
n
with a single integer giving the number of data points for which the values are computed.
Since a Monte-Carlo simulation lasts potentially much longer (up to several hours or days if the number of observations is in the millions) than the main calculations, this function offers multiple possibilities to save and load the simulations. By default, simulations are stored and loaded with suitable default values and no user choices are required. If desired, the simulation, saving and loading can be controlled by the argument option
. This argument has to be a list
or NULL
. For the list
the following named entries are allowed: "simulation"
, "save"
, "load"
, "envir"
and "dirs"
. All missing entries will be set to their default option.
Each Monte-Carlo simulation is specific to the parametric family
, their parameters in case of families "LR"
or "2Param"
, the number of observations and the used filter. Monte-Carlo simulations can also be performed for a (slightly) larger number of observations n_q
given in the argument nq
, which avoids extensive resimulations for only a little bit varying number of observations at price of a (slightly) smaller detection power. We recommend to not use a nq
more than two times larger than the number of observations n
.
Objects of the following types can be simulated, saved and loaded:
"vector"
: an object of class "MCSimulationMaximum"
for n
observations, i.e. a numeric vector of length r
"vectorIncreased"
: an object of class "MCSimulationMaximum"
for nq
observations, i.e. a numeric vector of length r
"matrix"
: an object of class "MCSimulationVector"
for n
observations, i.e. a matrix of dimensions as.integer(log2(n)) + 1L
and r
"matrixIncreased"
: an object of class "MCSimulationVector"
for nq
observations, i.e. a matrix of as.integer(log2(n)) + 1L
and r
Computation of scale depend critical values, i.e. calucatlions for the families "hjsmurf"
, "hjsmurfSPS"
, "hjsmurfLR"
, "LR"
and "2Param"
require an object of class "MCSimulationVector"
. Otherwise, objects of class "MCSimulationVector"
and objects of class "MCSimulationMaximum"
lead to the same result (if the number of observations is the same), but an object of class "MCSimulationVector"
requires much more storage space and has slightly larger saving and loading times. However, simulations of type "vectorIncreased"
, i.e. objects of class "MCSimulationMaximum"
with nq
observations, have to be resimulated if as.integer(log2(n1)) != as.integer(log2(n2))
when the saved simulation was computed with n == n1
and the simulation now is required for n == n2
and nq >= n1
and nq >= n2
. All in all, if all data sets in the analysis have the same number of observations simulations of type "vector"
for families "jules"
, "jsmurf"
, "jsmurfPS"
, "jsmurfLR"
and "matrix"
for families "hjsmurf"
, "hjsmurfSPS"
, "hjsmurfLR"
, "LR"
and "2Param"
are recommended. If they have a slightly different number of observations it is recommend to set nq
to the largest number and to use simulations for an increased number of observations. For families "jules"
, "jsmurf"
, "jsmurfPS"
, "jsmurfLR"
one should also consider the following: If as.integer(log2(n))
is the same for all data sets type "vectorIncreased"
is recommend , if they differ type "matrixIncreased"
avoids a resimulation at the price of a larger object to be stored and loaded.
The simulations can either be saved in the workspace in the variable critValStepRTab
or persistently on the file system for which the package R.cache
is used. Loading from the workspace is faster, but either the user has to save the workspace manually or in a new session simulations have to be performed again. Moreover, storing in and loading from variables and RDS files is supported.
For loading from / saving in the workspace the variable critValStepRTab
in the environment
options$envir
will be looked for and if missing in case of saving also created there. Moreover, the variable(s) specified in options$save$variable
(explained in the Subsection Saving: options$save) will be assigned to this environment
. By default the global environment .GlobalEnv
is used, i.e. options$envir == .GlobalEnv
.
For loading from / saving on the file system loadCache(key = keyList, dirs = options$dirs)
and saveCache(stat, key = attr(stat, "keyList"), dirs = options$dirs)
are called, respectively. In other words, options$dirs
has to be a character
vector
constituting the path to the cache subdirectory relative to the cache root directory as returned by getCacheRootPath
(). If options$dirs == ""
, the path will be the cache root path. By default the subdirectory "stepR"
is used, i.e. options$dirs == "stepR"
. Missing directories will be created.
Whenever Monte-Carlo simulations have to be performed, i.e. when stat == NULL
and the required Monte-Carlo simulation could not be loaded, the type specified in options$simulation
will be simulated by monteCarloSimulation
. In other words, options$simulation
must be a single string of the following: "vector"
, "vectorIncreased"
, "matrix"
or "matrixIncreased"
. By default (options$simulation == NULL
), an object of class "MCSimulationVector"
for nq
observations will be simulated, i.e. options$simulation
== "matrixIncreased"
. For this choice please recall the explanations regarding computation time and flexibility at the beginning of this section.
Loading of the simulations can be controlled by the entry options$load
which itself has to be a list
with possible entries: "RDSfile"
, "workspace"
, "package"
and "fileSystem"
. Missing entries disable the loading from this option.
Whenever a Monte-Carlo simulation is required, i.e. when the variable q
is not given, it will be searched for at the following places in the given order until found:
in the variable stat
,
in options$load$RDSfile
as an RDS file, i.e. the simulation will be loaded by
readRDS(options$load$RDSfile).
In other words, options$load$RDSfile
has to be a connection
or the name of the file where the R object is read from,
in the workspace or on the file system in the following order: "vector"
, "matrix"
, "vectorIncreased"
and finally of "matrixIncreased"
. For each option it will first be looked in the workspace and then on the file system. All searches can be disabled by not specifying the corresponding string in options$load$workspace
and options$load$fileSystem
. In other words, options$load$workspace
and options$load$fileSystem
have to be vectors of strings containing none, some or all of "vector"
, "matrix"
, "vectorIncreased"
and "matrixIncreased"
,
if all other options fail a Monte-Carlo simulation will be performed.
By default (if options$load
is missing / NULL
) no RDS file is specified and all other options are enabled, i.e.
options$load <- list(workspace = c("vector", "vectorIncreased", "matrix", "matrixIncreased"), fileSystem = c("vector", "vectorIncreased", "matrix", "matrixIncreased"), RDSfile = NULL).
Saving of the simulations can be controlled by the entry options$save
which itself has to be a list
with possible entries: "workspace"
, "fileSystem"
, "RDSfile"
and "variable"
. Missing entries disable the saving in this option.
All available simulations, no matter whether they are given by stat
, loaded, simulated or in case of "vector"
and "vectorIncreased"
computed from "matrix"
and "matrixIncreased"
, respectively, will be saved in all options for which the corresponding type is specified. Here we say a simulation is of type "vectorIncreased"
or "matrixIncreased"
if the simulation is not performed for n
observations. More specifically, a simulation will be saved:
in the workspace or on the file system if the corresponding string is contained in options$save$workspace
and options$save$fileSystem
, respectively. In other words, options$save$workspace
and options$save$fileSystem
have to be vectors of strings containing none, some or all of "vector"
, "matrix"
, "vectorIncreased"
and "matrixIncreased"
,
in a variable named by options$save$variable
in the environment
options$envir
. Hence, options$save$variable
has to be a vector of one or two containing variable names (character vectors). If options$save$variable
is of length two a simulation of type "vector"
or "vectorIncreased"
(only one can occur at one function call) will be saved in options$save$variable[1]
and "matrix"
or "matrixIncreased"
(only one can occur at one function call) will be saved in options$save$variable[2]
. If options$save$variable
is of length one both will be saved in options$save$variable
which means if both occur at the same call only "vector"
or "vectorIncreased"
will be saved. Each saving can be disabled by not specifying options$save$variable
or by passing ""
to the corresponding entry of options$save$variable
.
By default (if options$save
is missing) "vector"
and "vectorIncreased"
will be saved in the workspace and "matrixIncreased"
on the file system, i.e.
options$save <- list(workspace = c("vector", "vectorIncreased"), fileSystem = c("matrix", "matrixIncreased"), RDSfile = NULL, variable = NULL).
Simulations can be removed from the workspace by removing the variable critValStepRTab
, i.e. by calling remove(critValStepRTab, envir = envir)
, with envir
the used environment, and from the file system by deleting the corresponding subfolder, i.e. by calling
unlink(file.path(R.cache::getCacheRootPath(), dirs), recursive = TRUE),
with dirs
the corresponding subdirectory.
Pein, F., Bartsch, A., Steinem, C., Munk, A. (2020) Heterogeneous Idealization of Ion Channel Recordings - Open Channel Noise. arXiv:2008.02658.
Pein, F., Tecuapetla-Gómez, I., Schütte, O., Steinem, C., Munk, A. (2018) Fully-automatic multiresolution idealization for filtered ion channel recordings: flickering event detection. IEEE Transactions on NanoBioscience 17(3), 300–320.
Hotz, T., Schütte, O., Sieling, H., Polupanow, T., Diederichsen, U., Steinem, C., and Munk, A. (2013) Idealizing ion channel recordings by a jump segmentation multiresolution filter. IEEE Transactions on NanoBioscience 12(4), 376–386.
Frick, K., Munk, A., Sieling, H. (2014) Multiscale change-point inference. With discussion and rejoinder by the authors. Journal of the Royal Statistical Society, Series B 76(3), 495–580.
Pein, F., Sieling, H., Munk, A. (2017) Heterogeneous change point inference. Journal of the Royal Statistical Society, Series B, 79(4), 1207–1227.
jsmurf
, jules
, hilde
, lowpassFilter
, stepDetection
, improveSmallScales
# the for the recording of the gramA data set used filter
filter <- lowpassFilter(type = "bessel", param = list(pole = 4L, cutoff = 1e3 / 1e4),
sr = 1e4)
# critical value for jules or stepDetection
# this call requires a Monte-Carlo simulation at the first time
# and therefore might last a few minutes,
# progress of the Monte-Carlo simulation is reported
q <- getCritVal(length(gramA), filter = filter, messages = 100)
# this second call should be much faster
# as the previous Monte-Carlo simulation will be loaded
getCritVal(length(gramA), filter = filter)
# critical value for jsmurf,
# Monte-Carlo simulations are specific to the parametric family,
# hence a new Monte-Carlo simulation is required
getCritVal(length(gramA), family = "jsmurfPS", filter = filter, messages = 100)
# scale dependent critical value for jsmurf allowing for heterogeneous noise,
# return value is a vector
getCritVal(length(gramA), family = "hjsmurf", filter = filter, messages = 100)
# scale dependent critical value for "LR" as used by improveSmallScales and hilde,
# return value is a vector
getCritVal(length(gramA), family = "LR", filter = filter, messages = 100)
# families "LR" and "2Param" allows to specify additional parameters in ...
# Monte-Carlo simulations are also specific to those values
getCritVal(length(gramA), family = "LR", filter = filter, messages = 100,
localValue = mean, thresholdLongSegment = 15L)
# much larger significance level alpha for a larger detection power,
# but also with the risk of detecting additional artefacts
getCritVal(length(gramA), filter = filter, alpha = 0.9)
# medium significance level alpha for a tradeoff between detection power
# and the risk to detect additional artefacts
getCritVal(length(gramA), filter = filter, alpha = 0.5)
# critical values depend on the number of observations and on the filter
# also a new Monte-Carlo simulation is required
getCritVal(100, filter = filter, messages = 500)
otherFilter <- lowpassFilter(type = "bessel",
param = list(pole = 6L, cutoff = 0.2),
sr = 1e4)
getCritVal(100, filter = otherFilter, messages = 500)
# simulation for a larger number of oberservations can be used (nq = 100)
# does not require a new simulation as the simulation from above will be used
# (if the previous call was executed first)
getCritVal(90, filter = filter, nq = 100)
# simulation of type "vectorIncreased" for n1 observations can only be reused
# for n2 observations if as.integer(log2(n1)) == as.integer(log2(n2))
# no simulation is required, since a simulation of type "matrixIncreased"
# will be loaded from the fileSystem
# this call also saved a simulation of type "vectorIncreased" in the workspace
getCritVal(30, filter = filter, nq = 100)
# here a new simulation is required
# (if no appropriate simulation is saved from a call outside of this file)
getCritVal(10, filter = filter, nq = 100, messages = 500,
options = list(load = list(workspace = c("vector", "vectorIncreased"))))
# the above calls saved and (attempted to) load Monte-Carlo simulations
# in the following call the simulations will neither be saved nor loaded
# to save some time the number of iterations is reduced to r = 1e3
# hence the critical value is computed with less precision
# In general, r = 1e3 is enough for a first impression
# for a detailed analysis r = 1e4 is suggested
getCritVal(100, filter = filter, messages = 100, r = 1e3,
options = list(load = list(), save = list()))
# simulations will only be saved in and loaded from the workspace,
# but not on the file system
getCritVal(100, filter = filter, messages = 100, r = 1e3,
options = list(load = list(workspace = c("vector", "vectorIncreased")),
save = list(workspace = c("vector", "vectorIncreased"))))
# explicit Monte-Carlo simulations, not recommended
stat <- stepR::monteCarloSimulation(n = 100, , family = "mDependentPS",
filter = filter, output = "maximum",
r = 1e3, messages = 100)
getCritVal(100, filter = filter, stat = stat)