improveSmallScales {clampSeg} R Documentation

## Improves small scales

### Description

Implements the second and third step of HILDE (Pein et al., 2020). It refines an initial fit, e.g. obtained by jsmurf on small temporal scales by testing for events and local deconvolution. Refinment can be done assuming homogeneous noise, but also allow heterogeneous noise. Please see the argument method and the examples for how to access the function correctly depending on whether homogeneous noise is assumed or heterogeneous noise is allowed. Further details about how to decide whether the noise is homogeneous or heterogeneous and whether events are short are given in the accompanying vignette.
If q == NULL a Monte-Carlo simulation is required for computing critical values. Since a Monte-Carlo simulation lasts potentially much longer (up to several hours or days if the number of observations is in the millions) than the main calculations, the package saves them by default in the workspace and on the file system such that a second call requiring the same Monte-Carlo simulation will be much faster. For more details, in particular to which arguments the Monte-Carlo simulations are specific, see Section Storing of Monte-Carlo simulations below. Progress of a Monte-Carlo simulation can be reported by the argument messages and the saving can be controlled by the argument option, both can be specified in ... and are explained in getCritVal.

### Usage

improveSmallScales(fit, data, filter, method = c("2Param", "LR"),
lengths = NULL, q = NULL, alpha = 0.04, r = 1e3, startTime = 0,
thresholdLongSegment = if (method == "LR") 10L else 25L,
localValue = stats::median,
localVar = function(data) stepR::sdrobnorm(data,
lag = filter$len + 1L)^2, regularization = 1, gridSize = c(1, 1 / 10, 1 / 100) / filter$sr,
windowFactorRefinement = 1,
output = c("onlyIdealization", "everyGrid"), report = FALSE,
suppressWarningNoDeconvolution = FALSE,
localList = NULL, ...)


 fit an stepblock object or a list containing an entry fit with a stepblock object giving the initial fit (reconstruction), e.g. computed by jsmurf with locationCorrection == "none" data a numeric vector containing the recorded data points filter an object of class lowpassFilter giving the used analogue lowpass filter method the testing method for short events, "2Param" allows for heterogeneous noise, "LR" assumes homogeneous noise lengths a vector of integers giving the lengths on which tests will be performed to detect short events, should be chosen such that events on larger scales are already contained in fit q a numeric vector of the same length as lengths giving scale dependent critical values for the tests for short events, by default chosen automatically by getCritVal alpha a probability, i.e. a single numeric between 0 and 1, giving the significance level to compute the critical values (if q == NULL), see getCritVal. Its choice balances the risks of missing short events and detecting additional artefacts r an integer giving the number of Monte-Carlo simulations (if q == NULL), a larger value increases accuracy, but also the run time startTime a single numeric giving the time at which recording (sampling) of data started, sampling time points will be assumed to be startTime + seq(along = data) / filter$sr thresholdLongSegment a single integer giving the threshold determining how many observations are necessary to estimate parameters, conductance value and its variance, without deconvolution; has to be chosen such that localValue and localVar can be applied to a vector of this and larger length localValue a function for estimating the conductance levels on long segments, see details, will be called with localValue(data[i:j]) with i and j two integers in 1:length(data) and j - i >= thresholdLongSegment, has to return a single numeric localVar a function for estimating the variance on long segments, see details, will be called with localVar(data[i:j]) with i and j two integers in 1:length(data) and j - i >= thresholdLongSegment, has to return a single, positive numeric regularization a single positive numeric or a numeric vector with positive entries or a list of length length(gridSize), with each entry a single positive numeric or a numeric vector with positive entries, giving the regularisation added to the correlation matrix, see details. For a list the i-th entry will be used in the i-th refinement gridSize a numeric vector giving the size of the grids in the iterative grid search, see details windowFactorRefinement a single numeric or a numeric vector of length length(gridSize) - 1 giving factors for the refinement of the grid, see details. If a single numeric is given its value is used in all refinement steps output a string specifying the return type, see Value report a single logical, if TRUE the progress will be reported by messages suppressWarningNoDeconvolution a single logical, if FALSE a warning will be given if at least one segment exists for which no deconvolution can be performed, since two short segments follow each other immediately localList an object of class "localList", usually computed by createLocalList, doing such a pre-computation saves run time if improveSmallScales or hilde is called multiple times with the same arguments ... additional parameters to be passed to getCritVal, getCritVal will be called automatically (if q == NULL), the number of data points n = length(data) will be set, argument method will be assigned to family and alpha and filter will be passed. For these parameter no user interaction is required and possible, all other parameters of getCritVal can be passed additionally ### Details First of al, tests for additional short events are performed. Those tests take into account the lowpass filter explicitly. Afterwards all conductance levels (of the newly found and of the already existing event) and locations of the conductance changes are determined by local deconvolution. The local deconvolution consists of two parts. In the first part, all segments of the initial fit, potentially interrupted by newly detected events, will be divided into long and short ones. The first and lasts filter$len data points of each segment will be ignored and if the remaining data points data[i:j] are at least thresholdLongSegment, i.e. j - i + 1 >= thresholdLongSegment, the parameters, conductance level and variance, of this segment will be determined by localValue(data[i:j]) and localVar(data[i:j]).
The long segments allow in the second part to perform the deconvolution locally by maximizing the likelihood function by an iterative grid search. Three scenarios might occur: Two long segments can follow each other, in this case the change, but no level, has to be estimated by maximizing the likelihood function of only few observations in this single parameter. A single short segment can be in between of two long segments, in this case two changes and one level have to be estimated by maximizing the likelihood function of only few observations in these three parameters. Finally, two short segments can follow each other, in this case no deconvolution is performed and the initial parameters are returned for these segments together with entries in the "noDeconvolution" attribute. More precisely, let i:j be the short segments, then i:j will be added to the "noDeconvolution" attribute and for the idealisation (if output == "everyGrid" this applies for each entry) the entries value[i:j], leftEnd[i:(j + 1)] and rightEnd[(i - 1):j] are kept from the initial fit without refinement by deconvolution. If suppressWarningNoDeconvolution == FALSE, additionally, a warning will be given at first occurrence.
Maximisation of the likelihood is performed by minimizing (Pein et al., 2018, (9)), a term of the form x^T\Sigma x, where \Sigma is the regularised correlation matrix and x a numeric vector of the same dimension. More precisely, the (unregularised) correlations are filter$acf, to this the regularisation regularization is added. In detail, if regularization is a numeric, the regularised correlation is cor <- filter$acf
cor[seq(along = regularization)] <- cor[seq(along = regularization)] + regularization


and if regularization is a list the same, but regularization is in the i-th refinement replaced by regularization[[i]]. Then, \Sigma is a symmetric Toeplitz matrix with entries cor, i.e. a matrix with cor[1] on the main diagonal, cor[2] on the second diagonal, etc. and 0 for all entries outside of the first length(cor) diagonals.
The minimisations are performed by an iterative grid search: In a first step potential changes will be allowed to be at the grid / time points seq(cp - filter$len / filter$sr, cp, gridSize[1]), with cp the considered change of the initial fit. For each grid point in case of a single change and for each combination of grid points in case of two changes the term in (9) is computed and the change(s) for which the minimum is attained is / are chosen. Afterwards, refinements are done with the grids

seq(cp - windowFactorRefinement[j - 1] * gridSize[j - 1],
cp + windowFactorRefinement[j - 1] * gridSize[j - 1],
gridSize[j]),


with cp the change of the iteration before, as long as entries in gridSize are given.

### Value

The idealisation (fit, regression) obtained by testing for short events and local deconvolution. If output == "onlyIdealization" an object of class stepblock containing the final idealisation obtained by local deconvolution. If output == "everyGrid" a list of length length(gridSize) containing the idealisation after each refining step. Additionally, in both cases, an attribute "noDeconvolution", an integer vector, gives the segments for which no deconvolution could be performed, since two short segments followed each other, see details. Moreover, the (computed) q is returned as an attribute.

### Storing of Monte-Carlo simulations

If q == NULL a Monte-Carlo simulation is required to compute the critical values. Since a Monte-Carlo simulation lasts potentially much longer (up to several hours or days if the number of observations is in the millions) than the main calculations, multiple possibilities for saving and loading the simulations are offered. Progress of a simulation can be reported by the argument messages which can be specified in ... and is explained in the documentation of getCritVal. Each Monte-Carlo simulation is specific to test method, the number of observations and the used filter. Monte-Carlo simulations are also specific to the arguments thresholdLongSegment, localValue and localVar. Currently, storing a Monte-Carlo simulation is only possible for their default values. Note, that also Monte-Carlo simulations for a (slightly) larger number of observations n_q, given in the argument nq in ... and explained in the documentation of getCritVal, can be used, which avoids extensive resimulations for only a little bit varying number of observations, but results in a (small) loss of power. However, simulations of type "matrixIncreased", i.e. objects of class "MCSimulationVector" with nq observations, have to be resimulated if as.integer(log2(n1)) != as.integer(log2(n2)) when the saved simulation was computed with n == n1 and the simulation now is required for n == n2 and nq >= n1 and nq >= n2. Simulations can either be saved in the workspace in the variable critValStepRTab or persistently on the file system for which the package R.cache is used. Moreover, storing in and loading from variables and RDS files is supported. The simulation, saving and loading can be controlled by the argument option which can be specified in ... and is explained in the documentation of getCritVal. By default simulations will be saved in the workspace and on the file system. For more details and for how simulation can be removed see Section Simulating, saving and loading of Monte-Carlo simulations in getCritVal.

### References

Pein, F., Bartsch, A., Steinem, C., Munk, A. (2020) Heterogeneous Idealization of Ion Channel Recordings - Open Channel Noise. arXiv:2008.02658.

Pein, F., Tecuapetla-Gómez, I., Schütte, O., Steinem, C., Munk, A. (2018) Fully-automatic multiresolution idealization for filtered ion channel recordings: flickering event detection. IEEE Transactions on NanoBioscience 17(3), 300–320.

hilde, lowpassFilter, createLocalList

### Examples

## refinement of an initial fit of the gramicidin A recordings given by gramA
# the used filter
filter <- lowpassFilter(type = "bessel", param = list(pole = 4L, cutoff = 1e3 / 1e4),
sr = 1e4)
# initial fit, good on larger temporal scales, but misses short events
# with given q to save computation time
# this q is specific to length of the data and the filter
fit <- jsmurf(gramA, filter = filter, family = "jsmurfPS", q = 1.775696, startTime = 9,
locationCorrection = "none")

# improvement on small temporal scales by testing for short events and local deconvolution
# this call requires a Monte-Carlo simulation
# and therefore might last a few minutes,
# progress of the Monte-Carlo simulation is reported
deconvolution <- improveSmallScales(fit, data = gramA, method = "LR", filter = filter,
startTime = 9, messages = 100)

# any second call should be much faster
# as the previous Monte-Carlo simulation will be loaded
# return fit after each refinement
every <- improveSmallScales(fit, data = gramA, method = "LR", filter = filter,
startTime = 9, output = "every")
deconvolutionEvery <- every[[3]]
attr(deconvolutionEvery, "noDeconvolution") <- attr(every, "noDeconvolution")
attr(deconvolutionEvery, "q") <- attr(every, "q")
identical(deconvolution, deconvolutionEvery)

# identical to a direct idealisation by hilde
compare <- deconvolution
attr(compare, "q") <- NULL
identical(hilde(gramA, filter = filter, family = "jsmurfPS", method = "LR",
startTime = 9), compare)

# allowing heterogeneous noise
fitH <- jsmurf(gramA, filter = filter, family = "hjsmurf", r = 100, startTime = 9,
locationCorrection = "none")
improveSmallScales(fitH, data = gramA, method = "2Param", filter = filter,
startTime = 9, messages = 10, r = 100)
# r = 100 is used to reduce its run time,
# this is okay for illustration purposes, but for precise results
# a larger number of Monte-Carlo simulations is recommend

## zoom into a single event,
## similar to (Pein et al., 2018, Figure 2 lower left panel)
time <- 9 + seq(along = gramA) / filter$sr # time points plot(time, gramA, pch = 16, col = "grey30", ylim = c(20, 50), xlim = c(10.40835, 10.4103), ylab = "Conductance in pS", xlab = "Time in s") # deconvolution lines(deconvolution, col = "red", lwd = 3) # deconvolution convolved with the filter ind <- seq(10.408, 10.411, 1e-6) convolvedSignal <- lowpassFilter::getConvolution(ind, deconvolution, filter) lines(ind, convolvedSignal, col = "blue", lwd = 3) # for comparison, fit prior to the improvement step # does not contain the event and hence fits the recorded data points badly # fit lines(fit, col = "orange", lwd = 3) # fit convolved with the filter ind <- seq(10.408, 10.411, 1e-6) convolvedSignal <- lowpassFilter::getConvolution(ind, fit, filter) lines(ind, convolvedSignal, col = "darkgreen", lwd = 3) # with less regularisation of the correlation matrix improveSmallScales(fit, data = gramA, method = "LR", filter = filter, startTime = 9, messages = 100, regularization = 0.5) # with estimation of the level of long segments by the mean # but requiring 30 observations for it improveSmallScales(fit, data = gramA, method = "LR", filter = filter, startTime = 9, messages = 100, localValue = mean, thresholdLongSegment = 30) # with one refinement step less, but with a larger grid # test are performed on less lengths # progress of the deconvolution is reported # potential warning for no deconvolution is suppressed improveSmallScales(fit, data = gramA, method = "LR", filter = filter, startTime = 9, messages = 100, lengths = c(3:5, 8, 11, 16, 20), gridSize = c(1 / filter$sr, 1 / 10 / filter\$sr),
windowFactorRefinement = 2, report = TRUE,
suppressWarningNoDeconvolution = TRUE)

# pre-computation of quantities using createLocalList
# this saves run time if improveSmallScales (or hilde) is called more than once
localList <- createLocalList(filter = filter, method = "LR")
identical(improveSmallScales(fit, data = gramA, method = "LR", filter = filter,
startTime = 9, localList = localList), deconvolution)


[Package clampSeg version 1.1-1 Index]