improveSmallScales {clampSeg} | R Documentation |

Implements the second and third step of HILDE (Pein et al., 2020). It refines an initial fit, e.g. obtained by `jsmurf`

on small temporal scales by testing for events and local deconvolution. Refinment can be done assuming homogeneous noise, but also allow heterogeneous noise. Please see the argument `method`

and the examples for how to access the function correctly depending on whether homogeneous noise is assumed or heterogeneous noise is allowed. Further details about how to decide whether the noise is homogeneous or heterogeneous and whether events are short are given in the accompanying vignette.

If `q == NULL`

a Monte-Carlo simulation is required for computing critical values. Since a Monte-Carlo simulation lasts potentially much longer (up to several hours or days if the number of observations is in the millions) than the main calculations, the package saves them by default in the workspace and on the file system such that a second call requiring the same Monte-Carlo simulation will be much faster. For more details, in particular to which arguments the Monte-Carlo simulations are specific, see Section Storing of Monte-Carlo simulations below. Progress of a Monte-Carlo simulation can be reported by the argument `messages`

and the saving can be controlled by the argument `option`

, both can be specified in `...`

and are explained in `getCritVal`

.

```
improveSmallScales(fit, data, filter, method = c("2Param", "LR"),
lengths = NULL, q = NULL, alpha = 0.04, r = 1e3, startTime = 0,
thresholdLongSegment = if (method == "LR") 10L else 25L,
localValue = stats::median,
localVar = function(data) stepR::sdrobnorm(data,
lag = filter$len + 1L)^2,
regularization = 1,
gridSize = c(1, 1 / 10, 1 / 100) / filter$sr,
windowFactorRefinement = 1,
output = c("onlyIdealization", "everyGrid"), report = FALSE,
suppressWarningNoDeconvolution = FALSE,
localList = NULL, ...)
```

`fit` |
an |

`data` |
a numeric vector containing the recorded data points |

`filter` |
an object of class |

`method` |
the testing method for short events, |

`lengths` |
a vector of integers giving the lengths on which tests will be performed to detect short events, should be chosen such that events on larger scales are already contained in |

`q` |
a numeric vector of the same length as |

`alpha` |
a probability, i.e. a single numeric between 0 and 1, giving the significance level to compute the critical values (if |

`r` |
an integer giving the number of Monte-Carlo simulations (if |

`startTime` |
a single numeric giving the time at which recording (sampling) of |

`thresholdLongSegment` |
a single integer giving the threshold determining how many observations are necessary to estimate parameters, conductance value and its variance, without deconvolution; has to be chosen such that |

`localValue` |
a function for estimating the conductance levels on long segments, see details, will be called with |

`localVar` |
a function for estimating the variance on long segments, see details, will be called with |

`regularization` |
a single positive numeric or a numeric vector with positive entries or a |

`gridSize` |
a numeric vector giving the size of the grids in the iterative grid search, see details |

`windowFactorRefinement` |
a single numeric or a numeric vector of length |

`output` |
a string specifying the return type, see Value |

`report` |
a single |

`suppressWarningNoDeconvolution` |
a single |

`localList` |
an object of class |

`...` |
additional parameters to be passed to |

First of al, tests for additional short events are performed. Those tests take into account the lowpass filter explicitly. Afterwards all conductance levels (of the newly found and of the already existing event) and locations of the conductance changes are determined by local deconvolution. The local deconvolution consists of two parts.

In the first part, all segments of the initial fit, potentially interrupted by newly detected events, will be divided into long and short ones. The first and lasts `filter$len`

data points of each segment will be ignored and if the remaining data points `data[i:j]`

are at least `thresholdLongSegment`

, i.e. `j - i + 1 >= thresholdLongSegment`

, the parameters, conductance level and variance, of this segment will be determined by `localValue(data[i:j])`

and `localVar(data[i:j])`

.

The long segments allow in the second part to perform the deconvolution locally by maximizing the likelihood function by an iterative grid search. Three scenarios might occur: Two long segments can follow each other, in this case the change, but no level, has to be estimated by maximizing the likelihood function of only few observations in this single parameter. A single short segment can be in between of two long segments, in this case two changes and one level have to be estimated by maximizing the likelihood function of only few observations in these three parameters. Finally, two short segments can follow each other, in this case no deconvolution is performed and the initial parameters are returned for these segments together with entries in the `"noDeconvolution"`

`attribute`

. More precisely, let `i:j`

be the short segments, then `i:j`

will be added to the `"noDeconvolution"`

`attribute`

and for the idealisation (if `output == "everyGrid"`

this applies for each entry) the entries `value[i:j]`

, `leftEnd[i:(j + 1)]`

and `rightEnd[(i - 1):j]`

are kept from the initial fit without refinement by deconvolution. If `suppressWarningNoDeconvolution == FALSE`

, additionally, a `warning`

will be given at first occurrence.

Maximisation of the likelihood is performed by minimizing (Pein et al., 2018, (9)), a term of the form `x^T\Sigma x`

, where `\Sigma`

is the regularised correlation matrix and `x`

a numeric vector of the same dimension. More precisely, the (unregularised) correlations are `filter$acf`

, to this the regularisation `regularization`

is added. In detail, if `regularization`

is a numeric, the regularised correlation is

cor <- filter$acf cor[seq(along = regularization)] <- cor[seq(along = regularization)] + regularization

and if `regularization`

is a list the same, but `regularization`

is in the `i-th`

refinement replaced by `regularization[[i]]`

. Then, `\Sigma`

is a symmetric Toeplitz matrix with entries `cor`

, i.e. a matrix with `cor[1]`

on the main diagonal, `cor[2]`

on the second diagonal, etc. and `0`

for all entries outside of the first `length(cor)`

diagonals.

The minimisations are performed by an iterative grid search: In a first step potential changes will be allowed to be at the grid / time points `seq(cp - filter$len / filter$sr, cp, gridSize[1])`

, with `cp`

the considered change of the initial fit. For each grid point in case of a single change and for each combination of grid points in case of two changes the term in (9) is computed and the change(s) for which the minimum is attained is / are chosen. Afterwards, refinements are done with the grids

seq(cp - windowFactorRefinement[j - 1] * gridSize[j - 1], cp + windowFactorRefinement[j - 1] * gridSize[j - 1], gridSize[j]),

with `cp`

the change of the iteration before, as long as entries in `gridSize`

are given.

The idealisation (fit, regression) obtained by testing for short events and local deconvolution. If `output == "onlyIdealization"`

an object of class `stepblock`

containing the final idealisation obtained by local deconvolution. If `output == "everyGrid"`

a `list`

of length `length(gridSize)`

containing the idealisation after each refining step. Additionally, in both cases, an `attribute`

`"noDeconvolution"`

, an integer vector, gives the segments for which no deconvolution could be performed, since two short segments followed each other, see details. Moreover, the (computed) `q`

is returned as an `attribute`

.

If `q == NULL`

a Monte-Carlo simulation is required to compute the critical values. Since a Monte-Carlo simulation lasts potentially much longer (up to several hours or days if the number of observations is in the millions) than the main calculations, multiple possibilities for saving and loading the simulations are offered. Progress of a simulation can be reported by the argument `messages`

which can be specified in `...`

and is explained in the documentation of `getCritVal`

. Each Monte-Carlo simulation is specific to test `method`

, the number of observations and the used filter. Monte-Carlo simulations are also specific to the arguments `thresholdLongSegment`

, `localValue`

and `localVar`

. Currently, storing a Monte-Carlo simulation is only possible for their default values. Note, that also Monte-Carlo simulations for a (slightly) larger number of observations `n_q`

, given in the argument `nq`

in `...`

and explained in the documentation of `getCritVal`

, can be used, which avoids extensive resimulations for only a little bit varying number of observations, but results in a (small) loss of power. However, simulations of type `"matrixIncreased"`

, i.e. objects of class `"MCSimulationVector"`

with `nq`

observations, have to be resimulated if `as.integer(log2(n1)) != as.integer(log2(n2))`

when the saved simulation was computed with `n == n1`

and the simulation now is required for `n == n2`

and `nq >= n1`

and `nq >= n2`

. Simulations can either be saved in the workspace in the variable `critValStepRTab`

or persistently on the file system for which the package `R.cache`

is used. Moreover, storing in and loading from variables and RDS files is supported. The simulation, saving and loading can be controlled by the argument `option`

which can be specified in `...`

and is explained in the documentation of `getCritVal`

. By default simulations will be saved in the workspace and on the file system. For more details and for how simulation can be removed see Section Simulating, saving and loading of Monte-Carlo simulations in `getCritVal`

.

Pein, F., Bartsch, A., Steinem, C., Munk, A. (2020) Heterogeneous Idealization of Ion Channel Recordings - Open Channel Noise. *arXiv*:2008.02658.

Pein, F., Tecuapetla-Gómez, I., Schütte, O., Steinem, C., Munk, A. (2018) Fully-automatic multiresolution idealization for filtered ion channel recordings: flickering event detection. *IEEE Transactions on NanoBioscience* **17**(3), 300–320.

`hilde`

, `lowpassFilter`

, `createLocalList`

```
## refinement of an initial fit of the gramicidin A recordings given by gramA
# the used filter
filter <- lowpassFilter(type = "bessel", param = list(pole = 4L, cutoff = 1e3 / 1e4),
sr = 1e4)
# initial fit, good on larger temporal scales, but misses short events
# with given q to save computation time
# this q is specific to length of the data and the filter
fit <- jsmurf(gramA, filter = filter, family = "jsmurfPS", q = 1.775696, startTime = 9,
locationCorrection = "none")
# improvement on small temporal scales by testing for short events and local deconvolution
# this call requires a Monte-Carlo simulation
# and therefore might last a few minutes,
# progress of the Monte-Carlo simulation is reported
deconvolution <- improveSmallScales(fit, data = gramA, method = "LR", filter = filter,
startTime = 9, messages = 100)
# any second call should be much faster
# as the previous Monte-Carlo simulation will be loaded
# return fit after each refinement
every <- improveSmallScales(fit, data = gramA, method = "LR", filter = filter,
startTime = 9, output = "every")
deconvolutionEvery <- every[[3]]
attr(deconvolutionEvery, "noDeconvolution") <- attr(every, "noDeconvolution")
attr(deconvolutionEvery, "q") <- attr(every, "q")
identical(deconvolution, deconvolutionEvery)
# identical to a direct idealisation by hilde
compare <- deconvolution
attr(compare, "q") <- NULL
identical(hilde(gramA, filter = filter, family = "jsmurfPS", method = "LR",
startTime = 9), compare)
# allowing heterogeneous noise
fitH <- jsmurf(gramA, filter = filter, family = "hjsmurf", r = 100, startTime = 9,
locationCorrection = "none")
improveSmallScales(fitH, data = gramA, method = "2Param", filter = filter,
startTime = 9, messages = 10, r = 100)
# r = 100 is used to reduce its run time,
# this is okay for illustration purposes, but for precise results
# a larger number of Monte-Carlo simulations is recommend
## zoom into a single event,
## similar to (Pein et al., 2018, Figure 2 lower left panel)
time <- 9 + seq(along = gramA) / filter$sr # time points
plot(time, gramA, pch = 16, col = "grey30", ylim = c(20, 50),
xlim = c(10.40835, 10.4103), ylab = "Conductance in pS", xlab = "Time in s")
# deconvolution
lines(deconvolution, col = "red", lwd = 3)
# deconvolution convolved with the filter
ind <- seq(10.408, 10.411, 1e-6)
convolvedSignal <- lowpassFilter::getConvolution(ind, deconvolution, filter)
lines(ind, convolvedSignal, col = "blue", lwd = 3)
# for comparison, fit prior to the improvement step
# does not contain the event and hence fits the recorded data points badly
# fit
lines(fit, col = "orange", lwd = 3)
# fit convolved with the filter
ind <- seq(10.408, 10.411, 1e-6)
convolvedSignal <- lowpassFilter::getConvolution(ind, fit, filter)
lines(ind, convolvedSignal, col = "darkgreen", lwd = 3)
# with less regularisation of the correlation matrix
improveSmallScales(fit, data = gramA, method = "LR", filter = filter,
startTime = 9, messages = 100, regularization = 0.5)
# with estimation of the level of long segments by the mean
# but requiring 30 observations for it
improveSmallScales(fit, data = gramA, method = "LR", filter = filter,
startTime = 9, messages = 100,
localValue = mean, thresholdLongSegment = 30)
# with one refinement step less, but with a larger grid
# test are performed on less lengths
# progress of the deconvolution is reported
# potential warning for no deconvolution is suppressed
improveSmallScales(fit, data = gramA, method = "LR", filter = filter,
startTime = 9, messages = 100,
lengths = c(3:5, 8, 11, 16, 20),
gridSize = c(1 / filter$sr, 1 / 10 / filter$sr),
windowFactorRefinement = 2, report = TRUE,
suppressWarningNoDeconvolution = TRUE)
# pre-computation of quantities using createLocalList
# this saves run time if improveSmallScales (or hilde) is called more than once
localList <- createLocalList(filter = filter, method = "LR")
identical(improveSmallScales(fit, data = gramA, method = "LR", filter = filter,
startTime = 9, localList = localList), deconvolution)
```

[Package *clampSeg* version 1.1-1 Index]