running_centered {fromo} | R Documentation |
Compare data to moments computed over a sliding window.
Description
Computes moments over a sliding window, then adjusts the data accordingly: centering, scaling, z-scoring, and so on.
Usage
running_centered(v, window = NULL, wts = NULL, na_rm = FALSE,
min_df = 0L, used_df = 1, lookahead = 0L, restart_period = 100L,
check_wts = FALSE, normalize_wts = FALSE)
running_scaled(v, window = NULL, wts = NULL, na_rm = FALSE, min_df = 0L,
used_df = 1, lookahead = 0L, restart_period = 100L, check_wts = FALSE,
normalize_wts = TRUE)
running_zscored(v, window = NULL, wts = NULL, na_rm = FALSE,
min_df = 0L, used_df = 1, lookahead = 0L, restart_period = 100L,
check_wts = FALSE, normalize_wts = TRUE)
running_sharpe(v, window = NULL, wts = NULL, na_rm = FALSE,
compute_se = FALSE, min_df = 0L, used_df = 1, restart_period = 100L,
check_wts = FALSE, normalize_wts = TRUE)
running_tstat(v, window = NULL, wts = NULL, na_rm = FALSE, min_df = 0L,
used_df = 1, restart_period = 100L, check_wts = FALSE,
normalize_wts = TRUE)
Arguments
v |
a vector |
window |
the window size. If given as a finite integer or double, it is passed through.
If |
wts |
an optional vector of weights. Weights are ‘replication’
weights, meaning a value of 2 is shorthand for having two observations
with the corresponding |
na_rm |
whether to remove NA, false by default. |
min_df |
the minimum df to return a value, otherwise |
used_df |
the number of degrees of freedom consumed, used in the denominator of the centered moments computation. These are subtracted from the number of observations. |
lookahead |
for some of the operations, the value is compared to mean and standard deviation possibly using 'future' or 'past' information by means of a non-zero lookahead. Positive values mean data are taken from the future. |
restart_period |
the recompute period. Because subtraction of elements can cause loss of precision, the computation of moments is restarted periodically based on this parameter. Larger values mean fewer restarts and faster, though potentially less accurate, results. |
check_wts |
a boolean for whether the code shall check for negative weights, and throw an error when they are found. Default false for speed. |
normalize_wts |
a boolean for whether the weights should be
renormalized to have a mean value of 1. This mean is computed over elements
which contribute to the moments, so if |
compute_se |
for |
Details
Given the length n vector x, for a given index i, define x^{(i)} as the vector of x_{i-window+1}, x_{i-window+2}, ..., x_{i}, where we do not run over the 'edge' of the vector. In code, this is essentially x[(max(1,i-window+1)):i]. Then define \mu_i, \sigma_i and n_i as, respectively, the sample mean, standard deviation and number of non-NA elements in x^{(i)}.
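The window definition above can be sketched in base R. This is only a reference illustration of the definitions, not the algorithm fromo uses internally; the helper name window_moments is hypothetical and not part of the package.

```r
# Base-R reference for the trailing-window moments defined above.
# `window_moments` is a hypothetical helper, not a fromo function.
window_moments <- function(x, window) {
  t(sapply(seq_along(x), function(i) {
    xi <- x[max(1, i - window + 1):i]  # x^{(i)}, truncated at the left edge
    xi <- xi[!is.na(xi)]               # keep only the non-NA elements
    c(mu = mean(xi), sigma = sd(xi), n = length(xi))
  }))
}

x <- c(1, 2, 4, 7, 11)
mom <- window_moments(x, window = 3)
# One row per index i; the columns are mu_i, sigma_i and n_i.
print(mom)
```

Note that base R's sd returns NA for a window holding a single element, which is one place where this sketch may differ from the package's handling of insufficient data.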
We compute output vector m the same size as x.
For the 'centered' version of x, we have m_i = x_i - \mu_i.
For the 'scaled' version of x, we have m_i = x_i / \sigma_i.
For the 'z-scored' version of x, we have m_i = (x_i - \mu_i) / \sigma_i.
For the 't-scored' version of x, we have m_i = \sqrt{n_i} \mu_i / \sigma_i.
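As a small worked illustration of these four formulas, the following base-R sketch computes each adjusted vector directly from the per-window moments. It assumes a short toy vector and a window of 3 (both chosen here for illustration), and it is a slow reference computation rather than the package's implementation.

```r
x <- c(1, 2, 4, 7, 11)
window <- 3L

# Per-index trailing windows x^{(i)}, truncated at the left edge.
win   <- lapply(seq_along(x), function(i) x[max(1, i - window + 1):i])
mu    <- sapply(win, mean)    # \mu_i
sigma <- sapply(win, sd)      # \sigma_i (NA for the one-element first window)
n     <- sapply(win, length)  # n_i

centered <- x - mu                 # m_i = x_i - \mu_i
scaled   <- x / sigma              # m_i = x_i / \sigma_i
zscored  <- (x - mu) / sigma       # m_i = (x_i - \mu_i) / \sigma_i
tscored  <- sqrt(n) * mu / sigma   # m_i = \sqrt{n_i} \mu_i / \sigma_i
```

The first element of the scaled, z-scored and t-scored vectors is NA in this sketch because a standard deviation cannot be computed from one observation.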
We also allow a 'lookahead' for some of these operations.
If positive, the moments are computed using data from larger indices;
if negative, from smaller indices. Letting j = i + lookahead:
For the 'centered' version of x, we have m_i = x_i - \mu_j.
For the 'scaled' version of x, we have m_i = x_i / \sigma_j.
For the 'z-scored' version of x, we have m_i = (x_i - \mu_j) / \sigma_j.
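The lookahead indexing can be illustrated the same way. The sketch below shifts the per-window mean by lookahead = 1 before centering. How the package treats indices j that fall outside the vector is not stated here, so this sketch simply assumes that no moments are available there and emits NaN; treat that edge behavior as an assumption.

```r
x <- c(1, 2, 4, 7, 11)
window <- 3L
lookahead <- 1L
n_obs <- length(x)

# Trailing-window means \mu_i, as in the definitions above.
mu <- sapply(seq_len(n_obs), function(i) mean(x[max(1, i - window + 1):i]))

# j = i + lookahead selects which window's moments are compared against x_i.
j <- seq_len(n_obs) + lookahead

# Assumption: when j is outside 1..n_obs there are no moments, so use NaN.
in_range <- j >= 1 & j <= n_obs
mu_j <- ifelse(in_range, mu[pmin(pmax(j, 1), n_obs)], NaN)

centered_ahead <- x - mu_j   # m_i = x_i - \mu_j
```

With a positive lookahead each x_i is compared against a mean that already includes later observations, which is why positive values are described as taking data from the future.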
Value
a vector the same size as the input, consisting of the adjusted version of the input.
When there are not enough (non-NaN) elements for the computation, NaN
is returned.
Note
The moment computations provided by fromo are numerically robust, but will often not provide the same results as the 'standard' implementations, due to differences in roundoff. We make every attempt to balance speed and robustness. User assumes all risk from using the fromo package.
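To illustrate why roundoff matters for sliding computations, and why a periodic restart (the restart_period argument) helps, here is a toy base-R sketch. It maintains only a window sum by adding the incoming element and subtracting the outgoing one. The function sliding_sum is hypothetical and far simpler than the moment updates in the package; it is meant only to show the loss of precision from subtraction.

```r
# Toy sliding-window sum: add the new element, subtract the one leaving
# the window, and optionally recompute from scratch every
# `restart_period` steps. Hypothetical helper, not part of fromo.
sliding_sum <- function(x, window, restart_period = Inf) {
  out <- numeric(length(x))
  acc <- 0
  since_restart <- 0
  for (i in seq_along(x)) {
    if (since_restart >= restart_period) {
      acc <- sum(x[max(1, i - window + 1):i])  # restart: recompute exactly
      since_restart <- 0
    } else {
      acc <- acc + x[i]
      if (i > window) acc <- acc - x[i - window]
      since_restart <- since_restart + 1
    }
    out[i] <- acc
  }
  out
}

x <- c(1e16, 1, 1, 1, 1, 1)
exact <- sapply(seq_along(x), function(i) sum(x[max(1, i - 1):i]))
drift <- sliding_sum(x, window = 2)                      # never restarts
fixed <- sliding_sum(x, window = 2, restart_period = 2)  # restarts often
```

In this example the small values are absorbed when added to 1e16, so once the large element leaves the window the non-restarting sum is stuck at 0 while the true window sum is 2; the restarting version recovers the correct value at its first recomputation.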
Note that when weights are given, they are treated as replication weights.
This can have subtle effects on computations which require minimum
degrees of freedom, since the sum of weights will be compared to
that minimum, not the number of data points. Weight values
(much) less than 1 can cause computations to return NA
somewhat unexpectedly due to this condition, while values greater
than one might cause the computation to spuriously return a value
with little precision.
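The replication-weight interpretation can be checked in base R: weighting an observation by 2 should match repeating it twice. The sketch below assumes that the sum of weights takes the place of the number of observations, with used_df subtracted from it in the variance denominator; the min_df value of 1 in the last step is a hypothetical threshold chosen for illustration.

```r
x   <- c(1, 2, 4, 7)
wts <- c(1, 2, 1, 3)

# Weighted moments, with the sum of weights acting as the observation count.
sum_w   <- sum(wts)
mu_w    <- sum(wts * x) / sum_w
used_df <- 1
var_w   <- sum(wts * (x - mu_w)^2) / (sum_w - used_df)

# Replication: repeat each observation wts[i] times, then use the
# ordinary unweighted estimators.
x_rep   <- rep(x, times = wts)
mu_rep  <- mean(x_rep)
var_rep <- var(x_rep)

# Small weights: four data points, but the sum of weights is only 0.4,
# which falls below a (hypothetical) minimum of 1 degree of freedom.
small_wts <- rep(0.1, 4)
min_df <- 1
enough <- sum(small_wts) >= min_df
```

Here mu_w and var_w agree with mu_rep and var_rep, and enough is FALSE even though four observations are present, which is the subtle effect the note describes.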
Author(s)
Steven E. Pav shabbychef@gmail.com
References
Terriberry, T. "Computing Higher-Order Moments Online." http://people.xiph.org/~tterribe/notes/homs.html
J. Bennett, et al., "Numerically Stable, Single-Pass, Parallel Statistics Algorithms," Proceedings of IEEE International Conference on Cluster Computing, 2009. https://www.semanticscholar.org/paper/Numerically-stable-single-pass-parallel-statistics-Bennett-Grout/a83ed72a5ba86622d5eb6395299b46d51c901265
Cook, J. D. "Accurately computing running variance." http://www.johndcook.com/standard_deviation.html
Cook, J. D. "Comparing three methods of computing standard deviation." http://www.johndcook.com/blog/2008/09/26/comparing-three-methods-of-computing-standard-deviation
See Also
Examples
if (require(moments)) {
  set.seed(123)
  x <- rnorm(5e1)
  window <- 10L
  # reference values: per-window sd, mean and number of elements
  rm1 <- t(sapply(seq_len(length(x)), function(iii) {
    xrang <- x[max(1, iii - window + 1):iii]
    c(sd(xrang), mean(xrang), length(xrang))
  }, simplify = TRUE))
  rcent <- running_centered(x, window = window)
  rscal <- running_scaled(x, window = window)
  rzsco <- running_zscored(x, window = window)
  rshrp <- running_sharpe(x, window = window)
  rtsco <- running_tstat(x, window = window)
  rsrse <- running_sharpe(x, window = window, compute_se = TRUE)
  # each running output should match the slow reference computation
  stopifnot(max(abs(rcent - (x - rm1[, 2])), na.rm = TRUE) < 1e-12)
  stopifnot(max(abs(rscal - (x / rm1[, 1])), na.rm = TRUE) < 1e-12)
  stopifnot(max(abs(rzsco - ((x - rm1[, 2]) / rm1[, 1])), na.rm = TRUE) < 1e-12)
  stopifnot(max(abs(rshrp - (rm1[, 2] / rm1[, 1])), na.rm = TRUE) < 1e-12)
  stopifnot(max(abs(rtsco - ((sqrt(rm1[, 3]) * rm1[, 2]) / rm1[, 1])), na.rm = TRUE) < 1e-12)
  # with compute_se = TRUE, the first column holds the Sharpe ratio
  stopifnot(max(abs(rsrse[, 1] - rshrp), na.rm = TRUE) < 1e-12)
  # reference values: per-window excess kurtosis and skewness
  rm2 <- t(sapply(seq_len(length(x)), function(iii) {
    xrang <- x[max(1, iii - window + 1):iii]
    c(kurtosis(xrang) - 3.0, skewness(xrang))
  }, simplify = TRUE))
  # the second column should match the Mertens form of the standard error
  mertens_se <- sqrt((1 + ((2 + rm2[, 1]) / 4) * rshrp^2 - rm2[, 2] * rshrp) / rm1[, 3])
  stopifnot(max(abs(rsrse[, 2] - mertens_se), na.rm = TRUE) < 1e-12)
}