running_sd3 {fromo} | R Documentation |
Compute first K moments over a sliding window
Description
Compute the (standardized) 2nd through kth moments, the mean, and the number of elements over an infinite or finite sliding window, returning a matrix.
Usage
running_sd3(v, window = NULL, wts = NULL, na_rm = FALSE, min_df = 0L,
used_df = 1, restart_period = 100L, check_wts = FALSE,
normalize_wts = TRUE)
running_skew4(v, window = NULL, wts = NULL, na_rm = FALSE, min_df = 0L,
used_df = 1, restart_period = 100L, check_wts = FALSE,
normalize_wts = TRUE)
running_kurt5(v, window = NULL, wts = NULL, na_rm = FALSE, min_df = 0L,
used_df = 1, restart_period = 100L, check_wts = FALSE,
normalize_wts = TRUE)
running_sd(v, window = NULL, wts = NULL, na_rm = FALSE, min_df = 0L,
used_df = 1, restart_period = 100L, check_wts = FALSE,
normalize_wts = TRUE)
running_skew(v, window = NULL, wts = NULL, na_rm = FALSE, min_df = 0L,
used_df = 1, restart_period = 100L, check_wts = FALSE,
normalize_wts = TRUE)
running_kurt(v, window = NULL, wts = NULL, na_rm = FALSE, min_df = 0L,
used_df = 1, restart_period = 100L, check_wts = FALSE,
normalize_wts = TRUE)
running_cent_moments(v, window = NULL, wts = NULL, max_order = 5L,
na_rm = FALSE, max_order_only = FALSE, min_df = 0L, used_df = 0,
restart_period = 100L, check_wts = FALSE, normalize_wts = TRUE)
running_std_moments(v, window = NULL, wts = NULL, max_order = 5L,
na_rm = FALSE, min_df = 0L, used_df = 0, restart_period = 100L,
check_wts = FALSE, normalize_wts = TRUE)
running_cumulants(v, window = NULL, wts = NULL, max_order = 5L,
na_rm = FALSE, min_df = 0L, used_df = 0, restart_period = 100L,
check_wts = FALSE, normalize_wts = TRUE)
Arguments
v |
a vector |
window |
the window size. If given as a finite integer or double, it is passed through.
If |
wts |
an optional vector of weights. Weights are ‘replication’
weights, meaning a value of 2 is shorthand for having two observations
with the corresponding |
na_rm |
whether to remove NA, false by default. |
min_df |
the minimum df to return a value, otherwise |
used_df |
the number of degrees of freedom consumed, used in the denominator of the centered moments computation. These are subtracted from the number of observations. |
restart_period |
the recompute period. Because subtraction of elements can cause loss of precision, the computation of moments is restarted periodically based on this parameter. Larger values mean fewer restarts and faster, though less accurate, results. |
check_wts |
a boolean for whether the code shall check for negative weights, and throw an error when they are found. Default false for speed. |
normalize_wts |
a boolean for whether the weights should be
renormalized to have a mean value of 1. This mean is computed over elements
which contribute to the moments, so if |
max_order |
the maximum order of the centered moment to be computed. |
max_order_only |
for |
Details
Computes the number of elements, the mean, and the 2nd through kth centered (and typically standardized) moments, for k=2,3,4. These are computed via the numerically robust one-pass method of Bennett et al.
Given the length n vector x, we output the matrix M, where M_{i,j} is the order - j + 1 moment (i.e. excess kurtosis, skewness, standard deviation, mean or number of elements) of x_{i-window+1}, x_{i-window+2}, ..., x_{i}. Barring NA or NaN, this is over a window of size window. During the 'burn-in' phase, we take fewer elements.
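The layout described above can be sketched with a brute-force reference in base R. This is only an illustration of what each row and column means for the three-column case (standard deviation, mean, count); `naive_running_sd3` is a hypothetical helper, not part of fromo, and it recomputes every window from scratch rather than using the one-pass updates.

```r
# Brute-force reference for the three-column layout (sd, mean, count).
# `naive_running_sd3` is a hypothetical name used only for this sketch.
naive_running_sd3 <- function(x, window) {
  t(sapply(seq_along(x), function(i) {
    # during burn-in (i < window) the window holds fewer than `window` elements
    xw <- x[max(1, i - window + 1):i]
    c(sd(xw), mean(xw), length(xw))
  }))
}

set.seed(1)
x <- rnorm(20)
ref <- naive_running_sd3(x, window = 5L)
dim(ref)   # one row per element of x, one column per statistic
ref[3, ]   # burn-in row: statistics of x[1:3] only
```

Row i summarizes the window ending at x[i]. How fromo fills rows with too few observations depends on min_df and used_df, so the first rows of this reference need not match the package output exactly.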
Value
Typically a matrix, where the first columns are the kth, (k-1)th, through 2nd standardized, centered moments, then a column of the mean, then a column of the number of (non-NaN) elements in the input, with the following exceptions:
- running_cent_moments
Computes arbitrary order centered moments. When max_order_only is set, only a column of the maximum order centered moment is returned.
- running_std_moments
Computes arbitrary order standardized moments, then the standard deviation, the mean, and the count. There is not yet an option for max_order_only, but there probably should be.
- running_cumulants
Computes arbitrary order cumulants, and returns the kth, (k-1)th, through the second (which is the variance) cumulant, then the mean, and the count.
Note
The kurtosis is excess kurtosis, with 3 subtracted, and should be nearly zero for Gaussian input.
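As a quick check of the excess kurtosis convention, the following base R sketch computes the fourth standardized moment of a Gaussian sample and subtracts 3. It uses the plain 1/n moment estimates, which is an assumption made here for simplicity and may differ slightly from the denominators fromo uses.

```r
set.seed(7)
z <- rnorm(1e5)

# plain (1/n) central moments of the sample
m2 <- mean((z - mean(z))^2)
m4 <- mean((z - mean(z))^4)

# excess kurtosis: standardized fourth moment minus 3
excess_kurt <- m4 / m2^2 - 3
excess_kurt   # near zero for Gaussian input
```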
The moment computations provided by fromo are numerically robust, but will often not provide the same results as the 'standard' implementations, due to differences in roundoff. We make every attempt to balance speed and robustness. User assumes all risk from using the fromo package.
Note that when weights are given, they are treated as replication weights.
This can have subtle effects on computations which require minimum
degrees of freedom, since the sum of weights will be compared to
that minimum, not the number of data points. Weight values
(much) less than 1 can cause computations to return NA
somewhat unexpectedly due to this condition, while values greater
than one might cause the computation to spuriously return a value
with little precision.
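A small base R sketch may make the replication interpretation concrete. It assumes unnormalized weights (as one would expect with normalize_wts = FALSE) and one consumed degree of freedom; the variable names are illustrative only and nothing here calls fromo.

```r
x <- c(1.5, -0.3, 2.2, 0.7)
w <- c(2, 1, 3, 1)   # replication weights: x[1] counts twice, x[3] three times

# weighted mean and variance, with the sum of weights playing the role of n
n_eff <- sum(w)
mu_w  <- sum(w * x) / n_eff
var_w <- sum(w * (x - mu_w)^2) / (n_eff - 1)

# the same quantities from explicitly replicated data
x_rep <- rep(x, times = w)
all.equal(mu_w, mean(x_rep))
all.equal(var_w, var(x_rep))

# with small weights the sum of weights, not the number of points,
# is what would be compared against a minimum df requirement
w_small <- rep(0.1, 4)
sum(w_small)   # well below the 4 data points
```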
As this code may add and remove observations, numerical imprecision
may result in negative estimates of squared quantities, like the
second or fourth moments. We do not currently correct for this
issue, although it may be somewhat mitigated by setting a smaller
restart_period. In the future we will add a check for
this case. Post an issue if you experience this bug.
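The interaction between add/remove updates and periodic restarts can be sketched in base R. This is a simplified illustration for the running variance only, using Welford-style updates; `running_var_sketch` is a hypothetical function, and the way it counts removals before restarting is an assumption, not necessarily how fromo schedules its restarts.

```r
# Sliding-window variance via add/remove updates of the mean and the
# sum of squared deviations (m2), with a full recompute every
# `restart_period` removals. Hypothetical sketch, not fromo's code.
running_var_sketch <- function(x, window, restart_period = 100L) {
  out <- rep(NA_real_, length(x))
  n <- 0L; mu <- 0; m2 <- 0; removed <- 0L
  for (i in seq_along(x)) {
    # add the newest observation
    n <- n + 1L
    delta <- x[i] - mu
    mu <- mu + delta / n
    m2 <- m2 + delta * (x[i] - mu)
    if (i > window) {
      # remove the observation that just left the window;
      # this subtraction is where precision can be lost
      old <- x[i - window]
      n <- n - 1L
      delta <- old - mu
      mu <- mu - delta / n
      m2 <- m2 - delta * (old - mu)
      removed <- removed + 1L
    }
    if (removed >= restart_period) {
      # restart: recompute from the raw window to shed accumulated error
      xw <- x[(i - n + 1L):i]
      mu <- mean(xw)
      m2 <- sum((xw - mu)^2)
      removed <- 0L
    }
    if (n > 1L) out[i] <- m2 / (n - 1L)
  }
  out
}

set.seed(42)
x <- rnorm(500)
v_rare  <- running_var_sketch(x, 20L, restart_period = 1000L)  # never restarts here
v_often <- running_var_sketch(x, 20L, restart_period = 1L)     # restarts on every removal
max(abs(v_rare - v_often), na.rm = TRUE)  # drift attributable to skipped restarts
```

For well-scaled input such as this the drift is tiny; it tends to grow with longer series and with data whose magnitude is large relative to its spread, which is the situation where a smaller restart_period helps.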
Author(s)
Steven E. Pav shabbychef@gmail.com
References
Terriberry, T. "Computing Higher-Order Moments Online." http://people.xiph.org/~tterribe/notes/homs.html
J. Bennett, et al., "Numerically Stable, Single-Pass, Parallel Statistics Algorithms," Proceedings of IEEE International Conference on Cluster Computing, 2009. https://www.semanticscholar.org/paper/Numerically-stable-single-pass-parallel-statistics-Bennett-Grout/a83ed72a5ba86622d5eb6395299b46d51c901265
Cook, J. D. "Accurately computing running variance." http://www.johndcook.com/standard_deviation.html
Cook, J. D. "Comparing three methods of computing standard deviation." http://www.johndcook.com/blog/2008/09/26/comparing-three-methods-of-computing-standard-deviation
Examples
library(fromo)

# running standard deviation and skewness over a window of 10
x <- rnorm(1e5)
xs3 <- running_sd3(x, 10)
xs4 <- running_skew4(x, 10)

if (require(moments)) {
  set.seed(123)
  x <- rnorm(5e1)
  window <- 10L
  kt5 <- running_kurt5(x, window = window)
  # brute-force reference: recompute every window from scratch
  rm1 <- t(sapply(seq_len(length(x)), function(iii) {
    xrang <- x[max(1, iii - window + 1):iii]
    c(moments::kurtosis(xrang) - 3.0, moments::skewness(xrang),
      sd(xrang), mean(xrang), length(xrang))
  }, simplify = TRUE))
  stopifnot(max(abs(kt5 - rm1), na.rm = TRUE) < 1e-12)
}

# centered moments up to order 6
xc6 <- running_cent_moments(x, window = 100L, max_order = 6L)