R: Compute first K moments

sd3 {fromo}

R Documentation

Compute first K moments

Description

Compute the (standardized) 2nd through kth moments, the mean, and the number of elements.

Usage

sd3(v, na_rm = FALSE, wts = NULL, sg_df = 1, check_wts = FALSE,
  normalize_wts = TRUE)

skew4(v, na_rm = FALSE, wts = NULL, sg_df = 1, check_wts = FALSE,
  normalize_wts = TRUE)

kurt5(v, na_rm = FALSE, wts = NULL, sg_df = 1, check_wts = FALSE,
  normalize_wts = TRUE)

cent_moments(v, max_order = 5L, used_df = 0L, na_rm = FALSE, wts = NULL,
  check_wts = FALSE, normalize_wts = TRUE)

std_moments(v, max_order = 5L, used_df = 0L, na_rm = FALSE, wts = NULL,
  check_wts = FALSE, normalize_wts = TRUE)

cent_cumulants(v, max_order = 5L, used_df = 0L, na_rm = FALSE,
  wts = NULL, check_wts = FALSE, normalize_wts = TRUE)

std_cumulants(v, max_order = 5L, used_df = 0L, na_rm = FALSE,
  wts = NULL, check_wts = FALSE, normalize_wts = TRUE)

Arguments

`v`	a vector
`na_rm`	whether to remove NA, false by default.
`wts`	an optional vector of weights. Weights are ‘replication’ weights, meaning a value of 2 is shorthand for having two observations with the corresponding `v` value. If `NULL`, corresponds to equal unit weights, the default. Note that weights are typically only meaningfully defined up to a multiplicative constant, meaning the units of weights are immaterial, with the exception that methods which check for minimum df will, in the weighted case, check against the sum of weights. For this reason, weights less than 1 could cause `NA` to be returned unexpectedly due to the minimum condition. When weights are `NA`, the same rules for checking `v` are applied. That is, the observation will not contribute to the moment if the weight is `NA` when `na_rm` is true. When there is no checking, an `NA` value will cause the output to be `NA`.
`sg_df`	the number of degrees of freedom consumed in the computation of the variance or standard deviation. This defaults to 1 to match the ‘Bessel correction’.
`check_wts`	a boolean for whether the code shall check for negative weights, and throw an error when they are found. Default false for speed.
`normalize_wts`	a boolean for whether the weights should be renormalized to have a mean value of 1. This mean is computed over elements which contribute to the moments, so if `na_rm` is set, that means non-NA elements of `wts` that correspond to non-NA elements of the data vector.
`max_order`	the maximum order of the centered moment to be computed.
`used_df`	the number of degrees of freedom consumed, used in the denominator of the centered moments computation. These are subtracted from the number of observations.

Details

Computes the number of elements, the mean, and the 2nd through kth centered standardized moment, for k=2,3,4. These are computed via the numerically robust one-pass method of Bennett et. al. In general they will not match exactly with the 'standard' implementations, due to differences in roundoff.

These methods are reasonably fast, on par with the 'standard' implementations. However, they will usually be faster than calling the various standard implementations more than once.

Moments are computed as follows, given some values x_i and optional weights w_i, defaulting to 1, the weighted mean is computed as

\mu = \frac{\sum_i x_i w_i}{\sum w_i}.

The weighted kth central sum is computed as

\mu = \sum_i \left(x_i - \mu\right)^k w_i.

Let n = \sum_i w_i be the sum of weights (or number of observations in the unweighted case). Then the weighted kth central moment is computed as that weighted sum divided by the adjusted sum weights:

\mu_k = \frac{\sum_i \left(x_i - \mu\right)^k w_i}{n - \nu},

where \nu is the ‘used df’, provided by the user to adjust the denominator. (Typical values are 0 or 1.) The weighted kth standardized moment is the central moment divided by the second central moment to the k/2 power:

\tilde{\mu}_k = \frac{\mu_k}{\mu_2^{k/2}}.

The (centered) rth cumulant, for r \ge 2 is then computed using the formula of Willink, namely

\kappa_r = \mu_r - \sum_{j=0}^{r - 2} {r - 1 \choose j} \mu_j \kappa {r-j}.

The standardized rth cumulant is the rth centered cumulant divided by \mu_2^{r/2}.

Value

a vector, filled out as follows:

sd3: A vector of the (sample) standard devation, mean, and number of elements (or the total weight when wts are given).
skew4: A vector of the (sample) skewness, standard devation, mean, and number of elements (or the total weight when wts are given).
kurt5: A vector of the (sample) excess kurtosis, skewness, standard devation, mean, and number of elements (or the total weight when wts are given).
cent_moments: A vector of the (sample) kth centered moment, then k-1th centered moment, ..., then the variance, the mean, and number of elements (total weight when wts are given).
std_moments: A vector of the (sample) kth standardized (and centered) moment, then k-1th, ..., then standard devation, mean, and number of elements (total weight).
cent_cumulants: A vector of the (sample) kth (centered, but this is redundant) cumulant, then the k-1th, ..., then the variance (which is the second cumulant), then the mean, then the number of elements (total weight).
std_cumulants: A vector of the (sample) kth standardized (and centered, but this is redundant) cumulant, then the k-1th, ..., down to the third, then the variance, the mean, then the number of elements (total weight).

Note

The first centered (and standardized) moment is often defined to be identically 0. Instead cent_moments and std_moments returns the mean. Similarly, the second standardized moments defined to be identically 1; std_moments instead returns the standard deviation. The reason is that a user can always decide to ignore the results and fill in a 0 or 1 as they need, but could not efficiently compute the mean and standard deviation from scratch if we discard it. The antepenultimate element of the output of std_cumulants is not a one, even though that ‘should’ be the standardized second cumulant.

The antepenultimate element of the output of cent_moments, cent_cumulants and std_cumulants is the variance, not the standard deviation. All other code return the standard deviation in that place.

The kurtosis is excess kurtosis, with a 3 subtracted, and should be nearly zero for Gaussian input.

The term 'centered cumulants' is redundant. The intent was to avoid possible collision with existing code named 'cumulants'.

The moment computations provided by fromo are numerically robust, but will often not provide the same results as the 'standard' implementations, due to differences in roundoff. We make every attempt to balance speed and robustness. User assumes all risk from using the fromo package.

Note that when weights are given, they are treated as replication weights. This can have subtle effects on computations which require minimum degrees of freedom, since the sum of weights will be compared to that minimum, not the number of data points. Weight values (much) less than 1 can cause computations to return NA somewhat unexpectedly due to this condition, while values greater than one might cause the computation to spuriously return a value with little precision.

Author(s)

Steven E. Pav shabbychef@gmail.com

References

Terriberry, T. "Computing Higher-Order Moments Online." http://people.xiph.org/~tterribe/notes/homs.html

J. Bennett, et. al., "Numerically Stable, Single-Pass, Parallel Statistics Algorithms," Proceedings of IEEE International Conference on Cluster Computing, 2009. https://www.semanticscholar.org/paper/Numerically-stable-single-pass-parallel-statistics-Bennett-Grout/a83ed72a5ba86622d5eb6395299b46d51c901265

Cook, J. D. "Accurately computing running variance." http://www.johndcook.com/standard_deviation.html

Cook, J. D. "Comparing three methods of computing standard deviation." http://www.johndcook.com/blog/2008/09/26/comparing-three-methods-of-computing-standard-deviation

Willink, R. "Relationships Between Central Moments and Cumulants, with Formulae for the Central Moments of Gamma Distributions." Communications in Statistics - Theory and Methods, 32 no 4 (2003): 701-704. https://doi.org/10.1081/STA-120018823

Examples

x <- rnorm(1e5)
sd3(x)[1] - sd(x)
skew4(x)[4] - length(x)
skew4(x)[3] - mean(x)
skew4(x)[2] - sd(x)
if (require(moments)) {
  skew4(x)[1] - skewness(x)
}


# check 'robustness'; only the mean should change:
kurt5(x + 1e12) - kurt5(x)
# check speed
if (require(microbenchmark) && require(moments)) {
  dumbk <- function(x) { c(kurtosis(x) - 3.0,skewness(x),sd(x),mean(x),length(x)) }
  set.seed(1234)
  x <- rnorm(1e6)
  print(kurt5(x) - dumbk(x))
  microbenchmark(dumbk(x),kurt5(x),times=10L)
}
y <- std_moments(x,6)
cml <- cent_cumulants(x,6)
std <- std_cumulants(x,6)

# check that skew matches moments::skewness
if (require(moments)) {
    set.seed(1234)
    x <- rnorm(1000)
    resu <- fromo::skew4(x)

    msku <- moments::skewness(x)
    stopifnot(abs(msku - resu[1]) < 1e-14)
}

# check skew vs e1071 skewness, which has a different denominator
if (require(e1071)) {
    set.seed(1234)
    x <- rnorm(1000)
    resu <- fromo::skew4(x)

    esku <- e1071::skewness(x,type=3)
    nobs <- resu[4]
    stopifnot(abs(esku - resu[1] * ((nobs-1)/nobs)^(3/2)) < 1e-14)

    # similarly:
    resu <- fromo::std_moments(x,max_order=3,used_df=0)
    stopifnot(abs(esku - resu[1] * ((nobs-1)/nobs)^(3/2)) < 1e-14)
}

[Package fromo version 0.2.1 Index]