sd3 {fromo} | R Documentation |
Compute first K moments
Description
Compute the (standardized) 2nd through kth moments, the mean, and the number of elements.
Usage
sd3(v, na_rm = FALSE, wts = NULL, sg_df = 1, check_wts = FALSE,
normalize_wts = TRUE)
skew4(v, na_rm = FALSE, wts = NULL, sg_df = 1, check_wts = FALSE,
normalize_wts = TRUE)
kurt5(v, na_rm = FALSE, wts = NULL, sg_df = 1, check_wts = FALSE,
normalize_wts = TRUE)
cent_moments(v, max_order = 5L, used_df = 0L, na_rm = FALSE, wts = NULL,
check_wts = FALSE, normalize_wts = TRUE)
std_moments(v, max_order = 5L, used_df = 0L, na_rm = FALSE, wts = NULL,
check_wts = FALSE, normalize_wts = TRUE)
cent_cumulants(v, max_order = 5L, used_df = 0L, na_rm = FALSE,
wts = NULL, check_wts = FALSE, normalize_wts = TRUE)
std_cumulants(v, max_order = 5L, used_df = 0L, na_rm = FALSE,
wts = NULL, check_wts = FALSE, normalize_wts = TRUE)
Arguments
v |
a vector |
na_rm |
whether to remove NA, false by default. |
wts |
an optional vector of weights. Weights are ‘replication’
weights, meaning a value of 2 is shorthand for having two observations
with the corresponding |
sg_df |
the number of degrees of freedom consumed in the computation of the variance or standard deviation. This defaults to 1 to match the ‘Bessel correction’. |
check_wts |
a boolean for whether the code shall check for negative weights, and throw an error when they are found. Default false for speed. |
normalize_wts |
a boolean for whether the weights should be
renormalized to have a mean value of 1. This mean is computed over elements
which contribute to the moments, so if |
max_order |
the maximum order of the centered moment to be computed. |
used_df |
the number of degrees of freedom consumed, used in the denominator of the centered moments computation. These are subtracted from the number of observations. |
Details
Computes the number of elements, the mean, and the 2nd through kth
centered standardized moment, for k=2,3,4
. These
are computed via the numerically robust one-pass method of Bennett et. al.
In general they will not match exactly with the 'standard'
implementations, due to differences in roundoff.
These methods are reasonably fast, on par with the 'standard' implementations. However, they will usually be faster than calling the various standard implementations more than once.
Moments are computed as follows, given some values x_i
and optional weights w_i
,
defaulting to 1, the weighted mean is computed as
\mu = \frac{\sum_i x_i w_i}{\sum w_i}.
The weighted kth central sum is computed as
\mu = \sum_i \left(x_i - \mu\right)^k w_i.
Let n = \sum_i w_i
be the sum of weights (or number of observations in the unweighted case).
Then the weighted kth central moment is computed as that weighted sum divided by the
adjusted sum weights:
\mu_k = \frac{\sum_i \left(x_i - \mu\right)^k w_i}{n - \nu},
where \nu
is the ‘used df’, provided by the user to adjust the denominator.
(Typical values are 0 or 1.)
The weighted kth standardized moment is the central moment divided by the second central moment
to the k/2
power:
\tilde{\mu}_k = \frac{\mu_k}{\mu_2^{k/2}}.
The (centered) rth cumulant, for r \ge 2
is then computed using the formula of Willink, namely
\kappa_r = \mu_r - \sum_{j=0}^{r - 2} {r - 1 \choose j} \mu_j \kappa {r-j}.
The standardized rth cumulant is the rth centered cumulant divided by \mu_2^{r/2}
.
Value
a vector, filled out as follows:
- sd3
A vector of the (sample) standard devation, mean, and number of elements (or the total weight when
wts
are given).- skew4
A vector of the (sample) skewness, standard devation, mean, and number of elements (or the total weight when
wts
are given).- kurt5
A vector of the (sample) excess kurtosis, skewness, standard devation, mean, and number of elements (or the total weight when
wts
are given).- cent_moments
A vector of the (sample)
k
th centered moment, thenk-1
th centered moment, ..., then the variance, the mean, and number of elements (total weight whenwts
are given).- std_moments
A vector of the (sample)
k
th standardized (and centered) moment, thenk-1
th, ..., then standard devation, mean, and number of elements (total weight).- cent_cumulants
A vector of the (sample)
k
th (centered, but this is redundant) cumulant, then thek-1
th, ..., then the variance (which is the second cumulant), then the mean, then the number of elements (total weight).- std_cumulants
A vector of the (sample)
k
th standardized (and centered, but this is redundant) cumulant, then thek-1
th, ..., down to the third, then the variance, the mean, then the number of elements (total weight).
Note
The first centered (and standardized) moment is often defined to be identically 0. Instead cent_moments
and std_moments
returns the mean.
Similarly, the second standardized moments defined to be identically 1; std_moments
instead returns the standard
deviation. The reason is that a user can always decide to ignore the results and fill in a 0 or 1 as they need, but
could not efficiently compute the mean and standard deviation from scratch if we discard it.
The antepenultimate element of the output of std_cumulants
is not a one, even though that ‘should’ be
the standardized second cumulant.
The antepenultimate element of the output of cent_moments
, cent_cumulants
and std_cumulants
is the variance,
not the standard deviation. All other code return the standard deviation in that place.
The kurtosis is excess kurtosis, with a 3 subtracted, and should be nearly zero for Gaussian input.
The term 'centered cumulants' is redundant. The intent was to avoid possible collision with existing code named 'cumulants'.
The moment computations provided by fromo are numerically robust, but will often not provide the same results as the 'standard' implementations, due to differences in roundoff. We make every attempt to balance speed and robustness. User assumes all risk from using the fromo package.
Note that when weights are given, they are treated as replication weights.
This can have subtle effects on computations which require minimum
degrees of freedom, since the sum of weights will be compared to
that minimum, not the number of data points. Weight values
(much) less than 1 can cause computations to return NA
somewhat unexpectedly due to this condition, while values greater
than one might cause the computation to spuriously return a value
with little precision.
Author(s)
Steven E. Pav shabbychef@gmail.com
References
Terriberry, T. "Computing Higher-Order Moments Online." http://people.xiph.org/~tterribe/notes/homs.html
J. Bennett, et. al., "Numerically Stable, Single-Pass, Parallel Statistics Algorithms," Proceedings of IEEE International Conference on Cluster Computing, 2009. https://www.semanticscholar.org/paper/Numerically-stable-single-pass-parallel-statistics-Bennett-Grout/a83ed72a5ba86622d5eb6395299b46d51c901265
Cook, J. D. "Accurately computing running variance." http://www.johndcook.com/standard_deviation.html
Cook, J. D. "Comparing three methods of computing standard deviation." http://www.johndcook.com/blog/2008/09/26/comparing-three-methods-of-computing-standard-deviation
Willink, R. "Relationships Between Central Moments and Cumulants, with Formulae for the Central Moments of Gamma Distributions." Communications in Statistics - Theory and Methods, 32 no 4 (2003): 701-704. https://doi.org/10.1081/STA-120018823
Examples
x <- rnorm(1e5)
sd3(x)[1] - sd(x)
skew4(x)[4] - length(x)
skew4(x)[3] - mean(x)
skew4(x)[2] - sd(x)
if (require(moments)) {
skew4(x)[1] - skewness(x)
}
# check 'robustness'; only the mean should change:
kurt5(x + 1e12) - kurt5(x)
# check speed
if (require(microbenchmark) && require(moments)) {
dumbk <- function(x) { c(kurtosis(x) - 3.0,skewness(x),sd(x),mean(x),length(x)) }
set.seed(1234)
x <- rnorm(1e6)
print(kurt5(x) - dumbk(x))
microbenchmark(dumbk(x),kurt5(x),times=10L)
}
y <- std_moments(x,6)
cml <- cent_cumulants(x,6)
std <- std_cumulants(x,6)
# check that skew matches moments::skewness
if (require(moments)) {
set.seed(1234)
x <- rnorm(1000)
resu <- fromo::skew4(x)
msku <- moments::skewness(x)
stopifnot(abs(msku - resu[1]) < 1e-14)
}
# check skew vs e1071 skewness, which has a different denominator
if (require(e1071)) {
set.seed(1234)
x <- rnorm(1000)
resu <- fromo::skew4(x)
esku <- e1071::skewness(x,type=3)
nobs <- resu[4]
stopifnot(abs(esku - resu[1] * ((nobs-1)/nobs)^(3/2)) < 1e-14)
# similarly:
resu <- fromo::std_moments(x,max_order=3,used_df=0)
stopifnot(abs(esku - resu[1] * ((nobs-1)/nobs)^(3/2)) < 1e-14)
}