e_mcv {GFDmcv}

R Documentation

Estimators and confidence intervals of four multivariate coefficients of variation and their reciprocals

Description

Calculates the estimators, together with their (1-\alpha)-confidence intervals, for the four variants of the multivariate coefficient of variation (MCV) and their reciprocals, as introduced by Reyment (1960), Van Valen (1974), Voinov and Nikulin (1996) and Albert and Zhang (2010).

Usage

e_mcv(x, conf_level = 0.95)

Arguments

`x`	a data matrix of size `n\times d`, with the n observations in rows and the d variables in columns.
`conf_level`	the confidence level 1-\alpha of the intervals. The default is 0.95.

Details

The function e_mcv() calculates four different variants of the multivariate coefficient of variation for d-dimensional data. These variants were introduced by Reyment (1960, RR), Van Valen (1974, VV), Voinov and Nikulin (1996, VN) and Albert and Zhang (2010, AZ):

{\widehat C}^{RR}=\sqrt{\frac{(\det\mathbf{\widehat\Sigma})^{1/d}}{\boldsymbol{\widehat\mu}^{\top}\boldsymbol{\widehat\mu}}},\ {\widehat C}^{VV}=\sqrt{\frac{\mathrm{tr}\mathbf{\widehat\Sigma}}{\boldsymbol{\widehat\mu}^{\top}\boldsymbol{\widehat\mu}}},\ {\widehat C}^{VN}=\sqrt{\frac{1}{\boldsymbol{\widehat\mu}^{\top}\mathbf{\widehat\Sigma}^{-1}\boldsymbol{\widehat\mu}}},\ {\widehat C}^{AZ}=\sqrt{\frac{\boldsymbol{\widehat\mu}^{\top}\mathbf{\widehat\Sigma}\boldsymbol{\widehat\mu}}{(\boldsymbol{\widehat\mu}^{\top}\boldsymbol{\widehat\mu})^2}},

where n is the sample size, \boldsymbol{\widehat\mu} is the empirical mean vector and \mathbf{\widehat \Sigma} is the empirical covariance matrix:

\boldsymbol{\widehat\mu} = \frac{1}{n}\sum_{j=1}^{n} \mathbf{X}_{j},\; \mathbf{\widehat \Sigma} =\frac{1}{n}\sum_{j=1}^{n} (\mathbf{X}_{j} - \boldsymbol{\widehat \mu})(\mathbf{X}_{j} - \boldsymbol{\widehat \mu})^{\top}.
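As a minimal sketch of the formulas above, the four estimators can be computed by hand in base R. The helper name mcv_by_hand is hypothetical and not part of GFDmcv; it only mirrors the definitions given here (including the divisor n in the covariance, whereas cov() divides by n-1) and is not claimed to reproduce the internals of e_mcv().

```r
# Hypothetical helper (not part of GFDmcv): the four MCV estimators
# computed directly from the formulas in the Details section.
mcv_by_hand <- function(x) {
  x <- as.matrix(x)
  n <- nrow(x)
  d <- ncol(x)
  mu <- colMeans(x)               # empirical mean vector
  xc <- sweep(x, 2, mu)           # centre each column at its mean
  sigma <- crossprod(xc) / n      # empirical covariance matrix, divisor n
  mu2 <- sum(mu^2)                # mu' mu
  c(RR = sqrt(det(sigma)^(1 / d) / mu2),
    VV = sqrt(sum(diag(sigma)) / mu2),
    VN = sqrt(1 / sum(mu * solve(sigma, mu))),
    AZ = sqrt(sum(mu * (sigma %*% mu)) / mu2^2))
}

x <- as.matrix(iris[iris$Species == "setosa", 1:3])
mcv_by_hand(x)
```

The sketch assumes a nonsingular empirical covariance matrix, since the VN variant requires its inverse.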

In the univariate case (d=1), all four variants reduce to the coefficient of variation. Furthermore, their reciprocals, the so-called standardized means, are determined:

{\widehat B}^{RR}=\sqrt{\frac{\boldsymbol{\widehat\mu}^{\top}\boldsymbol{\widehat\mu}}{(\det\mathbf{\widehat\Sigma})^{1/d}}},\ {\widehat B}^{VV}=\sqrt{\frac{\boldsymbol{\widehat\mu}^{\top}\boldsymbol{\widehat\mu}}{\mathrm{tr}\mathbf{\widehat\Sigma}}},\ {\widehat B}^{VN}=\sqrt{\boldsymbol{\widehat\mu}^{\top}\mathbf{\widehat\Sigma}^{-1}\boldsymbol{\widehat\mu}},\ {\widehat B}^{AZ}=\sqrt{\frac{(\boldsymbol{\widehat\mu}^{\top}\boldsymbol{\widehat\mu})^2}{\boldsymbol{\widehat\mu}^{\top}\mathbf{\widehat\Sigma}\boldsymbol{\widehat\mu}}}.
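The reduction to the univariate coefficient of variation for d=1 can be checked numerically. The following sketch plugs a single variable into the four formulas, treating the 1x1 covariance matrix as a scalar; the variable names are illustrative and nothing here calls GFDmcv.

```r
# Illustrative check (not part of GFDmcv): for d = 1 the four formulas
# collapse to sd / |mean|, using the divisor-n variance.
x1 <- iris[iris$Species == "setosa", 1]
mu <- mean(x1)
s2 <- mean((x1 - mu)^2)                  # empirical variance, divisor n

c_rr <- sqrt(s2^(1 / 1) / mu^2)          # det of a 1x1 matrix is its entry
c_vv <- sqrt(s2 / mu^2)                  # trace of a 1x1 matrix is its entry
c_vn <- sqrt(1 / (mu * (1 / s2) * mu))   # inverse of a 1x1 matrix is 1 / entry
c_az <- sqrt((mu * s2 * mu) / (mu^2)^2)

cv <- sqrt(s2) / abs(mu)                 # univariate coefficient of variation
all.equal(c(c_rr, c_vv, c_vn, c_az), rep(cv, 4))

b <- 1 / cv                              # the standardized mean is the reciprocal
b
```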

In addition to the estimators, the e_mcv() function calculates the respective confidence intervals [C_lwr, C_upr] and [B_lwr, B_upr] for a given confidence level 1-\alpha. These confidence intervals are based on an asymptotic normal approximation; see Ditzhaus and Smaga (2023) for the technical details. The approximations do not rely on any specific (semi-)parametric assumption on the distribution and are valid nonparametrically, even for tied data.
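To illustrate how a confidence level translates into a normal-approximation interval, the sketch below builds a generic two-sided interval of the form estimate plus or minus a normal quantile times a standard error. The helper wald_ci and its se argument are hypothetical; how e_mcv() estimates the asymptotic variance, and whether it uses exactly this symmetric form, is described in Ditzhaus and Smaga (2023) and is not reproduced here.

```r
# Hypothetical helper (not part of GFDmcv): a generic two-sided
# normal-approximation interval est +/- z_{1 - alpha/2} * se.
wald_ci <- function(est, se, conf_level = 0.95) {
  alpha <- 1 - conf_level
  z <- qnorm(1 - alpha / 2)       # standard normal quantile
  c(lwr = est - z * se, est = est, upr = est + z * se)
}

# Illustrative numbers only, not output of e_mcv()
ci <- wald_ci(est = 0.10, se = 0.01, conf_level = 0.95)
ci
```

A higher conf_level gives a larger quantile and therefore a wider interval.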

Value

A data frame with four rows when d>1, one for each of the four MCVs, or with a single row when d=1, for the univariate CV. It has six columns: the estimators C_est of the MCV (CV), the estimators B_est of their reciprocals, and the lower and upper bounds of the corresponding confidence intervals [C_lwr, C_upr] and [B_lwr, B_upr].

References

Albert A., Zhang L. (2010) A novel definition of the multivariate coefficient of variation. Biometrical Journal 52:667-675.

Ditzhaus M., Smaga L. (2023) Inference for all variants of the multivariate coefficient of variation in factorial designs. Preprint https://arxiv.org/abs/2301.12009.

Reyment R.A. (1960) Studies on Nigerian Upper Cretaceous and Lower Tertiary Ostracoda: part 1. Senonian and Maastrichtian Ostracoda. Stockholm Contributions in Geology, vol 7.

Van Valen L. (1974) Multivariate structural statistics in natural history. Journal of Theoretical Biology 45:235-247.

Voinov V., Nikulin M. (1996) Unbiased Estimators and Their Applications, Vol. 2, Multivariate Case. Kluwer, Dordrecht.

Examples

# d > 1 (MCVs)
data_set <- lapply(list(iris[iris$Species == "setosa", 1:3],
                        iris[iris$Species == "versicolor", 1:3],
                        iris[iris$Species == "virginica", 1:3]),
                   as.matrix)
lapply(data_set, e_mcv)
# d = 1 (CV)
data_set <- lapply(list(iris[iris$Species == "setosa", 1],
                        iris[iris$Species == "versicolor", 1],
                        iris[iris$Species == "virginica", 1]),
                   as.matrix)
lapply(data_set, e_mcv)

[Package GFDmcv version 0.1.0 Index]