iginindex {giniVarCI} | R Documentation |
Gini index for infinite populations and different estimation methods.
Description
Estimates the Gini index in infinite populations, using different methods.
Usage
iginindex(
y,
method = 5L,
bias.correction = TRUE,
cum.sums = NULL,
na.rm = TRUE,
useRcpp = TRUE
)
Arguments
y |
A vector with the non-negative real numbers to be used for estimating the Gini index. This argument can be missing if argument cum.sums is provided. |
method |
An integer between 1 and 10 selecting one of the 10 methods detailed below for estimating the Gini index in infinite populations. The default method is method = 5L. |
bias.correction |
A 'TRUE/FALSE' logical value indicating whether the bias correction should be applied to the estimation of the Gini index. The default value is bias.correction = TRUE. |
cum.sums |
A vector with the non-negative real numbers specifying the cumulative sums of the variable used to estimate the Gini index. This argument can be NULL if argument y is provided. |
na.rm |
A 'TRUE/FALSE' logical value indicating whether |
useRcpp |
A 'TRUE/FALSE' logical value indicating whether |
Details
For a sample S, with size n, derived from an infinite population, different formulations of the Gini index have been proposed in the literature, but they yield only two distinct values: one without and one with the bias correction.
This function estimates the Gini index using the various formulations, with implementations in both R and C++. This can be useful for research purposes, and it allows speed comparisons between the methods. The argument cum.sums does not require that the cumulative sums are based on the non-decreasing order of the variable y.
The different methods for estimating the Gini index are (see Wang et al., 2016; Giorgi and Gigliarano, 2017; Mukhopadhyay and Sengupta, 2021; Muñoz et al., 2023):
method = 1
\widehat{G}_1 = \displaystyle \frac{1}{2\overline{y}n^{2}}\sum_{i \in S}\sum_{j\in S} |y_i-y_j|;
\widehat{G}_{1}^{bc} = \displaystyle \frac{1}{2\overline{y}n(n-1)}\sum_{i \in S} \sum_{j \in S} |y_i-y_j|,
where \overline{y} = n^{-1}\sum_{i \in S}y_i is the sample mean and the label bc indicates that the bias correction is applied to the estimation of the Gini index.
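The method 1 formulas can be sketched directly in base R. This is a minimal illustration written from the expressions above, not the package's internal code; the function name gini_m1 is hypothetical.

```r
# Minimal sketch of method 1 (hypothetical helper, not part of giniVarCI).
gini_m1 <- function(y, bias.correction = TRUE) {
  n <- length(y)
  # Sum of |y_i - y_j| over all ordered pairs (i, j), including i = j.
  total_abs_diff <- sum(abs(outer(y, y, "-")))
  # The denominator uses n(n - 1) with the bias correction and n^2 without it.
  pairs <- if (bias.correction) n * (n - 1) else n^2
  total_abs_diff / (2 * mean(y) * pairs)
}

y <- c(1, 2, 3, 4)
gini_m1(y, bias.correction = FALSE)  # 0.25
gini_m1(y)                           # 0.3333333
```

The two results differ only by the factor n/(n-1), which is the effect of the bias correction.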
method = 2
\widehat{G}_{2} = \displaystyle \frac{n-1}{n}\frac{\sum_{i=1}^{n-1}(p_i-q_i)}{\sum_{i=1}^{n-1}p_i};
\widehat{G}_{2}^{bc} = \displaystyle \frac{\sum_{i=1}^{n-1}(p_i-q_i)}{\sum_{i=1}^{n-1}p_i},
where
p_i= \displaystyle \frac{i}{n}; \quad q_i= \frac{y_{i}^{+}}{y_{n}^{+}},
and y_{i}^{+}=\sum_{j=1}^{i}y_{(j)}, with i=\{1,\ldots,n\}, are the cumulative sums of the ordered values y_{(i)} (in non-decreasing order) of the variable of interest y.
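As an illustration of the quantities p_i and q_i, the following base R sketch computes method 2 from the sorted cumulative sums. The function name gini_m2 is hypothetical, and the sketch assumes a sample with at least two observations and a positive total.

```r
# Minimal sketch of method 2 (hypothetical helper, not part of giniVarCI).
gini_m2 <- function(y, bias.correction = TRUE) {
  n <- length(y)
  i <- seq_len(n - 1)
  p <- i / n                      # p_i = i / n
  cum <- cumsum(sort(y))          # y_i^+ : cumulative sums of the ordered values
  q <- cum[i] / cum[n]            # q_i = y_i^+ / y_n^+
  g_bc <- sum(p - q) / sum(p)     # bias-corrected estimator
  if (bias.correction) g_bc else (n - 1) / n * g_bc
}

y <- c(4, 1, 3, 2)
gini_m2(y, bias.correction = FALSE)  # 0.25
gini_m2(y)                           # 0.3333333
```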
method = 3
\widehat{G}_{3} = \displaystyle \frac{n-1}{n} - \frac{2}{n}\sum_{i=1}^{n-1}q_i;
\widehat{G}_{3}^{bc} = 1 - \displaystyle \frac{2}{n-1}\sum_{i=1}^{n-1}q_i.
method = 4
\widehat{G}_{4} = 1 - \displaystyle \sum_{i=0}^{n-1}(q_{i+1} + q_i)(p_{i+1} - p_i);
\widehat{G}_{4}^{bc} = \displaystyle \frac{n}{n-1}\left[1 - \sum_{i=0}^{n-1}(q_{i+1} + q_i)(p_{i+1} - p_i)\right],
where p_0=q_0=0.
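Method 4 can be read as one minus twice the area under the empirical Lorenz curve, computed with the trapezoidal rule. A minimal base R sketch under that reading follows; gini_m4 is a hypothetical name.

```r
# Minimal sketch of method 4 (hypothetical helper, not part of giniVarCI).
gini_m4 <- function(y, bias.correction = TRUE) {
  n <- length(y)
  p <- (0:n) / n                         # p_0, ..., p_n with p_0 = 0
  q <- c(0, cumsum(sort(y))) / sum(y)    # q_0, ..., q_n with q_0 = 0
  # Trapezoids: (q_{i+1} + q_i) * (p_{i+1} - p_i) for i = 0, ..., n - 1.
  g <- 1 - sum((q[-1] + q[-(n + 1)]) * diff(p))
  if (bias.correction) n / (n - 1) * g else g
}

y <- c(1, 2, 3, 4)
gini_m4(y, bias.correction = FALSE)  # 0.25
gini_m4(y)                           # 0.3333333
```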
method = 5
\widehat{G}_{5} = \displaystyle \frac{2}{\overline{y}n^{2}}\sum_{i \in S}iy_{(i)} - \frac{n+1}{n};
\widehat{G}_{5}^{bc} = \displaystyle \frac{2}{\overline{y}n(n-1)}\sum_{i \in S}iy_{(i)} - \frac{n+1}{n-1}.
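Method 5 needs only one sort and one weighted sum, which may explain why it is the default. The base R sketch below is written from the formulas above; gini_m5 is a hypothetical name and the code is not taken from the package.

```r
# Minimal sketch of method 5 (hypothetical helper, not part of giniVarCI).
gini_m5 <- function(y, bias.correction = TRUE) {
  n <- length(y)
  # Weighted sum of the order statistics: sum over i of i * y_(i).
  s <- sum(seq_len(n) * sort(y))
  if (bias.correction) {
    2 * s / (mean(y) * n * (n - 1)) - (n + 1) / (n - 1)
  } else {
    2 * s / (mean(y) * n^2) - (n + 1) / n
  }
}

y <- c(1, 2, 3, 4)
gini_m5(y, bias.correction = FALSE)  # 0.25
gini_m5(y)                           # 0.3333333
```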
method = 6
\widehat{G}_{6} = \displaystyle \frac{2}{\overline{y}n}\mathrm{cov}(i,y_{(i)});
\widehat{G}_{6}^{bc} = \displaystyle \frac{2}{\overline{y}(n-1)}\mathrm{cov}(i,y_{(i)}).
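The sketch below illustrates method 6 in base R. It assumes that cov denotes the covariance with divisor n, since that reading makes method 6 agree with the other methods on a small example; R's cov() uses divisor n - 1 instead. The function name gini_m6 is hypothetical.

```r
# Minimal sketch of method 6 (hypothetical helper, not part of giniVarCI).
# Assumption: cov(i, y_(i)) is the covariance with divisor n.
gini_m6 <- function(y, bias.correction = TRUE) {
  n <- length(y)
  i <- seq_len(n)
  ys <- sort(y)
  cov_n <- mean((i - mean(i)) * (ys - mean(ys)))   # divisor n
  denom <- if (bias.correction) n - 1 else n
  2 * cov_n / (mean(y) * denom)
}

y <- c(1, 2, 3, 4)
gini_m6(y, bias.correction = FALSE)  # 0.25
gini_m6(y)                           # 0.3333333

# With stats::cov() (divisor n - 1), dividing by n already gives the
# bias-corrected value, because cov() equals n / (n - 1) times cov_n.
2 * cov(seq_along(y), sort(y)) / (mean(y) * length(y))  # 0.3333333
```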
method = 7
\widehat{G}_{7} = \displaystyle \frac{1}{\overline{y}n^2}\sum_{i \in S}\sum_{j\in S}|y_i-y_j|\cdot |\widehat{F}_{n}^{\ast}(y_{i})-\widehat{F}_{n}^{\ast}(y_{j})|;
\widehat{G}_{7}^{bc} = \displaystyle \frac{1}{\overline{y}n(n-1)}\sum_{i\in S}\sum_{j \in S}|y_i-y_j|\cdot |\widehat{F}_{n}^{\ast}(y_{i})-\widehat{F}_{n}^{\ast}(y_{j})|,
where
\widehat{F}_{n}^{\ast}(t)= \displaystyle \frac{1}{n}\sum_{i \in S}[\delta(y_i < t) + 0.5\delta(y_i = t)]
is the smooth (mid-point) distribution function and \delta(\cdot) is the indicator function, equal to 1 when its argument is true and 0 otherwise.
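The mid-point distribution function evaluated at the sample values can be obtained from average ranks, since the average rank of y_i equals the number of smaller values plus half the number of tied values plus 0.5. The base R sketch below uses that identity; midpoint_cdf and gini_m7 are hypothetical names.

```r
# Mid-point distribution function at the sample points (hypothetical helper).
# (average rank - 0.5) / n = [#(y_j < y_i) + 0.5 * #(y_j = y_i)] / n
midpoint_cdf <- function(y) {
  (rank(y, ties.method = "average") - 0.5) / length(y)
}

# Minimal sketch of method 7 (hypothetical helper, not part of giniVarCI).
gini_m7 <- function(y, bias.correction = TRUE) {
  n <- length(y)
  Fs <- midpoint_cdf(y)
  total <- sum(abs(outer(y, y, "-")) * abs(outer(Fs, Fs, "-")))
  pairs <- if (bias.correction) n * (n - 1) else n^2
  total / (mean(y) * pairs)
}

midpoint_cdf(c(1, 1, 2))                          # 0.3333333 0.3333333 0.8333333
gini_m7(c(1, 2, 3, 4), bias.correction = FALSE)   # 0.25
gini_m7(c(1, 2, 3, 4))                            # 0.3333333
```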
method = 8
\widehat{G}_{8} = 1 - \displaystyle \frac{1}{\overline{y}n^2}\sum_{i \in S}\sum_{j \in S}\min(y_i,y_j);
\widehat{G}_{8}^{bc} = 1 - \displaystyle \frac{1}{\overline{y}n(n-1)}\sum_{i \in S}\sum_{\substack{j \in S\\ j\neq i} }\min(y_i,y_j).
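In method 8 the two versions differ in whether the diagonal terms (j = i) enter the double sum. A minimal base R sketch that makes this explicit follows; gini_m8 is a hypothetical name.

```r
# Minimal sketch of method 8 (hypothetical helper, not part of giniVarCI).
gini_m8 <- function(y, bias.correction = TRUE) {
  n <- length(y)
  mins <- outer(y, y, pmin)   # matrix of min(y_i, y_j)
  if (bias.correction) {
    # Exclude the diagonal terms, where min(y_i, y_i) = y_i.
    1 - (sum(mins) - sum(diag(mins))) / (mean(y) * n * (n - 1))
  } else {
    1 - sum(mins) / (mean(y) * n^2)
  }
}

y <- c(1, 2, 3, 4)
gini_m8(y, bias.correction = FALSE)  # 0.25
gini_m8(y)                           # 0.3333333
```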
method = 9
\widehat{G}_{9} = \displaystyle \frac{2}{\overline{y}n}\sum_{i \in S}y_{i}\widehat{F}_{n}^{\ast}(y_{i}) - 1;
\widehat{G}_{9}^{bc} = \displaystyle \frac{2}{\overline{y}(n-1)}\sum_{i \in S}y_{i}\widehat{F}_{n}^{\ast}(y_{i}) - \frac{n}{n-1}.
method = 10
\widehat{G}_{10} = \displaystyle \frac{n-1}{2\overline{y}n}\binom{n}{2}^{-1}\sum_{1 \leq i_{1} < i_{2} \leq n}|y_{i_{1}}-y_{i_{2}}|;
\widehat{G}_{10}^{bc} = \displaystyle \frac{1}{2\overline{y}}\binom{n}{2}^{-1}\sum_{1 \leq i_{1} < i_{2} \leq n}|y_{i_{1}}-y_{i_{2}}|.
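Method 10 is the mean absolute difference over unordered pairs divided by twice the sample mean. In base R, dist() returns exactly the pairwise absolute differences for a numeric vector, so a minimal sketch could look like this; gini_m10 is a hypothetical name.

```r
# Minimal sketch of method 10 (hypothetical helper, not part of giniVarCI).
gini_m10 <- function(y, bias.correction = TRUE) {
  n <- length(y)
  # dist() on a vector gives |y_i1 - y_i2| for every pair with i1 < i2.
  mean_abs_diff <- sum(dist(y)) / choose(n, 2)
  g_bc <- mean_abs_diff / (2 * mean(y))
  if (bias.correction) g_bc else (n - 1) / n * g_bc
}

y <- c(1, 2, 3, 4)
gini_m10(y, bias.correction = FALSE)  # 0.25
gini_m10(y)                           # 0.3333333
```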
Value
A single numeric value between 0 and 1 containing the estimation of the Gini index based on the vector y or the vector cum.sums.
Author(s)
Juan F Munoz jfmunoz@ugr.es
Jose M Pavia pavia@uv.es
Encarnacion Alvarez encarniav@ugr.es
References
Giorgi, G. M., and Gigliarano, C. (2017). The Gini concentration index: a review of the inference literature. Journal of Economic Surveys, 31(4), 1130-1148.
Mukhopadhyay, N., and Sengupta, P. P. (Eds.). (2021). Gini inequality index: Methods and applications. CRC press.
Muñoz, J. F., Moya-Fernández, P. J., and Álvarez-Verdejo, E. (2023). Exploring and Correcting the Bias in the Estimation of the Gini Measure of Inequality. Sociological Methods & Research. https://doi.org/10.1177/00491241231176847
Wang, D., Zhao, Y., and Gilmore, D. W. (2016). Jackknife empirical likelihood confidence interval for the Gini index. Statistics & Probability Letters, 110, 289-295.
Examples
# Sample, with size 50, from a Lognormal distribution. The true Gini index is 0.5.
set.seed(123)
y <- gsample(n = 50, gini = 0.5, meanlog = 5)
# Estimation of the Gini index using method = 5, bias correction, and Rcpp (the defaults).
iginindex(y)
# Estimation of the Gini index using method = 5, bias correction, and plain R code.
iginindex(y, useRcpp = FALSE)
# Comparing the computation time of the various estimation methods using R
microbenchmark::microbenchmark(
iginindex(y, method = 1, useRcpp = FALSE),
iginindex(y, method = 2, useRcpp = FALSE),
iginindex(y, method = 3, useRcpp = FALSE),
iginindex(y, method = 4, useRcpp = FALSE),
iginindex(y, method = 5, useRcpp = FALSE),
iginindex(y, method = 6, useRcpp = FALSE),
iginindex(y, method = 7, useRcpp = FALSE),
iginindex(y, method = 8, useRcpp = FALSE),
iginindex(y, method = 9, useRcpp = FALSE),
iginindex(y, method = 10, useRcpp = FALSE)
)
# Comparing the computation time of the various estimation methods using Rcpp
microbenchmark::microbenchmark(
iginindex(y, method = 1),
iginindex(y, method = 2),
iginindex(y, method = 3),
iginindex(y, method = 4),
iginindex(y, method = 5),
iginindex(y, method = 6),
iginindex(y, method = 7),
iginindex(y, method = 8),
iginindex(y, method = 9),
iginindex(y, method = 10)
)