R: Pareto tail modeling for income distributions

paretoTail {laeken}

R Documentation

Pareto tail modeling for income distributions

Description

Fit a Pareto distribution to the upper tail of income data. Since a theoretical distribution is used for the upper tail, this is a semiparametric approach.

Usage

paretoTail(
  x,
  k = NULL,
  x0 = NULL,
  method = "thetaPDC",
  groups = NULL,
  w = NULL,
  alpha = 0.01,
  ...
)

Arguments

`x`	a numeric vector.
`k`	the number of observations in the upper tail to which the Pareto distribution is fitted.
`x0`	the threshold (scale parameter) above which the Pareto distribution is fitted.
`method`	either a function or a character string specifying the function to be used to estimate the shape parameter of the Pareto distribution, such as `thetaPDC` (the default). See “Details” for requirements for such a function and “See also” for available functions.
`groups`	an optional vector or factor specifying groups of elements of `x` (e.g., households). If supplied, each group of observations is expected to have the same value in `x` (e.g., household income). Only the value of the first member of each group to appear is used for fitting the Pareto distribution.
`w`	an optional numeric vector giving sample weights.
`alpha`	numeric; values above the theoretical `1 - alpha` quantile of the fitted Pareto distribution will be flagged as outliers for further treatment with `reweightOut` or `replaceOut`.
`...`	additional arguments to be passed to the specified method.
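As an illustration of the `groups` argument, the following minimal sketch uses made-up household data and base R only. It is not taken from the package source; the names `income`, `household`, `first` and `values` are hypothetical. It shows how keeping only the first member of each group to appear yields one income value per household.

```r
# Toy data: five persons in three households; every member of a
# household carries the same household income (all values are made up).
income    <- c(1200, 1200, 3400, 800, 800)
household <- c("A", "A", "B", "C", "C")

# Keep only the first member of each household to appear
first  <- !duplicated(household)
values <- income[first]

values
```

Under this assumption, the fit would be based on three values (one per household) rather than on five person-level records.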

Details

The arguments k and x0 correspond with each other. If k is supplied, the threshold x0 is estimated by the (n - k)-th smallest value in x, where n is the number of observations, so that k observations lie above it. Conversely, if the threshold x0 is supplied, k is the number of observations in x larger than x0. Therefore, either k or x0 needs to be supplied. If both are supplied, only k is used.
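The correspondence between k and x0 can be sketched in base R. This is an illustration on simulated data rather than the package's internal code. It assumes that the threshold is taken as the (n - k)-th smallest value and that there are no ties in x, so that exactly k observations lie above it. The variable names are hypothetical.

```r
set.seed(123)                      # arbitrary seed for reproducibility
x <- rlnorm(1000, meanlog = 10)    # simulated incomes (continuous, no ties expected)
n <- length(x)

# From k to x0: the threshold is the (n - k)-th smallest value
k  <- 50
x0 <- sort(x)[n - k]

# From x0 back to k: count the observations strictly above the threshold
k_from_x0 <- sum(x > x0)

k_from_x0 == k
```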

The function supplied to method should take a numeric vector (the observations) as its first argument. If k is supplied, it will be passed on (in this case, the function is required to have an argument called k). Similarly, if the threshold x0 is supplied, it will be passed on (in this case, the function is required to have an argument called x0). As above, only k is passed on if both are supplied. If the function specified by method can handle sample weights, the corresponding argument should be called w. Additional arguments are passed via the ... argument.
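A function meeting these requirements might look like the following sketch. `hillSketch` is a hypothetical, unweighted Hill-type estimator written only to show the expected argument names (`k`, `x0`, `w`, `...`); it is not one of the estimators shipped with the package, and it accepts but ignores `w`. The data are simulated from a Pareto distribution with scale 1 and shape 2.

```r
# Hypothetical unweighted Hill-type estimator using the argument names
# that a 'method' function is expected to have.
hillSketch <- function(x, k = NULL, x0 = NULL, w = NULL, ...) {
  x <- sort(x)
  n <- length(x)
  if (!is.null(k)) {
    # k takes precedence: threshold is the (n - k)-th smallest value
    x0 <- x[n - k]
    upper <- x[(n - k + 1):n]
  } else if (!is.null(x0)) {
    upper <- x[x > x0]
    k <- length(upper)
  } else {
    stop("either 'k' or 'x0' must be supplied")
  }
  # 'w' is accepted but ignored in this unweighted sketch
  k / sum(log(upper) - log(x0))
}

# Simulated Pareto sample with scale 1 and shape 2 (inverse transform)
set.seed(1)
x <- runif(5000)^(-1 / 2)

theta_k  <- hillSketch(x, k = 500)    # shape estimate from the 500 largest values
theta_x0 <- hillSketch(x, x0 = 1.5)   # shape estimate from values above 1.5
```

Assuming the interface described above, such a function could then be supplied as `paretoTail(x, k = 500, method = hillSketch)`.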

Value

An object of class "paretoTail" with the following components:

`x`	the supplied numeric vector.
`k`	the number of observations in the upper tail to which the Pareto distribution has been fitted.
`groups`	if supplied, the vector or factor specifying groups of elements.
`w`	if supplied, the numeric vector of sample weights.
`method`	the function used to estimate the shape parameter, or the name of the function.
`x0`	the scale parameter.
`theta`	the estimated shape parameter.
`tail`	if `groups` is not `NULL`, this gives the groups with values larger than the threshold (scale parameter), otherwise the indices of observations in the upper tail.
`alpha`	the tuning parameter `alpha` used for flagging outliers.
`out`	if `groups` is not `NULL`, this gives the groups that are flagged as outliers, otherwise the indices of the flagged observations.
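The flagging rule behind `alpha` and `out` can be sketched as follows. For a Pareto distribution with scale x0 and shape theta, the distribution function is F(x) = 1 - (x / x0)^(-theta), so the `1 - alpha` quantile is x0 * alpha^(-1 / theta). The helper `qparetoSketch` and all numeric values below are hypothetical and serve only as an illustration; they are not the package's internal implementation.

```r
# Pareto quantile function: inverse of F(x) = 1 - (x / x0)^(-theta), x >= x0
qparetoSketch <- function(p, x0, theta) x0 * (1 - p)^(-1 / theta)

x0    <- 40000   # made-up threshold (scale parameter)
theta <- 3       # made-up shape parameter
alpha <- 0.01    # tuning parameter for flagging outliers

# Values above this cutoff would be flagged
cutoff <- qparetoSketch(1 - alpha, x0, theta)

incomes <- c(45000, 90000, 150000, 400000)   # made-up tail observations
flagged <- which(incomes > cutoff)

flagged
```

With these made-up numbers the cutoff is roughly 185,664, so only the last value is flagged.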

Author(s)

Andreas Alfons

References

A. Alfons and M. Templ (2013) Estimation of Social Exclusion Indicators from Complex Surveys: The R Package laeken. Journal of Statistical Software, 54(15), 1–25. doi:10.18637/jss.v054.i15

A. Alfons, M. Templ and P. Filzmoser (2013) Robust estimation of economic indicators from survey samples based on Pareto tail modeling. Journal of the Royal Statistical Society, Series C, 62(2), 271–286.

Examples

data(eusilc)


## Gini coefficient without Pareto tail modeling
gini("eqIncome", weights = "rb050", data = eusilc)


## Gini coefficient with Pareto tail modeling

# estimate threshold
ts <- paretoScale(eusilc$eqIncome, w = eusilc$db090,
    groups = eusilc$db030)

# estimate shape parameter
fit <- paretoTail(eusilc$eqIncome, k = ts$k,
    w = eusilc$db090, groups = eusilc$db030)

# calibration of outliers
w <- reweightOut(fit, calibVars(eusilc$db040))
gini(eusilc$eqIncome, w)

# winsorization of outliers
eqIncome <- shrinkOut(fit)
gini(eqIncome, weights = eusilc$rb050)

# replacement of outliers
eqIncome <- replaceOut(fit)
gini(eqIncome, weights = eusilc$rb050)

# replacement of whole tail
eqIncome <- replaceTail(fit)
gini(eqIncome, weights = eusilc$rb050)

[Package laeken version 0.5.3 Index]