svymean_winsorized {robsurvey}    R Documentation

Weighted Winsorized Mean and Total

Description

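Weighted winsorized estimators of the population mean and total for complex survey designs. The amount of winsorization is specified either by the bounds LB and UB or by the number k of largest observations.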

Usage

svymean_winsorized(x, design, LB = 0.05, UB = 1 - LB, na.rm = FALSE,
                   trim_var = FALSE)
svymean_k_winsorized(x, design, k, na.rm = FALSE, trim_var = FALSE)
svytotal_winsorized(x, design, LB = 0.05, UB = 1 - LB, na.rm = FALSE,
                    trim_var = FALSE)
svytotal_k_winsorized(x, design, k, na.rm = FALSE, trim_var = FALSE)

Arguments

x

a one-sided [formula], e.g., ~myVariable.

design

an object of class survey.design; see svydesign.

LB

[double] lower bound of winsorization such that 0 \leq LB < UB \leq 1 (default: 0.05).

UB

[double] upper bound of winsorization such that 0 \leq LB < UB \leq 1 (default: 1 - LB).

na.rm

[logical] indicating whether NA values should be removed before the computation proceeds (default: FALSE).

trim_var

[logical] indicating whether the variance should be approximated by the variance estimator of the trimmed mean/total (default: FALSE).

k

[integer] number of observations to be winsorized at the top of the distribution.

Details

Package survey must be attached to the search path in order to use the functions (see library or require).

Characteristic.

Population mean or total. Let \mu denote the estimated winsorized population mean; then the estimated winsorized total is given by \hat{N} \mu with \hat{N} = \sum w_i, where the summation is over all observations in the sample.
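The relation between the two estimators can be checked by hand. The following is a minimal base R sketch on made-up numbers, assuming the values in x_wins have already been winsorized; the object names are illustrative and are not part of robsurvey.

```r
# Toy sample (hypothetical data): already-winsorized values and sampling weights
x_wins <- c(2, 3, 5, 5)
w <- c(10, 20, 30, 40)

N_hat <- sum(w)                      # estimated population size, sum of weights
mu_hat <- sum(w * x_wins) / N_hat    # weighted (winsorized) mean
total_hat <- N_hat * mu_hat          # winsorized total as N_hat times the mean

mu_hat
total_hat
```

Since the total is N_hat times the mean, it equals the weighted sum of the winsorized values, sum(w * x_wins).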

Modes of winsorization.

The amount of winsorization can be specified in relative or absolute terms:

  • Relative: By specifying LB and UB, the method winsorizes the smallest LB \cdot 100\% and the largest (1 - UB) \cdot 100\% of the observations.

  • Absolute: By specifying argument k in the functions with the infix _k_ in their name (e.g., svymean_k_winsorized), the largest k observations are winsorized, where 0 < k < n and n denotes the sample size. E.g., k = 2 implies that the largest and the second largest observation are winsorized.
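The two modes can be illustrated with a short base R sketch. It is not the package's implementation: the helpers wquantile_sketch, winsorize_rel and winsorize_k are hypothetical, the weighted quantile is taken as the plain inverse of the weighted empirical distribution function (robsurvey may use a different quantile definition), and the k largest values are assumed to be set to the (k + 1)-th largest value.

```r
# Hypothetical helper: weighted quantile as the inverse of the weighted
# empirical distribution function (no interpolation)
wquantile_sketch <- function(x, w, p) {
    ord <- order(x)
    cum_share <- cumsum(w[ord]) / sum(w)
    x[ord][which(cum_share >= p)[1]]
}

# Relative mode: values below the LB-quantile or above the UB-quantile are
# replaced by the respective quantile
winsorize_rel <- function(x, w, LB = 0.05, UB = 1 - LB) {
    lo <- wquantile_sketch(x, w, LB)
    hi <- wquantile_sketch(x, w, UB)
    pmin(pmax(x, lo), hi)
}

# Absolute mode: the k largest values are replaced by the (k + 1)-th largest
winsorize_k <- function(x, k) {
    stopifnot(k > 0, k < length(x))
    cutoff <- sort(x, decreasing = TRUE)[k + 1]
    pmin(x, cutoff)
}

x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)   # toy data with one outlier
w <- rep(1, 10)                          # equal weights for simplicity

x_rel <- winsorize_rel(x, w, LB = 0.15)  # 1 becomes 2, 100 becomes 9
x_k <- winsorize_k(x, k = 2)             # 9 and 100 become 8

mean_rel <- weighted.mean(x_rel, w)      # 5.5
mean_k <- weighted.mean(x_k, w)          # 5.2
```

For comparison, the unwinsorized weighted mean of the toy data is 14.5, so both modes remove most of the outlier's pull; the absolute mode leaves the lower tail untouched.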

Variance estimation.

Large-sample approximation based on the influence function; see Huber and Ronchetti (2009, Chap. 3.3) and Shao (1994). Two estimators are available:

trim_var = FALSE

Variance estimator of the winsorized mean/total. The estimator depends on the estimated probability density function evaluated at the winsorization thresholds, which can be numerically unstable, depending on the context. As a remedy, a simplified variance estimator is available by setting trim_var = TRUE.

trim_var = TRUE

The variance is approximated using the variance estimator of the trimmed mean/total.
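To see where the density enters, consider the textbook influence function of the winsorized mean in the classical setting (i.i.d. data, no weights, symmetric winsorization with \alpha = LB = 1 - UB). This is a sketch under those assumptions and not necessarily the exact form implemented for survey data. Here q_\alpha and q_{1-\alpha} denote the winsorization thresholds, f the density, and \mu_w the winsorized mean functional:

```latex
\mathrm{IF}(x) =
\begin{cases}
q_{\alpha} - \dfrac{\alpha}{f(q_{\alpha})} - C, & x < q_{\alpha}, \\[1ex]
x - C, & q_{\alpha} \leq x \leq q_{1-\alpha}, \\[1ex]
q_{1-\alpha} + \dfrac{\alpha}{f(q_{1-\alpha})} - C, & x > q_{1-\alpha},
\end{cases}
\qquad
C = \mu_w - \frac{\alpha^2}{f(q_{\alpha})} + \frac{\alpha^2}{f(q_{1-\alpha})}.
```

Because f(q_\alpha) and f(q_{1-\alpha}) appear in denominators, a density estimate that is close to zero or very noisy at a threshold can destabilize the resulting variance estimate. The influence function of the trimmed mean involves only the thresholds and not the density, which is why the trimmed-mean variance estimator is offered as the simpler approximation.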

Utility functions.

summary, coef, SE, vcov, residuals, fitted and robweights.

Bare-bone functions.

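See weighted_mean_winsorized, weighted_mean_k_winsorized, weighted_total_winsorized and weighted_total_k_winsorized.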

Value

Object of class svystat_rob

References

Huber, P. J. and Ronchetti, E. (2009). Robust Statistics, New York: John Wiley and Sons, 2nd edition. doi:10.1002/9780470434697

Shao, J. (1994). L-Statistics in Complex Survey Problems. The Annals of Statistics 22, 946–967. doi:10.1214/aos/1176325505

See Also

Overview (of all implemented functions)

weighted_mean_winsorized, weighted_mean_k_winsorized, weighted_total_winsorized and weighted_total_k_winsorized

Examples

library(robsurvey)  # provides the workplace data and the winsorized estimators
head(workplace)

library(survey)
# Survey design for stratified simple random sampling without replacement
dn <- if (packageVersion("survey") >= "4.2") {
        # survey design with pre-calibrated weights
        svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight,
                  data = workplace, calibrate.formula = ~-1 + strat)
    } else {
        # legacy mode
        svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight,
                  data = workplace)
    }

# Estimated winsorized population mean (5% symmetric winsorization)
svymean_winsorized(~employment, dn, LB = 0.05)

# Estimated one-sided k winsorized population total (2 observations are
# winsorized at the top of the distribution)
svytotal_k_winsorized(~employment, dn, k = 2)

[Package robsurvey version 0.6 Index]