R: Function for the joint calibration of totals and quantiles

joint_calib {jointCalib}

R Documentation

Function for the joint calibration of totals and quantiles

Description

joint_calib allows joint calibration of totals and quantiles. It provides a user-friendly interface that includes the specification of variables in formula notation, a vector of population totals, a list of quantiles, and a variety of backends and methods.

Usage

joint_calib(
  formula_totals = NULL,
  formula_quantiles = NULL,
  data = NULL,
  dweights = NULL,
  N = NULL,
  pop_totals = NULL,
  pop_quantiles = NULL,
  subset = NULL,
  backend = c("sampling", "laeken", "survey", "ebal", "base"),
  method = c("raking", "linear", "logit", "sinh", "truncated", "el", "eb"),
  bounds = c(0, 10),
  maxit = 50,
  tol = 1e-08,
  eps = .Machine$double.eps,
  control = control_calib(),
  ...
)

Arguments

`formula_totals`	a formula with variables to calibrate the totals,
`formula_quantiles`	a formula with variables for quantile calibration,
`data`	a data.frame with variables,
`dweights`	initial d-weights for calibration (e.g. design weights),
`N`	population size for calibration of quantiles,
`pop_totals`	a named vector of population totals for `formula_totals`. Should be provided exactly as in `survey` package (see `survey::calibrate`),
`pop_quantiles`	a named list of population quantiles for `formula_quantiles` or an `newsvyquantile` class object (from `survey::svyquantile` function),
`subset`	a formula for subset of data,
`backend`	specify an R package to perform the calibration. Only `sampling`, `laeken`, `survey`, `ebal` or `base` are allowed,
`method`	specify method (i.e. distance function) for the calibration. Only `raking`, `linear`, `logit`, `sinh`, `truncated`, `el` (empirical likelihood), `eb` (entropy balancing) are allowed,
`bounds`	a numeric vector of length two giving bounds for the g-weights,
`maxit`	a numeric value representing the maximum number of iterations,
`tol`	the desired accuracy for the iterative procedure (for `sampling`, `laeken`, `ebal`, `el`) or tolerance in matching population total for `survey::grake` (see help for survey::grake)
`eps`	the desired accuracy for computing the Moore-Penrose generalized inverse (see `MASS::ginv()`)
`control`	a list of control parameters (currently only for `joint_calib_create_matrix`)
`...`	arguments passed either to `sampling::calib`, `laeken::calibWeights`, `survey::calibrate` or `optim::constrOptim`

Value

Returns a list with containing:

g – g-weight that sums up to sample size,
Xs – matrix used for calibration (i.e. Intercept, X and X_q transformed for calibration of quantiles),
totals – a vector of totals (i.e. N, pop_totals and pop_quantiles),
method – selected method,
backend – selected backend.

Author(s)

Maciej Beręsewicz

References

Beręsewicz, M., and Szymkowiak, M. (2023). A note on joint calibration estimators for totals and quantiles Arxiv preprint https://arxiv.org/abs/2308.13281

Deville, J. C., and Särndal, C. E. (1992). Calibration estimators in survey sampling. Journal of the American statistical Association, 87(418), 376-382.

Harms, T. and Duchesne, P. (2006). On calibration estimation for quantiles. Survey Methodology, 32(1), 37.

Wu, C. (2005) Algorithms and R codes for the pseudo empirical likelihood method in survey sampling, Survey Methodology, 31(2), 239.

Zhang, S., Han, P., and Wu, C. (2023) Calibration Techniques Encompassing Survey Sampling, Missing Data Analysis and Causal Inference, International Statistical Review 91, 165–192.

Haziza, D., and Lesage, É. (2016). A discussion of weighting procedures for unit nonresponse. Journal of Official Statistics, 32(1), 129-145.

Examples

## generate data based on Haziza and Lesage (2016)
set.seed(123)
N <- 1000
x <- runif(N, 0, 80)
y <- exp(-0.1 + 0.1*x) + rnorm(N, 0, 300)
p <- rbinom(N, 1, prob = exp(-0.2 - 0.014*x))
probs <- seq(0.1, 0.9, 0.1)
quants_known <- list(x=quantile(x, probs))
totals_known <- c(x=sum(x))
df <- data.frame(x, y, p)
df_resp <- df[df$p == 1, ]
df_resp$d <- N/nrow(df_resp)
y_quant_true <- quantile(y, probs)
## standard calibration for comparison
result0 <- sampling::calib(Xs = cbind(1, df_resp$x),
                           d = df_resp$d,
                           total = c(N, totals_known),
                           method = "linear")

y_quant_hat0 <- laeken::weightedQuantile(x = df_resp$y,
                                         probs = probs,
                                         weights = result0*df_resp$d)
x_quant_hat0 <- laeken::weightedQuantile(x = df_resp$x,
                                         probs = probs,
                                         weights = result0*df_resp$d)

## example 1: calibrate only quantiles (deciles)
result1 <- joint_calib(formula_quantiles = ~x,
                       data = df_resp,
                       dweights = df_resp$d,
                       N = N,
                       pop_quantiles = quants_known,
                       method = "linear",
                       backend = "sampling")
## estimate quantiles
y_quant_hat1 <- laeken::weightedQuantile(x = df_resp$y,
                                         probs = probs,
                                         weights = result1$g*df_resp$d)
x_quant_hat1 <- laeken::weightedQuantile(x = df_resp$x,
                                         probs = probs,
                                         weights = result1$g*df_resp$d)

## compare with known
data.frame(standard = y_quant_hat0, est=y_quant_hat1, true=y_quant_true)

## example 2: calibrate with quantiles (deciles) and totals
result2 <- joint_calib(formula_totals = ~x,
                       formula_quantiles = ~x,
                       data = df_resp,
                       dweights = df_resp$d,
                       N = N,
                       pop_quantiles = quants_known,
                       pop_totals = totals_known,
                       method = "linear",
                       backend = "sampling")
## estimate quantiles
y_quant_hat2 <- laeken::weightedQuantile(x = df_resp$y,
                                         probs = probs,
                                         weights = result2$g*df_resp$d)
x_quant_hat2 <- laeken::weightedQuantile(x = df_resp$x,
                                         probs = probs,
                                         weights = result2$g*df_resp$d)

## compare with known
data.frame(standard = y_quant_hat0, est1=y_quant_hat1,
           est2=y_quant_hat2, true=y_quant_true)

## example 3: calibrate wigh quantiles (deciles) and totals with
## hyperbolic sinus (sinh) and survey package

result3 <- joint_calib(formula_totals = ~x,
                       formula_quantiles = ~x,
                       data = df_resp,
                       dweights = df_resp$d,
                       N = N,
                       pop_quantiles = quants_known,
                       pop_totals = totals_known,
                       method = "sinh",
                       backend = "survey")

## estimate quantiles
y_quant_hat3 <- laeken::weightedQuantile(x = df_resp$y,
                                         probs = probs,
                                         weights = result3$g*df_resp$d)
x_quant_hat3 <- laeken::weightedQuantile(x = df_resp$x,
                                         probs = probs,
                                         weights = result3$g*df_resp$d)

## example 4: calibrate wigh quantiles (deciles) and totals with ebal package
result4 <- joint_calib(formula_totals = ~x,
                       formula_quantiles = ~x,
                       data = df_resp,
                       dweights = df_resp$d,
                       N = N,
                       pop_quantiles = quants_known,
                       pop_totals = totals_known,
                       method = "eb",
                       backend = "ebal")

## estimate quantiles
y_quant_hat4 <- laeken::weightedQuantile(x = df_resp$y,
                                         probs = probs,
                                         weights = result4$g*df_resp$d)
x_quant_hat4 <- laeken::weightedQuantile(x = df_resp$x,
                                         probs = probs,
                                         weights = result4$g*df_resp$d)

## compare with known
data.frame(standard = y_quant_hat0,
           est1=y_quant_hat1,
           est2=y_quant_hat2,
           est3=y_quant_hat3,
           est4=y_quant_hat4,
           true=y_quant_true)
## compare with known X
data.frame(standard = x_quant_hat0,
           est1=x_quant_hat1,
           est2=x_quant_hat2,
           est3=x_quant_hat3,
           est4=x_quant_hat4,
           true = quants_known$x)

[Package jointCalib version 0.1.0 Index]