calib_el {jointCalib}R Documentation

An internal function for calibration of weights using empirical likelihood method

Description

calib_el performs calibration using empirical likelihood (EL) method. The function is taken from Wu (2005), if algorithm has problem with convergence codes from Zhang, Han and Wu (2022) using constrOptim is used.

In (pseudo) EL the following (pseudo) EL function is maximized

\[\sum_{i \in r} d_i\log(p_i),\]

under the following constraint

\[\sum_{i \in r} p_i = 1,\]

with constraints on quantiles (with notation as in Harms and Duchesne (2006))

\[\sum_{i \in r} p_i(a_{i} - \alpha/N) = 0,\]

where \(a_{i}\) is created using joint_calib_create_matrix function, and possibly means

\[\sum_{i \in r} p_i(x_{i} - \mu_{x}) = 0,\]

where \(\mu_{x}\) is known population mean of X. For simplicity of notation we assume only one quantile and one mean is known. This can be generalized to multiple quantiles and means.

Usage

calib_el(X, d, totals, maxit = 50, tol = 1e-08, eps = .Machine$double.eps, ...)

Arguments

X

matrix of variables for calibration of quantiles and totals (first column should be intercept),

d

initial d-weights for calibration (e.g. design-weights),

totals

vector of totals (where 1 element is the population size),

maxit

a numeric value giving the maximum number of iterations,

tol

the desired accuracy for the iterative procedure,

eps

the desired accuracy for computing the Moore-Penrose generalized inverse (see MASS::ginv()),

...

arguments passed to stats::optim via stats::constrOptim.

Value

Returns a vector of empirical likelihood g-weights

Author(s)

Maciej Beręsewicz based on Wu (2005) and Zhang, Han and Wu (2022)

References

Wu, C. (2005). Algorithms and R codes for the pseudo empirical likelihood method in survey sampling. Survey Methodology, 31(2), 239 (code is taken from https://sas.uwaterloo.ca/~cbwu/Rcodes/LagrangeM2.txt).

Zhang, S., Han, P., and Wu, C. (2023) Calibration Techniques Encompassing Survey Sampling, Missing Data Analysis and Causal Inference. International Statistical Review, 91: 165–192. https://doi.org/10.1111/insr.12518 (code is taken from Supplementary Materials).

Examples

## generate data based on Haziza and Lesage (2016)
set.seed(123)
N <- 1000
x <- runif(N, 0, 80)
y <- exp(-0.1 + 0.1*x) + rnorm(N, 0, 300)
p <- rbinom(N, 1, prob = exp(-0.2 - 0.014*x))
totals_known <- c(N=N, x=sum(x))
df <- data.frame(x, y, p)
df_resp <- df[df$p == 1, ]
df_resp$d <- N/nrow(df_resp)
res <- calib_el(X = model.matrix(~x, df_resp),
                d = df_resp$d,
                totals = totals_known)
data.frame(known = totals_known, estimated=colSums(res*df_resp$d*model.matrix(~x, df_resp)))


[Package jointCalib version 0.1.0 Index]