ib {ib}    R Documentation
Bias correction via iterative bootstrap
Description
ib is used to correct the bias of a fitted model object with the iterative bootstrap procedure.
Usage
ib(object, thetastart = NULL, control = list(...), extra_param = FALSE, ...)
## S4 method for signature 'betareg'
ib(object, thetastart = NULL, control = list(...), extra_param = FALSE, ...)
## S4 method for signature 'glm'
ib(object, thetastart = NULL, control = list(...), extra_param = FALSE, ...)
## S4 method for signature 'lm'
ib(object, thetastart = NULL, control = list(...), extra_param = FALSE, ...)
## S4 method for signature 'lmerMod'
ib(object, thetastart = NULL, control = list(...), extra_param = FALSE, ...)
## S4 method for signature 'nls'
ib(object, thetastart = NULL, control = list(...), extra_param = FALSE, ...)
## S4 method for signature 'vglm'
ib(object, thetastart = NULL, control = list(...), extra_param = FALSE, ...)
Arguments
object
an object representing a fitted model (see 'Details' for the supported classes).

thetastart
an optional starting value for the iterative procedure. If NULL (the default), the procedure starts at the estimates provided by the fitted object.

control
a list of parameters for controlling the iterative procedure (see ibControl).

extra_param
if TRUE, extra parameters of the model are also corrected (see 'Details').

...
additional optional arguments (see 'Details').
Details
The iterative bootstrap procedure is described in Kuk (1995) and further studied by Guerrier et al. (2019) and Guerrier et al. (2020). The kth iteration of this algorithm is

\hat{\theta}^{k} = \hat{\theta}^{k-1} + \hat{\pi} - \frac{1}{H}\sum_{h=1}^{H}\hat{\pi}_h(\hat{\theta}^{k-1}), \quad k = 1, 2, \ldots
The estimate \hat{\pi} is provided by the fitted object. The value \hat{\pi}_h(\hat{\theta}) is a parametric bootstrap estimate, where the bootstrap sample is generated from \hat{\theta} and a fixed seed (see ibControl).
The larger the value of H, the better the bias correction in general, but the higher the computational cost (see ibControl). If thetastart=NULL, the initial value of the procedure is \hat{\theta}^{0}=\hat{\pi}. The number of iterations is controlled by the maxit parameter of ibControl.
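As a purely illustrative sketch (not the package's internal implementation), one such update can be written by hand for a Poisson regression; the names pi_hat, theta and boot_est below are made up for the example, and the fixed-seed handling of ibControl is omitted:

## illustrative sketch only: one IB update for a Poisson glm
set.seed(1)
x <- rnorm(50)
y <- rpois(50, exp(0.5 + 0.3 * x))
fit <- glm(y ~ x, family = poisson())

pi_hat <- coef(fit)    # initial estimate, provided by the fitted object
theta  <- pi_hat       # theta^0 = pi_hat when thetastart = NULL
H <- 20                # number of bootstrap samples per iteration

## theta^k = theta^(k-1) + pi_hat - (1/H) sum_h pi_hat_h(theta^(k-1))
boot_est <- replicate(H, {
  y_star <- rpois(length(x), exp(cbind(1, x) %*% theta))  # simulate from theta^(k-1)
  coef(glm(y_star ~ x, family = poisson()))                # re-estimate on the bootstrap sample
})
theta <- theta + pi_hat - rowMeans(boot_est)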
By default, the method corrects the coefficients only. Which extra parameters can be corrected depends on the model. These extra parameters may be constrained (e.g. to be positive). If constraint=TRUE (see ibControl), a transformation from the constrained space to the real line is used for the update.
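As a minimal sketch of this idea, assuming a log transformation for a positivity-constrained parameter (the exact transformation used by the package may differ), the update is performed on the real line and mapped back; the numbers are made up for illustration:

## illustrative sketch: IB update of a positive extra parameter on the log scale
sigma2_hat <- 2.5                     # initial (positive) estimate
boot_mean  <- 2.8                     # average of the bootstrap estimates
sigma2_new <- exp(log(sigma2_hat) + log(sigma2_hat) - log(boot_mean))
sigma2_new                            # stays positive after the update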
For betareg, extra_param is not available, as both the mean and precision parameters are corrected by default. Currently, the 'identity' link function is not supported for the precision parameters.
For glm, if extra_param=TRUE, the shape parameter of the Gamma family, the variance of the residuals in lm, or the overdispersion parameter of the negative binomial regression in glm.nb is also corrected. Note that the quasi families are not supported for the moment, as they have no simulation method (see simulate). Bias correction for the extra parameter of the inverse.gaussian family is not yet implemented.
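The absence of a simulation method can be checked directly on the family objects; in recent versions of R the standard families carry a simulate component, while the quasi families do not:

## quasi families carry no simulate() component, so no parametric
## bootstrap sample can be generated for them
is.null(poisson()$simulate)       # FALSE
is.null(quasipoisson()$simulate)  # TRUE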
For lm, if extra_param=TRUE, the variance of the residuals is also corrected. Note that using ib is not useful as the coefficients are already unbiased, unless one considers a different data generating mechanism such as censoring, missing values or outliers (see ibControl).
For lmer, by default, only the fixed effects are corrected. If extra_param=TRUE, all the random effects (variances and correlations) and the variance of the residuals are also corrected. Note that using ib is certainly not useful with the argument REML=TRUE in lmer, as the bias of the variance components is already addressed, unless one considers a different data generating mechanism such as censoring, missing values or outliers (see ibControl).
For nls, if extra_param=TRUE, the variance of the residuals is also corrected.
For vglm, extra_param is currently not used. Indeed, the philosophy of a vector generalized linear model is to potentially model all the parameters of a distribution with a linear predictor. Hence, what would be considered an extra parameter in glm, for instance, may already be captured by the default coefficients. However, correcting the bias of the coefficients does not imply that the bias of the distribution parameters is corrected (by Jensen's inequality), so this feature may be used in a future version of the package. Note that we currently only support distributions with a simslot (see simulate.vlm).
Value
A fitted model object of class Ib.
Author(s)
Samuel Orso
References
Guerrier S, Dupuis-Lozeron E, Ma Y, Victoria-Feser M (2019).
“Simulation-Based Bias Correction Methods for Complex Models.”
Journal of the American Statistical Association, 114(525), 146-157.
doi: 10.1080/01621459.2017.1380031.
Guerrier S, Karemera M, Orso S, Victoria-Feser M, Zhang Y (2020).
“A General Approach for Simulation-based Bias Correction in High Dimensional Settings.”
Version 2: 13 Nov 2020, arXiv: 2010.13687, https://arxiv.org/pdf/2010.13687.pdf.
Kuk AYC (1995).
“Asymptotically Unbiased Estimation in Generalized Linear Models with Random Effects.”
Journal of the Royal Statistical Society: Series B (Methodological), 57(2), 395-407.
doi: 10.1111/j.2517-6161.1995.tb02035.x.
See Also
ibControl for controlling the iterative procedure.
Examples
## beta regression
library(betareg)
data("GasolineYield", package = "betareg")
## currently link.phi = "identity" is not supported
## fit_beta <- betareg(yield ~ batch + temp, data = GasolineYield)
fit_beta <- betareg(yield ~ batch + temp, link.phi = "log", data = GasolineYield)
fit_ib <- ib(fit_beta)
## precision parameter can also depend on covariates
fit_beta <- betareg(yield ~ batch + temp | temp, data = GasolineYield)
fit_ib <- ib(fit_beta)
## poisson regression
counts <- c(18,17,15,20,10,20,25,13,12)
outcome <- gl(3,1,9)
treatment <- gl(3,3)
pois_fit <- glm(counts ~ outcome + treatment, family = poisson())
fit_ib <- ib(pois_fit)
summary(fit_ib)
## Set H = 1000
## Not run:
fit_ib <- ib(pois_fit, control=list(H=1000))
summary(fit_ib)
## End(Not run)
## gamma regression
clotting <- data.frame(
u = c(5,10,15,20,30,40,60,80,100),
lot1 = c(118,58,42,35,27,25,21,19,18),
lot2 = c(69,35,26,21,18,16,13,12,12))
fit_gamma <- glm(lot2 ~ log(u), data = clotting, family = Gamma(link = "inverse"))
fit_ib <- ib(fit_gamma)
## summary(fit_ib)
## correct for shape parameter and show iterations
## Not run:
fit_ib <- ib(fit_gamma, control=list(verbose=TRUE), extra_param = TRUE)
summary(fit_ib)
## End(Not run)
## negative binomial regression
library(MASS)
fit_nb <- glm.nb(Days ~ Sex/(Age + Eth*Lrn), data = quine)
fit_ib <- ib(fit_nb)
## summary(fit_ib)
## correct for overdispersion with H=100
## Not run:
fit_ib <- ib(fit_nb, control=list(H=100), extra_param = TRUE)
summary(fit_ib)
## End(Not run)
## linear regression
fit_lm <- lm(disp ~ cyl + hp + wt, data = mtcars)
fit_ib <- ib(fit_lm)
summary(fit_ib)
## correct for variance of residuals
fit_ib <- ib(fit_lm, extra_param = TRUE)
summary(fit_ib)
## linear mixed-effects regression
library(lme4)
fit_lmm <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy, REML = FALSE)
fit_ib <- ib(fit_lmm)
summary(fit_ib)
## correct for variances and correlation
## Not run:
fit_ib <- ib(fit_lmm, extra_param = TRUE)
summary(fit_ib)
## End(Not run)
## nonlinear regression
DNase1 <- subset(DNase, Run == 1)
fit_nls <- nls(density ~ SSlogis(log(conc), Asym, xmid, scal), data = DNase1)
fit_ib <- ib(fit_nls)
summary(fit_ib)
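## correct also for the variance of the residuals (see 'Details')
fit_ib <- ib(fit_nls, extra_param = TRUE)
summary(fit_ib)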
## student regression
library(VGAM)
tdata <- data.frame(x = runif(nn <- 1000))
tdata <- transform(tdata,
y = rt(nn, df = exp(exp(0.5 - x))))
fit_vglm <- vglm(y ~ x, studentt3, data = tdata)
fit_ib <- ib(fit_vglm)
summary(fit_ib)