brglmFit {brglm2}R Documentation

Fitting function for glm() for reduced-bias estimation and inference

Description

brglmFit() is a fitting method for glm() that fits generalized linear models using implicit and explicit bias reduction methods (Kosmidis, 2014), and other penalized maximum likelihood methods. Currently supported methods include the mean bias-reducing adjusted scores approach in Firth (1993) and Kosmidis & Firth (2009), the median bias-reduction adjusted scores approach in Kenne Pagui et al. (2017), the correction of the asymptotic bias in Cordeiro & McCullagh (1991), the mixed bias-reduction adjusted scores approach in Kosmidis et al (2020), maximum penalized likelihood with powers of the Jeffreys prior as penalty, and maximum likelihood. Estimation is performed using a quasi Fisher scoring iteration (see vignette("iteration", "brglm2"), which, in the case of mean-bias reduction, resembles an iterative correction of the asymptotic bias of the Fisher scoring iterates.

Usage

brglmFit(
  x,
  y,
  weights = rep(1, nobs),
  start = NULL,
  etastart = NULL,
  mustart = NULL,
  offset = rep(0, nobs),
  family = gaussian(),
  control = list(),
  intercept = TRUE,
  fixed_totals = NULL,
  singular.ok = TRUE
)

brglm_fit(
  x,
  y,
  weights = rep(1, nobs),
  start = NULL,
  etastart = NULL,
  mustart = NULL,
  offset = rep(0, nobs),
  family = gaussian(),
  control = list(),
  intercept = TRUE,
  fixed_totals = NULL,
  singular.ok = TRUE
)

Arguments

x

a design matrix of dimension n * p.

y

a vector of observations of length n.

weights

an optional vector of ‘prior weights’ to be used in the fitting process. Should be NULL or a numeric vector.

start

starting values for the parameters in the linear predictor. If NULL (default) then the maximum likelihood estimates are calculated and used as starting values.

etastart

applied only when start is not NULL. Starting values for the linear predictor to be passed to glm.fit() when computing starting values using maximum likelihood.

mustart

applied only when start is not NULL. Starting values for the vector of means to be passed to glm.fit() when computing starting values using maximum likelihood.

offset

this can be used to specify an a priori known component to be included in the linear predictor during fitting. This should be NULL or a numeric vector of length equal to the number of cases. One or more offset terms can be included in the formula instead or as well, and if more than one is specified their sum is used. See model.offset.

family

a description of the error distribution and link function to be used in the model. For glm this can be a character string naming a family function, a family function or the result of a call to a family function. For glm.fit only the third option is supported. (See family for details of family functions.)

control

a list of parameters controlling the fitting process. See brglmControl() for details.

intercept

logical. Should an intercept be included in the null model?

fixed_totals

effective only when family is poisson(). Either NULL (no effect) or a vector that indicates which counts must be treated as a group. See Details for more information and brmultinom().

singular.ok

logical. If FALSE, a singular model is an error.

Details

A detailed description of the supported adjustments and the quasi Fisher scoring iteration is given in the iteration vignette (see, vignette("iteration", "brglm2") or Kosmidis et al, 2020). A shorter description of the quasi Fisher scoring iteration is also given in one of the vignettes of the enrichwith R package (see, https://cran.r-project.org/package=enrichwith/vignettes/bias.html). Kosmidis and Firth (2010) describe a parallel quasi Newton-Raphson iteration with the same stationary point.

In the special case of generalized linear models for binomial, Poisson and multinomial responses, the adjusted score equation approaches for type = "AS_mixed", type = "AS_mean", and type = "AS_median" (see below for what methods each type corresponds) return estimates with improved frequentist properties, that are also always finite, even in cases where the maximum likelihood estimates are infinite (e.g. complete and quasi-complete separation in multinomial regression). See, Kosmidis and Firth (2021) for a proof for binomial-response GLMs with Jeffreys-prior penalties to the log-likelihood, which is equivalent to mean bias reduction for logistic regression. See, also, detectseparation::detect_separation() and detectseparation::check_infinite_estimates() for pre-fit and post-fit methods for the detection of infinite estimates in binomial response generalized linear models.

The type of score adjustment to be used is specified through the type argument (see brglmControl() for details). The available options are

The null deviance is evaluated based on the fitted values using the method specified by the type argument (see brglmControl()).

The family argument of the current version of brglmFit() can accept any combination of "family" objects and link functions, including families with user-specified link functions, mis() links, and power() links, but excluding quasi(), quasipoisson() and quasibinomial() families.

The description of method argument and the ⁠Fitting functions⁠ section in glm() gives information on supplying fitting methods to glm().

fixed_totals specifies groups of observations for which the sum of the means of a Poisson model will be held fixed to the observed count for each group. This argument is used internally in brmultinom() and bracl() for baseline-category logit models and adjacent category logit models, respectively.

brglm_fit() is an alias to brglmFit().

Author(s)

Ioannis Kosmidis ⁠[aut, cre]⁠ ioannis.kosmidis@warwick.ac.uk, Euloge Clovis Kenne Pagui ⁠[ctb]⁠ kenne@stat.unipd.it

References

Kosmidis I, Firth D (2021). Jeffreys-prior penalty, finiteness and shrinkage in binomial-response generalized linear models. Biometrika, 108, 71-82. doi:10.1093/biomet/asaa052.

Kosmidis I, Kenne Pagui E C, Sartori N (2020). Mean and median bias reduction in generalized linear models. Statistics and Computing, 30, 43-59. doi:10.1007/s11222-019-09860-6.

Cordeiro G M, McCullagh P (1991). Bias correction in generalized linear models. Journal of the Royal Statistical Society. Series B (Methodological), 53, 629-643. doi:10.1111/j.2517-6161.1991.tb01852.x.

Firth D (1993). Bias reduction of maximum likelihood estimates. Biometrika. 80, 27-38. doi:10.2307/2336755.

Kenne Pagui E C, Salvan A, Sartori N (2017). Median bias reduction of maximum likelihood estimates. Biometrika, 104, 923–938. doi:10.1093/biomet/asx046.

Kosmidis I, Firth D (2009). Bias reduction in exponential family nonlinear models. Biometrika, 96, 793-804. doi:10.1093/biomet/asp055.

Kosmidis I, Firth D (2010). A generic algorithm for reducing bias in parametric estimation. Electronic Journal of Statistics, 4, 1097-1112. doi:10.1214/10-EJS579.

Kosmidis I (2014). Bias in parametric estimation: reduction and useful side-effects. WIRE Computational Statistics, 6, 185-196. doi:10.1002/wics.1296.

See Also

brglmControl(), glm.fit(), glm()

Examples

## The lizards example from ?brglm::brglm
data("lizards")
# Fit the model using maximum likelihood
lizardsML <- glm(cbind(grahami, opalinus) ~ height + diameter +
                 light + time, family = binomial(logit), data = lizards,
                 method = "glm.fit")
# Mean bias-reduced fit:
lizardsBR_mean <- glm(cbind(grahami, opalinus) ~ height + diameter +
                      light + time, family = binomial(logit), data = lizards,
                      method = "brglmFit")
# Median bias-reduced fit:
lizardsBR_median <- glm(cbind(grahami, opalinus) ~ height + diameter +
                        light + time, family = binomial(logit), data = lizards,
                        method = "brglmFit", type = "AS_median")
summary(lizardsML)
summary(lizardsBR_median)
summary(lizardsBR_mean)

# Maximum penalized likelihood with Jeffreys prior penatly
lizards_Jeffreys <- glm(cbind(grahami, opalinus) ~ height + diameter +
                        light + time, family = binomial(logit), data = lizards,
                        method = "brglmFit", type = "MPL_Jeffreys")
# lizards_Jeffreys is the same fit as lizardsBR_mean (see Firth, 1993)
all.equal(coef(lizardsBR_mean), coef(lizards_Jeffreys))

# Maximum penalized likelihood with powers of the Jeffreys prior as
# penalty. See Kosmidis & Firth (2021) for the finiteness and
# shrinkage properties of the maximum penalized likelihood
# estimators in binomial response models

a <- seq(0, 20, 0.5)
coefs <- sapply(a, function(a) {
      out <- glm(cbind(grahami, opalinus) ~ height + diameter +
             light + time, family = binomial(logit), data = lizards,
             method = "brglmFit", type = "MPL_Jeffreys", a = a)
      coef(out)
})
# Illustration of shrinkage as a grows
matplot(a, t(coefs), type = "l", col = 1, lty = 1)
abline(0, 0, col = "grey")



## Another example from
## King, Gary, James E. Alt, Nancy Elizabeth Burns and Michael Laver
## (1990).  "A Unified Model of Cabinet Dissolution in Parliamentary
## Democracies", _American Journal of Political Science_, **34**, 846-870

data("coalition", package = "brglm2")
# The maximum likelihood fit with log link
coalitionML <- glm(duration ~ fract + numst2, family = Gamma, data = coalition)
# The mean bias-reduced fit
coalitionBR_mean <- update(coalitionML, method = "brglmFit")
# The bias-corrected fit
coalitionBC <- update(coalitionML, method = "brglmFit", type = "correction")
# The median bias-corrected fit
coalitionBR_median <- update(coalitionML, method = "brglmFit", type = "AS_median")



## An example with offsets from Venables & Ripley (2002, p.189)
data("anorexia", package = "MASS")

anorexML <- glm(Postwt ~ Prewt + Treat + offset(Prewt),
                family = gaussian, data = anorexia)
anorexBC <- update(anorexML, method = "brglmFit", type = "correction")
anorexBR_mean <- update(anorexML, method = "brglmFit")
anorexBR_median <- update(anorexML, method = "brglmFit", type = "AS_median")

# All methods return the same estimates for the regression
# parameters because the maximum likelihood estimator is normally
# distributed around the `true` value under the model (hence, both
# mean and component-wise median unbiased). The Wald tests for
# anorexBC and anorexBR_mean differ from anorexML because the
# bias-reduced estimator of the dispersion is the unbiased, by
# degree of freedom adjustment (divide by n - p), estimator of the
# residual variance. The Wald tests from anorexBR_median are based
# on the median bias-reduced estimator of the dispersion that
# results from a different adjustment of the degrees of freedom
# (divide by n - p - 2/3)
summary(anorexML)
summary(anorexBC)
summary(anorexBR_mean)
summary(anorexBR_median)


## endometrial data from Heinze & Schemper (2002) (see ?endometrial)
data("endometrial", package = "brglm2")
endometrialML <- glm(HG ~ NV + PI + EH, data = endometrial,
                     family = binomial("probit"))
endometrialBR_mean <- update(endometrialML, method = "brglmFit",
                             type = "AS_mean")
endometrialBC <- update(endometrialML, method = "brglmFit",
                        type = "correction")
endometrialBR_median <- update(endometrialML, method = "brglmFit",
                               type = "AS_median")
summary(endometrialML)
summary(endometrialBC)
summary(endometrialBR_mean)
summary(endometrialBR_median)


[Package brglm2 version 0.9.2 Index]