R: Empirical likelihood for generalized linear models

el_glm {melt}

R Documentation

Empirical likelihood for generalized linear models

Description

Fits a generalized linear model with empirical likelihood.

Usage

el_glm(
  formula,
  family = gaussian,
  data,
  weights = NULL,
  na.action,
  start = NULL,
  etastart = NULL,
  mustart = NULL,
  offset,
  control = el_control(),
  ...
)

Arguments

`formula`	An object of class `formula` (or one that can be coerced to that class): a symbolic description of the model to be fitted.
`family`	A description of the error distribution and link function to be used in the model. Only the result of a call to a family function is supported. See ‘Details’.
`data`	An optional data frame, list or environment (or object coercible by `as.data.frame()` to a data frame) containing the variables in the formula. If not found in data, the variables are taken from `environment(formula)`.
`weights`	An optional numeric vector of weights to be used in the fitting process. Defaults to `NULL`, corresponding to identical weights. If non-`NULL`, weighted empirical likelihood is computed.
`na.action`	A function which indicates what should happen when the data contain `NA`s. The default is set by the `na.action` setting of `options`, and is `na.fail` if that is unset.
`start`	Starting values for the parameters in the linear predictor. Defaults to `NULL` and is passed to `glm.fit()`.
`etastart`	Starting values for the linear predictor. Defaults to `NULL` and is passed to `glm.fit()`.
`mustart`	Starting values for the vector of means. Defaults to `NULL` and is passed to `glm.fit()`.
`offset`	An optional expression for specifying an a priori known component to be included in the linear predictor during fitting. This should be `NULL` or a numeric vector or matrix of extents matching those of the response. One or more `offset` terms can be included in the formula instead or as well, and if more than one are specified their sum is used.
`control`	An object of class ControlEL constructed by `el_control()`.
`...`	Additional arguments to be passed to `glm.control()`.

Details

Suppose that we observe n independent random variables {Z_i} \equiv {(X_i, Y_i)} from a common distribution, where X_i is the p-dimensional covariate (including the intercept if any) and Y_i is the response. A generalized linear model specifies that {\textrm{E}(Y_i | X_i)} = {\mu_i}, {G(\mu_i)} = {X_i^\top \theta}, and {\textrm{Var}(Y_i | X_i)} = {\phi V(\mu_i)}, where \theta = (\theta_0, \dots, \theta_{p-1}) is an unknown p-dimensional parameter, \phi is an optional dispersion parameter, G is a known smooth link function, and V is a known variance function.

With H denoting the inverse link function, define the quasi-score

{g_1(Z_i, \theta)} = \left\{ H^\prime(X_i^\top \theta) \left(Y_i - H(X_i^\top \theta)\right) / \left(\phi V\left(H(X_i^\top \theta)\right)\right) \right\} X_i.

Then we have the estimating equations \sum_{i = 1}^n g_1(Z_i, \theta) = 0. When \phi is known, the (profile) empirical likelihood ratio for a given \theta is defined by

R_1(\theta) = \max_{p_i}\left\{\prod_{i = 1}^n np_i : \sum_{i = 1}^n p_i g_1(Z_i, \theta) = 0,\ p_i \geq 0,\ \sum_{i = 1}^n p_i = 1 \right\}.

With unknown \phi, we introduce another estimating function based on the squared residuals. Let {\eta} = {(\theta, \phi)} and

{g_2(Z_i, \eta)} = \left(Y_i - H(X_i^\top \theta)\right)^2 / \left(\phi^2 V\left(H(X_i^\top \theta)\right)\right) - 1 / \phi.

Now the empirical likelihood ratio is defined by

R_2(\eta) = \max_{p_i}\left\{\prod_{i = 1}^n np_i : \sum_{i = 1}^n p_i g_1(Z_i, \eta) = 0,\ \sum_{i = 1}^n p_i g_2(Z_i, \eta) = 0,\ p_i \geq 0,\ \sum_{i = 1}^n p_i = 1 \right\}.

el_glm() first computes the parameter estimates by calling glm.fit() (with ... if any) with the model.frame and model.matrix obtained from the formula. Note that the maximum empirical likelihood estimator is the same as the the quasi-maximum likelihood estimator in our model. Next, it tests hypotheses based on asymptotic chi-square distributions of the empirical likelihood ratio statistics. Included in the tests are overall test with

H_0: \theta_1 = \theta_2 = \cdots = \theta_{p-1} = 0,

and significance tests for each parameter with

H_{0j}: \theta_j = 0,\ j = 0, \dots, p-1.

The available families and link functions are as follows:

gaussian: "identity", "log", and "inverse".
binomial: "logit", "probit", and "log".
poisson: "log", "identity", and "sqrt".
quasipoisson: "log", "identity", and "sqrt".

Value

An object of class of GLM.

References

Chen SX, Cui H (2003). “An Extended Empirical Likelihood for Generalized Linear Models.” Statistica Sinica, 13(1), 69–81.

Kolaczyk ED (1994). “Empirical Likelihood for Generalized Linear Models.” Statistica Sinica, 4(1), 199–218.

Examples

data("warpbreaks")
fit <- el_glm(wool ~ .,
  family = binomial, data = warpbreaks, weights = NULL, na.action = na.omit,
  start = NULL, etastart = NULL, mustart = NULL, offset = NULL
)
summary(fit)

[Package melt version 1.11.4 Index]