detect_separation {detectseparation} | R Documentation |
Method for glm that tests for data separation and finds which parameters have infinite maximum likelihood estimates in generalized linear models with binomial responses
detect_separation is a method for glm that tests for the occurrence of complete or quasi-complete separation in datasets for binomial response generalized linear models, and finds which of the parameters will have infinite maximum likelihood estimates. detect_separation relies on the linear programming methods developed in Konis (2007).
detect_separation(
  x,
  y,
  weights = rep(1, nobs),
  start = NULL,
  etastart = NULL,
  mustart = NULL,
  offset = rep(0, nobs),
  family = gaussian(),
  control = list(),
  intercept = TRUE,
  singular.ok = TRUE
)
detectSeparation(
  x,
  y,
  weights = rep(1, nobs),
  start = NULL,
  etastart = NULL,
  mustart = NULL,
  offset = rep(0, nobs),
  family = gaussian(),
  control = list(),
  intercept = TRUE,
  singular.ok = TRUE
)
x |
|
y |
|
weights |
an optional vector of ‘prior weights’ to be used
in the fitting process. Should be |
start |
currently not used. |
etastart |
currently not used. |
mustart |
currently not used. |
offset |
this can be used to specify an a priori known
component to be included in the linear predictor during fitting.
This should be |
family |
a description of the error distribution and link
function to be used in the model. For |
control |
a list of parameters controlling separation
detection. See |
intercept |
logical. Should an intercept be included in the null model? |
singular.ok |
logical. If |
detect_separation is a wrapper to the separator_ROI and separator_lpSolveAPI functions (the latter a modified version of the separator function from the **safeBinaryRegression** R package). detect_separation can be passed directly as a method to the glm function. See the examples.
The coefficients method extracts a vector of values for each of the model parameters under the following convention: 0 if the maximum likelihood estimate of the parameter is finite, and Inf or -Inf if the maximum likelihood estimate of the parameter is plus or minus infinity. This convention makes it easy to adjust the maximum likelihood estimates to their actual values by element-wise addition.
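The element-wise addition mentioned above can be sketched in base R. The two vectors below are hypothetical stand-ins (made-up names and numbers, not output from a real fit) for the estimates reported by an ordinary fit and for the 0/Inf/-Inf pattern described by the convention.

```r
# Hypothetical numbers for illustration only; they do not come from a real fit.
# Finite-looking estimates, as an ordinary fit might report them after
# stopping at some iteration.
reported <- c("(Intercept)" = -1.2, x1 = 0.8, x2 = 19.5)

# Pattern following the convention above:
# 0 = finite estimate, Inf / -Inf = infinite maximum likelihood estimate.
separation <- c("(Intercept)" = 0, x1 = 0, x2 = Inf)

# Element-wise addition leaves the finite entries unchanged and replaces
# the entry for x2 with its actual value, Inf.
actual <- reported + separation
print(actual)
```

With the package installed, the two vectors would presumably be obtained by applying the coefficients method to a glm.fit fit and to a detect_separation fit of the same model.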
detectSeparation is an alias for detect_separation.
A list that inherits from class detect_separation, glm and lm. A print method is provided for detect_separation objects.
For the definition of complete and quasi-complete separation, see Albert and Anderson (1984). Kosmidis and Firth (2021) prove that the reduced-bias estimator obtained by penalizing the logistic regression log-likelihood by the Jeffreys prior always takes finite values, even when some of the maximum likelihood estimates are infinite. The reduced-bias estimates can be computed using the **brglm2** R package.
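To make the notion of an infinite maximum likelihood estimate concrete, here is a minimal base R sketch on a made-up, completely separated toy dataset (the data and the helper fit_with are assumptions for illustration, not part of the package). It refits the same logistic regression with increasing iteration caps and collects the slope estimate, which is expected to keep growing rather than settle.

```r
# Toy data with complete separation: y is 0 whenever x < 0 and 1 whenever x > 0.
x <- c(-3, -2, -1, 1, 2, 3)
y <- c(0, 0, 0, 1, 1, 1)

# Hypothetical helper: fit the logistic regression with a given cap on the
# number of IWLS iterations, silencing the expected convergence warnings.
fit_with <- function(maxit) {
  suppressWarnings(
    glm(y ~ x, family = binomial(), control = glm.control(maxit = maxit))
  )
}

# Slope estimates after at most 5, 10 and 25 iterations.
slopes <- sapply(c(5, 10, 25), function(m) coef(fit_with(m))[["x"]])
print(slopes)
```

The reported slope depends on where the iteration stops, which is why a dedicated check such as detect_separation is more reliable than inspecting the size of the estimates.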
detect_separation was designed in 2017 by Ioannis Kosmidis for the **brglm2** R package, after correspondence with Kjell Konis, and a port of the separator function was included in **brglm2** with the permission of Kjell Konis. In 2020, detect_separation and check_infinite_estimates were moved out of **brglm2** into the dedicated **detectseparation** package. Dirk Schumacher authored the separator_ROI function, which depends on the **ROI** R package and is now the default implementation used for detecting separation.
Ioannis Kosmidis [aut, cre] ioannis.kosmidis@warwick.ac.uk, Dirk Schumacher [aut] mail@dirk-schumacher.net, Kjell Konis [ctb] kjell.konis@me.com
Konis K. (2007). *Linear Programming Algorithms for Detecting Separated Data in Binary Logistic Regression Models*. DPhil. University of Oxford. https://ora.ox.ac.uk/objects/uuid:8f9ee0d0-d78e-4101-9ab4-f9cbceed2a2a
Konis K. (2013). safeBinaryRegression: Safe Binary Regression. R package version 0.1-3. https://CRAN.R-project.org/package=safeBinaryRegression
Kosmidis I. and Firth D. (2021). Jeffreys-prior penalty, finiteness and shrinkage in binomial-response generalized linear models. *Biometrika*, **108**, 71–82
glm.fit and glm, check_infinite_estimates, brglm_fit
## endometrial data from Heinze & Schemper (2002) (see ?endometrial)
data("endometrial", package = "detectseparation")
endometrial_sep <- glm(HG ~ NV + PI + EH, data = endometrial,
                       family = binomial("logit"),
                       method = "detect_separation")
endometrial_sep
## The maximum likelihood estimate for NV is infinite
summary(update(endometrial_sep, method = "glm.fit"))
## Example inspired by unpublished microeconometrics lecture notes by
## Achim Zeileis https://eeecon.uibk.ac.at/~zeileis/
## The maximum likelihood estimate of southernyes is infinite
if (requireNamespace("AER", quietly = TRUE)) {
    data("MurderRates", package = "AER")
    murder_sep <- glm(I(executions > 0) ~ time + income +
                      noncauc + lfp + southern, data = MurderRates,
                      family = binomial(), method = "detect_separation")
    murder_sep
    ## which is also evident from the large estimated standard error
    ## for southernyes
    murder_glm <- update(murder_sep, method = "glm.fit")
    summary(murder_glm)
    ## and is also revealed by the divergence of the southernyes column
    ## of the result from the more computationally intensive check
    plot(check_infinite_estimates(murder_glm))
    ## Mean bias reduction via adjusted scores results in finite estimates
    if (requireNamespace("brglm2", quietly = TRUE))
        update(murder_glm, method = brglm2::brglm_fit)
}