detect_separation {detectseparation}  R Documentation 
Method for glm that tests for data separation and finds which parameters have infinite maximum likelihood estimates in generalized linear models with binomial responses
detect_separation() is a method for glm that tests for the occurrence of complete or quasi-complete separation in datasets for binomial-response generalized linear models, and finds which of the parameters will have infinite maximum likelihood estimates. detect_separation() relies on the linear programming methods developed in Konis (2007).
detect_separation(
x,
y,
weights = NULL,
start = NULL,
etastart = NULL,
mustart = NULL,
offset = NULL,
family = gaussian(),
control = list(),
intercept = TRUE,
singular.ok = TRUE
)
detectSeparation(
x,
y,
weights = NULL,
start = NULL,
etastart = NULL,
mustart = NULL,
offset = NULL,
family = gaussian(),
control = list(),
intercept = TRUE,
singular.ok = TRUE
)
x 

y 

weights 
an optional vector of ‘prior weights’ to be used
in the fitting process. Should be 
start 
currently not used. 
etastart 
currently not used. 
mustart 
currently not used. 
offset 
this can be used to specify an a priori known
component to be included in the linear predictor during fitting.
This should be 
family 
a description of the error distribution and link
function to be used in the model. For 
control 
a list of parameters controlling separation
detection. See 
intercept 
logical. Should an intercept be included in the null model? 
singular.ok 
logical. If 
Following the definitions in Albert and Anderson (1984), the data for a binomial-response generalized linear model with logistic link exhibit quasi-complete separation if there exists a nonzero parameter vector \beta such that X^0 \beta \le 0 and X^1 \beta \ge 0, where X^0 and X^1 are the matrices formed by the rows of the model matrix X corresponding to zero and nonzero responses, respectively. The data exhibit complete separation if there exists a parameter vector \beta such that the aforementioned conditions are satisfied with strict inequalities. If no vector \beta satisfies the conditions, then the data points are said to overlap.
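The conditions above can be checked directly for a given candidate \beta. The following is a minimal base-R sketch under stated assumptions: the toy data, the candidate directions, and the helper name check_direction are hypothetical and are not part of the package. It only verifies a supplied \beta; finding such a vector in general is the linear programming problem addressed by Konis (2007).

```r
# Hypothetical helper: does a given beta satisfy the separation conditions?
check_direction <- function(X, y, beta) {
  X0 <- X[y == 0, , drop = FALSE]   # rows with zero responses
  X1 <- X[y != 0, , drop = FALSE]   # rows with nonzero responses
  s0 <- drop(X0 %*% beta)
  s1 <- drop(X1 %*% beta)
  c(quasi    = all(s0 <= 0) && all(s1 >= 0),  # X^0 beta <= 0, X^1 beta >= 0
    complete = all(s0 <  0) && all(s1 >  0))  # same, with strict inequalities
}

# Toy data (assumed for illustration): intercept plus one covariate
y  <- c(0, 0, 0, 1, 1, 1)
Xa <- cbind(1, c(-3, -2, -1, 1, 2, 3))  # responses split cleanly at zero
Xb <- cbind(1, c(-2, -1, 0, 0, 1, 2))   # a tie at zero across the two groups

res_a <- check_direction(Xa, y, beta = c(0, 1))
res_b <- check_direction(Xb, y, beta = c(0, 1))
res_a
res_b
```

For the first design the candidate direction satisfies both sets of inequalities; for the second, the tie at zero means only the non-strict inequalities hold for this particular \beta.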
If the inverse link function G(t) of a generalized linear model with binomial responses is such that \log G(t) and \log(1 - G(t)) are concave and the model has an intercept parameter, then overlap is a necessary and sufficient condition for the maximum likelihood estimates to be finite (see Silvapulle, 1981 for a proof). Such link functions are, for example, the logit, probit and complementary log-log.
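As a worked check of the concavity condition for the logit link only (the probit and complementary log-log cases need separate arguments), take G(t) = e^t / (1 + e^t). Both log-terms have the same, strictly negative, second derivative:

```latex
\begin{aligned}
\log G(t) &= t - \log(1 + e^{t}), &
\frac{d^{2}}{dt^{2}} \log G(t) &= -\frac{e^{t}}{(1 + e^{t})^{2}} = -G(t)\{1 - G(t)\} < 0, \\
\log\{1 - G(t)\} &= -\log(1 + e^{t}), &
\frac{d^{2}}{dt^{2}} \log\{1 - G(t)\} &= -\frac{e^{t}}{(1 + e^{t})^{2}} = -G(t)\{1 - G(t)\} < 0.
\end{aligned}
```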
detect_separation() determines whether or not the data exhibit (quasi-)complete separation. Then, if separation is detected and the inverse link function G(t) is such that \log G(t) and \log(1 - G(t)) are concave, the maximum likelihood estimate has infinite components.
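A rough way to see what an infinite component looks like in practice is to refit a logistic regression on separated data while allowing more and more iterations. The sketch below uses only base R's glm() and glm.control(); the toy data and the helper name slope_after are assumptions made for illustration, and this is not the linear programming approach that detect_separation() uses.

```r
# Toy data (assumed): the sign of x perfectly predicts y
x <- c(-3, -2, -1, 1, 2, 3)
y <- c(0, 0, 0, 1, 1, 1)

# Hypothetical helper: slope reported by glm() after at most `maxit` iterations
slope_after <- function(maxit) {
  fit <- suppressWarnings(   # glm() warns about non-convergence / fitted 0-1 probabilities
    glm(y ~ x, family = binomial("logit"),
        control = glm.control(maxit = maxit))
  )
  unname(coef(fit)["x"])
}

slopes <- sapply(c(5, 10, 25), slope_after)
slopes  # the reported slope keeps growing as more iterations are allowed
```

The reported values are always finite numbers; it is their continued growth with the iteration limit that signals an infinite maximum likelihood estimate.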
detect_separation() is a wrapper to the detect_infinite_estimates() method. Separation detection, as separation is defined above, takes place using the linear programming methods in Konis (2007) regardless of the link function. The output of those methods is also used to determine which estimates are infinite, unless the link is "log". In the latter case the linear programming methods in Schwendinger et al. (2021) are called to establish if and which estimates are infinite. If the link function is not one of "logit", "log", "probit", "cauchit", "cloglog", then a warning is issued.
The coefficients method extracts a vector of values for each of the model parameters under the following convention: 0 if the maximum likelihood estimate of the parameter is finite, and Inf or -Inf if the maximum likelihood estimate of the parameter is plus or minus infinity. This convention makes it easy to adjust the maximum likelihood estimates to their actual values by element-wise addition.
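The element-wise addition can be sketched as follows. All numbers and parameter names here are hypothetical placeholders, not output from any real fit: reported stands for the finite values a fitting routine printed, and pattern for a vector following the 0 / Inf / -Inf convention described above.

```r
# Hypothetical finite values reported by an iterative fitting routine
reported <- c(b0 = 0.5, b1 = 7.9, b2 = -6.4, b3 = -1.2)

# Hypothetical vector in the convention above:
# 0 for finite estimates, Inf / -Inf for infinite ones
pattern <- c(b0 = 0, b1 = Inf, b2 = -Inf, b3 = 0)

# Adding the two leaves finite estimates untouched and replaces the
# others by their actual values, plus or minus infinity
actual <- reported + pattern
actual
```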
detect_separation() can be passed directly as a method to the glm function. See the examples.
detectSeparation() is an alias for detect_separation().
A list that inherits from class detect_separation, glm and lm. A print method is provided for detect_separation objects.
For the definition of complete and quasi-complete separation, see Albert and Anderson (1984). Kosmidis and Firth (2021) prove that the reduced-bias estimator that results from penalizing the logistic regression log-likelihood by Jeffreys prior always takes finite values, even when some of the maximum likelihood estimates are infinite. The reduced-bias estimates can be computed using the brglm2 R package.
detect_separation was designed in 2017 by Ioannis Kosmidis for the **brglm2** R package, after correspondence with Kjell Konis, and a port of the separator function was included in **brglm2** with the permission of Kjell Konis. In 2020, detect_separation and check_infinite_estimates were moved out of **brglm2** into the dedicated **detectseparation** package. Dirk Schumacher authored the separator_ROI function, which depends on the **ROI** R package and is now the default implementation used for detecting separation. In 2022, Florian Schwendinger authored the dielb_ROI function for detecting infinite estimates in log-binomial regression and, together with Ioannis Kosmidis, refactored the codebase to properly support log-binomial regression.
Ioannis Kosmidis [aut, cre] ioannis.kosmidis@warwick.ac.uk, Dirk Schumacher [aut] mail@dirkschumacher.net, Florian Schwendinger [aut] FlorianSchwendinger@gmx.at, Kjell Konis [ctb] kjell.konis@me.com
Konis K. (2007). *Linear Programming Algorithms for Detecting Separated Data in Binary Logistic Regression Models*. DPhil. University of Oxford. https://ora.ox.ac.uk/objects/uuid:8f9ee0d0-d78e-4101-9ab4-f9cbceed2a2a
Konis K. (2013). safeBinaryRegression: Safe Binary Regression. R package version 0.1-3. https://CRAN.R-project.org/package=safeBinaryRegression
Kosmidis I. and Firth D. (2021). Jeffreys-prior penalty, finiteness and shrinkage in binomial-response generalized linear models. *Biometrika*, **108**, 71–82. doi:10.1093/biomet/asaa052
Silvapulle, M. J. (1981). On the Existence of Maximum Likelihood Estimators for the Binomial Response Models. *Journal of the Royal Statistical Society. Series B (Methodological)*, **43**, 310–313. https://www.jstor.org/stable/2984941
Schwendinger, F., Grün, B. & Hornik, K. (2021). A comparison of optimization solvers for log binomial regression including conic programming. *Computational Statistics*, **36**, 1721–1754. doi:10.1007/s00180-021-01084-5
glm.fit and glm, detect_infinite_estimates, check_infinite_estimates, brglm_fit
# Attach the package so that glm() can find the "detect_separation" method
library("detectseparation")

# endometrial data from Heinze & Schemper (2002) (see ?endometrial)
data("endometrial", package = "detectseparation")
endometrial_sep <- glm(HG ~ NV + PI + EH, data = endometrial,
                       family = binomial("logit"),
                       method = "detect_separation")
endometrial_sep
# The maximum likelihood estimate for NV is infinite
summary(update(endometrial_sep, method = "glm.fit"))

# Example inspired by unpublished microeconometrics lecture notes by
# Achim Zeileis https://eeecon.uibk.ac.at/~zeileis/
# The maximum likelihood estimate of southernyes is infinite
if (requireNamespace("AER", quietly = TRUE)) {
    data("MurderRates", package = "AER")
    murder_sep <- glm(I(executions > 0) ~ time + income +
                      noncauc + lfp + southern, data = MurderRates,
                      family = binomial(), method = "detect_separation")
    # print() is needed because values are not auto-printed inside if () { }
    print(murder_sep)
    # which is also evident by the large estimated standard error for southernyes
    murder_glm <- update(murder_sep, method = "glm.fit")
    print(summary(murder_glm))
    # and is also revealed by the divergence of the southernyes column of the
    # result from the more computationally intensive check
    plot(check_infinite_estimates(murder_glm))
    # Mean bias reduction via adjusted scores results in finite estimates
    if (requireNamespace("brglm2", quietly = TRUE))
        update(murder_glm, method = brglm2::brglm_fit)
}