is_separated {spaMM} | R Documentation |
Checking for (quasi-)separation in binomial-response models.
Description
Separation occurs in binomial-response models when a combination of the predictor variables perfectly predicts a level of the response. In such a case, the estimates of the coefficients for these variables diverge to (+/-)infinity, and the numerical algorithms typically fail. To anticipate this problem, the fitting functions in spaMM try to check for separation by default. The check can be slow, so it is skipped if the “problem size” exceeds a threshold defined by spaMM.options(separation_max=<.>), in which case a message tells users by how much they should increase separation_max to force the check. The exact meaning and default value of this threshold are subject to change without notice, but the default value aims to correspond to a separation check time of the order of 1s on the author's computer.
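The divergence described above can be reproduced with a minimal base-R sketch. It uses stats::glm() rather than spaMM's fitting functions, and the data set d_sep and the object names are made up for illustration only.

```r
# Toy data (hypothetical): y is 0 whenever x <= 5 and 1 otherwise,
# so x perfectly predicts the response (complete separation).
d_sep <- data.frame(x = 1:10, y = as.integer(1:10 > 5))

# Fit a logistic regression, collecting the warnings instead of printing them.
warns <- character(0)
fit_sep <- withCallingHandlers(
  glm(y ~ x, family = binomial(), data = d_sep),
  warning = function(w) {
    warns <<- c(warns, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

coef(fit_sep)  # the slope estimate is very large in absolute value
warns          # messages emitted by glm.fit for this fit
```

The likelihood keeps increasing as the slope grows, so the iterations stop only because of a numerical stopping rule, and the reported estimates are not meaningful. This is the situation the separation check is meant to detect before fitting.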
is_separated is a convenient interface to procedures from the ROI package, allowing them to be called explicitly by the user to check bootstrap samples (see Example in anova).
is_separated.formula is a variant (not yet a formal S3 method) that performs the same check, but using arguments similar to those of fitme(., family=binomial()).
Usage
is_separated(x, y, verbose = TRUE, solver=spaMM.getOption("sep_solver"))
is_separated.formula(formula, ..., separation_max=spaMM.getOption("separation_max"),
solver=spaMM.getOption("sep_solver"))
Arguments
x |
Design matrix for fixed effects. |
y |
Numeric response vector |
formula |
A model formula |
... |
Further arguments, such as data, similar to those of fitme(., family=binomial()). |
separation_max |
numeric: a non-default value allows for easier local control of this spaMM option. |
solver |
character: name of linear programming solver used to assess separation; passed to |
verbose |
Whether to print some messages (e.g., pointing to model terms that cause separation) or not. |
Value
Returns a boolean; TRUE means there is (quasi-)separation. Screen output may give further information, such as pointing to model terms that cause separation.
References
The method accessible by solver="glpk" implements algorithms described by
Konis, K. 2007. Linear Programming Algorithms for Detecting Separated Data in Binary Logistic Regression Models. DPhil Thesis, Univ. Oxford. https://ora.ox.ac.uk/objects/uuid:8f9ee0d0-d78e-4101-9ab4-f9cbceed2a2a.
See Also
See also the 'safeBinaryRegression' and 'detectseparation' packages.
Examples
set.seed(123)
d <- data.frame(success = rbinom(10, size = 1, prob = 0.9), x = 1:10)
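# x does not perfectly predict success in these data
is_separated.formula(formula = success ~ x, data = d) # FALSE
# For a 0/1 response, I(success^2) equals success, so it predicts it perfectly
is_separated.formula(formula = success ~ I(success^2), data = d) # TRUE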
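The matrix interface is_separated(x, y) mentioned in the Description can be applied to bootstrap samples along the following lines. This is only a sketch: the data set d_boot and the helper boot_inputs() are hypothetical and not part of spaMM, and the call to is_separated is guarded because it assumes that spaMM and a linear programming solver usable through ROI are installed.

```r
set.seed(123)
# Toy data (hypothetical): a balanced binary response and one covariate.
d_boot <- data.frame(success = rbinom(40, size = 1, prob = 0.5), x = 1:40)

# Hypothetical helper, not part of spaMM: resample rows with replacement and
# return the design matrix and response vector expected by is_separated(x, y).
boot_inputs <- function(data) {
  idx <- sample(nrow(data), replace = TRUE)
  resampled <- data[idx, , drop = FALSE]
  list(x = model.matrix(~ x, data = resampled), y = resampled$success)
}

samples <- replicate(5, boot_inputs(d_boot), simplify = FALSE)

# Run the separation check on each bootstrap sample, if spaMM is available.
if (requireNamespace("spaMM", quietly = TRUE)) {
  flags <- vapply(
    samples,
    function(s) spaMM::is_separated(s$x, s$y, verbose = FALSE),
    logical(1)
  )
  print(flags)  # TRUE marks a bootstrap sample with (quasi-)separation
}
```

Samples flagged TRUE can then be dropped or handled separately before refitting, depending on the purpose of the bootstrap.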