binned_residuals {regressinator} | R Documentation |
Obtain binned residuals for a model
Description
Construct a data frame by binning the fitted values or predictors of a model into discrete bins of equal width, and calculating the average value of the residuals within each bin.
Usage
binned_residuals(fit, predictors = !".fitted", breaks = NULL, ...)
Arguments
fit |
The model to obtain residuals for. This can be a model fit with
|
predictors |
Predictors to calculate binned residuals for. Defaults to
all predictors, skipping factors. Predictors can be specified using
tidyselect syntax; see |
breaks |
Number of bins to create. If |
... |
Additional arguments passed on to |
Details
In many generalized linear models, the residual plots (Pearson or deviance) are not useful because the response variable takes on very few possible values, causing strange patterns in the residuals. For instance, in logistic regression, plotting the residuals versus covariates usually produces two curved lines.
If we first bin the data, i.e. divide up the observations into breaks bins
based on their fitted values, we can calculate the average residual within
each bin. This can be more informative: if a region has 20 observations and
its average residual value is large, this suggests those observations are
collectively poorly fit. We can also bin each predictor and calculate
averages within those bins, allowing the detection of misspecification for
specific model terms.
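The binning idea described above can be sketched by hand in base R. This is only an illustration under stated assumptions, not the package's implementation: the model (am ~ wt on mtcars), the choice of response residuals, and the names n_bins, bin, and binned are all picked for the example.

```r
# Illustrative logistic regression on a built-in data set (assumed model)
fit <- glm(am ~ wt, data = mtcars, family = binomial())

fitted_vals <- fitted(fit)
# Response residuals are an assumption made for this sketch
resids <- residuals(fit, type = "response")

# Divide the range of the fitted values into equal-width bins;
# labels = FALSE returns integer bin codes instead of interval labels
n_bins <- 4
bin <- cut(fitted_vals, breaks = n_bins, labels = FALSE)

# Count observations and average the residuals within each non-empty bin
binned <- data.frame(
  bin = sort(unique(bin)),
  n = as.vector(table(bin)),
  resid_mean = as.vector(tapply(resids, bin, mean))
)
binned
```

A bin whose resid_mean is far from zero relative to the others points to a range of fitted values where the model fits poorly on average. Replacing fitted_vals with a predictor column such as mtcars$wt gives the per-predictor version of the same calculation.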
Value
Data frame (tibble) with one row per bin per selected predictor, and the following columns:
.bin |
Bin number. |
n |
Number of observations in this bin. |
predictor_name |
Name of the predictor that has been binned. |
predictor_min, predictor_max, predictor_mean, predictor_sd |
Minimum, maximum, mean, and standard deviation of the predictor (or fitted values). |
resid_mean |
Mean residual in this bin. |
resid_sd |
Standard deviation of residuals in this bin. |
Limitations
Factor predictors (as factors, logical, or character vectors) are detected
automatically and omitted. However, if a numeric variable is converted to a
factor in the model formula, such as with y ~ factor(x), the function
cannot determine the appropriate type and will raise an error. Create factors
as needed in the source data frame before fitting the model to avoid this
issue.
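A minimal sketch of that workaround, assuming the mtcars data: the factor is created as a column of the data frame before fitting rather than inside the formula. The names cars, cyl_f, and fit3 are hypothetical and chosen only for this example.

```r
# Copy the data and create the factor column before fitting
# (cars and cyl_f are hypothetical names used for illustration)
cars <- mtcars
cars$cyl_f <- factor(cars$cyl)

# The formula refers to the existing factor column, not factor(cyl)
fit3 <- lm(mpg ~ disp + cyl_f, data = cars)

# Run the binning only if the package is installed
if (requireNamespace("regressinator", quietly = TRUE)) {
  # cyl_f should be detected as a factor and skipped, leaving disp to be binned
  print(regressinator::binned_residuals(fit3, breaks = 5))
}
```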
References
Gelman, A., Hill, J., and Vehtari, A. (2021). Regression and Other Stories. Section 14.5. Cambridge University Press.
See Also
partial_residuals() for the related partial residuals;
vignette("logistic-regression-diagnostics") and
vignette("other-glm-diagnostics") for examples of use and interpretation
of binned residuals in logistic regression and GLMs; bin_by_interval()
and bin_by_quantile() to bin data and calculate other values in each bin.
Examples
fit <- lm(mpg ~ disp + hp, data = mtcars)
# Automatically bins both predictors:
binned_residuals(fit, breaks = 5)
# Just bin one predictor, selected with tidyselect syntax. Multiple could be
# selected with c().
binned_residuals(fit, disp, breaks = 5)
# Bin the fitted values:
binned_residuals(fit, predictors = .fitted)
# Bins are made using the predictor, not regressors derived from it, so here
# disp itself is binned, not its polynomial terms:
fit2 <- lm(mpg ~ poly(disp, 2), data = mtcars)
binned_residuals(fit2)