
binned_residuals {regressinator}

R Documentation

Obtain binned residuals for a model

Description

Construct a data frame by binning the fitted values or predictors of a model into discrete bins of equal width, and calculating the average value of the residuals within each bin.

Usage

binned_residuals(fit, predictors = !".fitted", breaks = NULL, ...)

Arguments

`fit`	The model to obtain residuals for. This can be a model fit with `lm()` or `glm()`, or any model that has `residuals()` and `fitted()` methods.
`predictors`	Predictors to calculate binned residuals for. Defaults to all predictors, skipping factors. Predictors can be specified using tidyselect syntax; see `help("language", package = "tidyselect")` and the examples below. Specify `predictors = .fitted` to obtain binned residuals versus fitted values.
`breaks`	Number of bins to create. If `NULL`, a default number of bins is chosen based on the number of rows in the data.
`...`	Additional arguments passed on to `residuals()`. The most useful additional argument is typically `type`, to select the type of residuals to produce (such as Pearson or deviance residuals).
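Because `...` is forwarded to `residuals()`, the residual type is chosen the same way as in a direct `residuals()` call. The sketch below assumes a logistic regression of `vs` on `mpg` from the built-in `mtcars` data (an illustrative model, not one from this page) and only calls `binned_residuals()` if regressinator is installed.

```r
# Illustrative logistic regression (assumed model, not from the original page)
fit_logit <- glm(vs ~ mpg, family = binomial, data = mtcars)

# For a glm, residuals() accepts type = "deviance" (the default),
# "pearson", "working", "response", or "partial".
pearson <- residuals(fit_logit, type = "pearson")

if (requireNamespace("regressinator", quietly = TRUE)) {
  # type = "pearson" is passed through ... to residuals()
  print(regressinator::binned_residuals(fit_logit, predictors = .fitted,
                                        breaks = 5, type = "pearson"))
}
```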

Details

In many generalized linear models, plots of the residuals (Pearson or deviance) are not useful because the response variable takes on very few possible values, causing strange patterns in the residuals. For instance, in logistic regression, plotting the residuals versus covariates usually produces two curved lines, one for observations with a response of 0 and one for observations with a response of 1.
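The two-curve pattern follows directly from the response taking only the values 0 and 1. A minimal base-R check, assuming an illustrative logistic regression of `vs` on `mpg` from `mtcars`, shows that every response residual $y - \hat{p}$ is either $1 - \hat{p}$ or $-\hat{p}$:

```r
# Illustrative logistic regression (assumed model, not from the original page)
fit_logit <- glm(vs ~ mpg, family = binomial, data = mtcars)

p <- fitted(fit_logit)                        # fitted probabilities
r <- residuals(fit_logit, type = "response")  # y - p

# Observations with y = 1 lie on the curve 1 - p; those with y = 0 on -p.
on_upper <- abs(r - (1 - p)) < 1e-8
on_lower <- abs(r - (-p)) < 1e-8
table(curve = ifelse(on_upper, "1 - p", "-p"))

# plot(mtcars$mpg, r) would show these two curves
```

Pearson and deviance residuals are monotone transformations of the same two cases, so they show the same two-curve structure.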

If we first bin the data, i.e. divide the observations into `breaks` bins based on their fitted values, we can calculate the average residual within each bin. This can be more informative: if a bin has 20 observations and its average residual is large, this suggests those observations are collectively poorly fit. We can also bin each predictor and calculate averages within those bins, allowing the detection of misspecification for specific model terms.
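The binning idea can be sketched in a few lines of base R. This is not regressinator's implementation; it assumes an illustrative logistic regression, an arbitrary choice of 4 bins, and uses `cut()` to form equal-width bins of the fitted values before averaging the residuals in each bin.

```r
# Illustrative logistic regression (assumed model, not from the original page)
fit_logit <- glm(vs ~ mpg, family = binomial, data = mtcars)

p <- fitted(fit_logit)
r <- residuals(fit_logit, type = "response")

n_bins <- 4  # assumed bin count, chosen only for illustration

# cut() with a single number for breaks splits the range of p into
# equal-width intervals; labels = FALSE returns integer bin codes.
bin <- cut(p, breaks = n_bins, labels = FALSE)

binned <- data.frame(
  .bin = sort(unique(bin)),                    # bins that contain data
  n = as.vector(table(bin)),                   # observations per bin
  resid_mean = as.vector(tapply(r, bin, mean)) # average residual per bin
)
binned
```

A bin whose `resid_mean` sits far from zero relative to its size points to a range of fitted values where the model is systematically off.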

Value

Data frame (tibble) with one row per bin per selected predictor, and the following columns:

`.bin`	Bin number.
`n`	Number of observations in this bin.
`predictor_name`	Name of the predictor that has been binned.
`predictor_min`, `predictor_max`, `predictor_mean`, `predictor_sd`	Minimum, maximum, mean, and standard deviation of the predictor (or fitted values) within this bin.
`resid_mean`	Mean residual in this bin.
`resid_sd`	Standard deviation of residuals in this bin.
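To make the column layout concrete, the following hypothetical helper `bin_one_predictor()` (not part of regressinator) summarizes a single numeric predictor into equal-width bins using the column names listed above. It returns a base `data.frame` rather than a tibble and is only a sketch under those assumptions.

```r
# Hypothetical helper, not part of regressinator: per-bin summaries of one
# predictor and the residuals, using the column names documented above.
bin_one_predictor <- function(x, resid, name, breaks) {
  bin <- cut(x, breaks = breaks, labels = FALSE)  # equal-width bins
  rows <- lapply(sort(unique(bin)), function(b) {
    in_bin <- bin == b
    data.frame(
      .bin = b,
      n = sum(in_bin),
      predictor_name = name,
      predictor_min = min(x[in_bin]),
      predictor_max = max(x[in_bin]),
      predictor_mean = mean(x[in_bin]),
      predictor_sd = sd(x[in_bin]),    # NA for a single-observation bin
      resid_mean = mean(resid[in_bin]),
      resid_sd = sd(resid[in_bin])
    )
  })
  do.call(rbind, rows)
}

fit <- lm(mpg ~ disp + hp, data = mtcars)
disp_bins <- bin_one_predictor(mtcars$disp, residuals(fit), "disp", breaks = 5)
disp_bins
```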

Limitations

Factor predictors (as factors, logical, or character vectors) are detected automatically and omitted. However, if a numeric variable is converted to a factor in the model formula, such as with `y ~ factor(x)`, the function cannot determine the appropriate type and will raise an error. Create factors as needed in the source data frame before fitting the model to avoid this issue.
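The workaround looks like this in practice. The sketch assumes `mtcars` with `cyl` treated as categorical (an illustrative choice); the conversion happens in a copy of the data frame rather than inside the formula, and `binned_residuals()` is called only if regressinator is installed.

```r
mtcars_f <- mtcars
mtcars_f$cyl <- factor(mtcars_f$cyl)  # convert in the data, not via factor() in the formula

fit_cyl <- lm(mpg ~ cyl + disp, data = mtcars_f)

if (requireNamespace("regressinator", quietly = TRUE)) {
  # cyl is a factor column, so it is skipped; only disp is binned
  print(regressinator::binned_residuals(fit_cyl, breaks = 5))
}
```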

References

Gelman, A., Hill, J., and Vehtari, A. (2021). Regression and Other Stories. Section 14.5. Cambridge University Press.

Examples

library(regressinator)

fit <- lm(mpg ~ disp + hp, data = mtcars)

# Automatically bins both predictors:
binned_residuals(fit, breaks = 5)

# Just bin one predictor, selected with tidyselect syntax. Multiple could be
# selected with c().
binned_residuals(fit, disp, breaks = 5)

# Bin the fitted values:
binned_residuals(fit, predictors = .fitted)

# Bins are made using the predictor, not regressors derived from it, so here
# disp itself is binned, not its polynomial terms.
fit2 <- lm(mpg ~ poly(disp, 2), data = mtcars)
binned_residuals(fit2)

[Package regressinator version 0.1.3 Index]