binned_residuals {regressinator}    R Documentation

Obtain binned residuals for a model

Description

Construct a data frame by binning the fitted values or predictors of a model into discrete bins of equal width, and calculating the average value of the residuals within each bin.

Usage

binned_residuals(fit, predictors = !".fitted", breaks = NULL, ...)

Arguments

fit

The model to obtain residuals for. This can be a model fit with lm() or glm(), or any model that has residuals() and fitted() methods.

predictors

Predictors to calculate binned residuals for. Defaults to all predictors, skipping factors. Predictors can be specified using tidyselect syntax; see help("language", package = "tidyselect") and the examples below. Specify predictors = .fitted to obtain binned residuals versus fitted values.

breaks

Number of bins to create. If NULL, a default number of bins is chosen based on the number of rows in the data.

...

Additional arguments passed on to residuals(). The most useful additional argument is typically type, to select the type of residuals to produce (such as standardized residuals or deviance residuals).

Details

In many generalized linear models, the residual plots (Pearson or deviance) are not useful because the response variable takes on very few possible values, causing strange patterns in the residuals. For instance, in logistic regression, plotting the residuals versus covariates usually produces two curved lines.

If we first bin the data, i.e. divide the observations into bins based on their fitted values (the number of bins is set by breaks), we can calculate the average residual within each bin. This can be more informative: if a region has 20 observations and its average residual value is large, this suggests those observations are collectively poorly fit. We can also bin each predictor and calculate averages within those bins, allowing the detection of misspecification for specific model terms.
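
The binning idea described above can be sketched by hand in base R. This is only an illustration under stated assumptions, not the package's implementation: the model (am ~ wt on mtcars), the choice of 5 bins, and the use of response residuals are all arbitrary choices made for the example.

```r
# Sketch only: manual equal-width binning of residuals against fitted values.
# Not the package's implementation; model and bin count are illustrative.
fit <- glm(am ~ wt, data = mtcars, family = binomial)

fitted_vals <- fitted(fit)
resids <- residuals(fit, type = "response")

# With a single number, cut() splits the range into that many
# equal-width intervals.
bins <- cut(fitted_vals, breaks = 5, include.lowest = TRUE)

# One row per bin: count, mean fitted value, and mean residual.
# Empty bins (if any) get n = 0 and NA means.
binned <- data.frame(
  n = as.vector(table(bins)),
  fitted_mean = as.vector(tapply(fitted_vals, bins, mean)),
  resid_mean = as.vector(tapply(resids, bins, mean))
)
binned
```

A bin whose resid_mean is far from zero relative to the others points to a range of fitted values where the model fits poorly on average.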

Value

Data frame (tibble) with one row per bin per selected predictor, and the following columns:

.bin

Bin number.

n

Number of observations in this bin.

predictor_name

Name of the predictor that has been binned.

predictor_min, predictor_max, predictor_mean, predictor_sd

Minimum, maximum, mean, and standard deviation of the predictor (or fitted values).

resid_mean

Mean residual in this bin.

resid_sd

Standard deviation of residuals in this bin.
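
One way to use these columns, sketched here as an assumption rather than a feature of the package, is to compare each bin's mean residual against roughly two standard errors, in the spirit of the binned residual plots in the reference below. The data frame here is a hand-made stand-in with invented numbers; in practice it would be the output of binned_residuals().

```r
# Stand-in for binned_residuals() output; the numbers are invented
# purely for illustration.
binned <- data.frame(
  .bin = 1:3,
  n = c(10, 12, 10),
  resid_mean = c(0.05, -0.02, 0.30),
  resid_sd = c(0.40, 0.45, 0.35)
)

# Standard error of the mean residual in each bin.
binned$se <- binned$resid_sd / sqrt(binned$n)

# Flag bins whose mean residual lies outside +/- 2 standard errors.
binned$outside <- abs(binned$resid_mean) > 2 * binned$se

binned[, c(".bin", "resid_mean", "se", "outside")]
```

With these made-up values only the third bin is flagged. Bins with very few observations have unstable standard errors, so the flag is a rough guide rather than a formal test.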

Limitations

Factor predictors (as factors, logical, or character vectors) are detected automatically and omitted. However, if a numeric variable is converted to factor in the model formula, such as with y ~ factor(x), the function cannot determine the appropriate type and will raise an error. Create factors as needed in the source data frame before fitting the model to avoid this issue.

References

Gelman, A., Hill, J., and Vehtari, A. (2021). Regression and Other Stories. Section 14.5. Cambridge University Press.

See Also

partial_residuals() for the related partial residuals; vignette("logistic-regression-diagnostics") and vignette("other-glm-diagnostics") for examples of use and interpretation of binned residuals in logistic regression and GLMs; bin_by_interval() and bin_by_quantile() to bin data and calculate other values in each bin.

Examples

fit <- lm(mpg ~ disp + hp, data = mtcars)

# Automatically bins both predictors:
binned_residuals(fit, breaks = 5)

# Just bin one predictor, selected with tidyselect syntax. Multiple could be
# selected with c().
binned_residuals(fit, disp, breaks = 5)

# Bin the fitted values:
binned_residuals(fit, predictors = .fitted)

# Bins are made using the predictor, not regressors derived from it, so here
# disp is binned, not its polynomial
fit2 <- lm(mpg ~ poly(disp, 2), data = mtcars)
binned_residuals(fit2)

[Package regressinator version 0.1.3 Index]