estimate_expectation {modelbased} | R Documentation |
Model-based response estimates and uncertainty
Description
After fitting a model, it is useful to generate model-based estimates of the response variables for different combinations of predictor values. Such estimates can be used to make inferences about relationships between variables and to make predictions about individual cases.
Model-based response estimates and uncertainty can be generated for both the conditional average response values (the regression line or expectation) and for predictions about individual cases. See below for details.
Usage
estimate_expectation(
model,
data = NULL,
ci = 0.95,
keep_iterations = FALSE,
...
)
estimate_response(...)
estimate_link(model, data = "grid", ci = 0.95, keep_iterations = FALSE, ...)
estimate_prediction(
model,
data = NULL,
ci = 0.95,
keep_iterations = FALSE,
...
)
estimate_relation(
model,
data = "grid",
ci = 0.95,
keep_iterations = FALSE,
...
)
Arguments
model |
A statistical model. |
data |
A data frame with model's predictors to estimate the response. If
|
ci |
Confidence Interval (CI) level. Default to |
keep_iterations |
If |
... |
You can add all the additional control arguments from
|
Value
A data frame of predicted values and uncertainty intervals, with class "estimate_predicted". Methods for visualisation_recipe() and plot() are available.
Expected (average) values
The most important way that various types of response estimates differ is in terms of what quantity is being estimated and the meaning of the uncertainty intervals. The major choices are expected values for uncertainty in the regression line and predicted values for uncertainty in the individual case predictions.
Expected values refer to the fitted regression line, that is, the estimated average response value (i.e., the "expectation") for individuals with specific predictor values. For example, in a linear model y = 2 + 3x + 4z + e, the estimated average y for individuals with x = 1 and z = 2 is 2 + 3(1) + 4(2) = 13.
For expected values, uncertainty intervals refer to uncertainty in the estimated conditional average (where might the true regression line actually fall?). Uncertainty intervals for expected values are also called "confidence intervals".
Expected values and their uncertainty intervals are useful for describing the relationship between variables and for describing how precisely a model has been estimated.
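As an illustration of expected values and their confidence intervals, the following is a minimal base R sketch rather than the package's internal implementation. The simulated data, the seed, and the object names (fit, new_case, expected) are assumptions made for the example; only the model form y = 2 + 3x + 4z + e comes from the text above.

```r
# Simulated data following y = 2 + 3x + 4z + e (illustrative assumption)
set.seed(123)
n <- 200
x <- rnorm(n)
z <- rnorm(n)
y <- 2 + 3 * x + 4 * z + rnorm(n)

fit <- lm(y ~ x + z)

# True conditional average for individuals with x = 1 and z = 2
true_expectation <- 2 + 3 * 1 + 4 * 2

# Estimated conditional average and its 95% confidence interval
new_case <- data.frame(x = 1, z = 2)
expected <- predict(fit, newdata = new_case,
                    interval = "confidence", level = 0.95)
expected
```

The columns lwr and upr bound the estimated average response at those predictor values, not the response of any single individual.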
For generalized linear models, expected values are reported on one of two scales:
The link scale refers to the scale of the fitted regression line, after transformation by the link function. For example, for a logistic regression (logit binomial) model, the link scale gives expected log-odds. For a log-link Poisson model, the link scale gives the expected log-count.
The response scale refers to the original scale of the response variable (i.e., without any link function transformation). Expected values on the link scale are back-transformed to the original response variable metric (e.g., expected probabilities for binomial models, expected counts for Poisson models).
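The relationship between the two scales can be sketched in base R for a logistic model. This example uses stats::predict() directly, not the modelbased functions; the object names (new_data, link_scale, response_scale) are assumptions for illustration.

```r
# Logistic regression on the built-in mtcars data
model <- glm(vs ~ wt, data = mtcars, family = binomial())
new_data <- data.frame(wt = c(2, 3, 4))

# Expected values on the link scale (log-odds)
link_scale <- predict(model, newdata = new_data, type = "link")

# Expected values on the response scale (probabilities)
response_scale <- predict(model, newdata = new_data, type = "response")

# Back-transforming the link scale with the inverse logit
# recovers the response scale
back_transformed <- plogis(link_scale)
all.equal(unname(back_transformed), unname(response_scale))
```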
Individual case predictions
In contrast to expected values, predicted values refer to predictions for individual cases. Predicted values are also called "posterior predictions" or "posterior predictive draws".
For predicted values, uncertainty intervals refer to uncertainty in the individual response values for each case (where might any single case actually fall?). Uncertainty intervals for predicted values are also called "prediction intervals" or "posterior predictive intervals".
Predicted values and their uncertainty intervals are useful for forecasting the range of values that might be observed in new data, for making decisions about individual cases, and for checking if model predictions are reasonable ("posterior predictive checks").
Predicted values and intervals are always on the scale of the original response variable (not the link scale).
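The difference between confidence and prediction intervals can be seen in a small base R sketch for a linear model. It relies on stats::predict() rather than on the modelbased functions, and the object names (conf_int, pred_int) are assumptions for illustration.

```r
model <- lm(mpg ~ wt, data = mtcars)
new_data <- data.frame(wt = c(2.5, 3.5))

# Uncertainty about the regression line (expected values)
conf_int <- predict(model, newdata = new_data, interval = "confidence")

# Uncertainty about individual cases (predicted values)
pred_int <- predict(model, newdata = new_data, interval = "prediction")

# Interval widths: prediction intervals also include residual variability
conf_width <- conf_int[, "upr"] - conf_int[, "lwr"]
pred_width <- pred_int[, "upr"] - pred_int[, "lwr"]
cbind(conf_width, pred_width)
```

For a linear model the point estimates coincide; only the interval widths differ, with the prediction interval being the wider of the two.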
Functions for estimating predicted values and uncertainty
modelbased provides 4 functions for generating model-based response estimates and their uncertainty:
- estimate_expectation(): Generates expected values (conditional average) on the response scale. The uncertainty interval is a confidence interval. By default, values are computed using the data used to fit the model.
- estimate_link(): Generates expected values (conditional average) on the link scale. The uncertainty interval is a confidence interval. By default, values are computed using a reference grid spanning the observed range of predictor values (see visualisation_matrix()).
- estimate_prediction(): Generates predicted values (for individual cases) on the response scale. The uncertainty interval is a prediction interval. By default, values are computed using the data used to fit the model.
- estimate_relation(): Like estimate_expectation(). Useful for visualizing a model. Generates expected values (conditional average) on the response scale. The uncertainty interval is a confidence interval. By default, values are computed using a reference grid spanning the observed range of predictor values (see visualisation_matrix()).
estimate_response() is a deprecated alias for estimate_expectation().
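The four functions can be compared side by side on the same model. The sketch below assumes the modelbased package is installed and skips the calls otherwise; the object names are illustrative, and the output may differ between package versions.

```r
model <- lm(mpg ~ wt, data = mtcars)

if (requireNamespace("modelbased", quietly = TRUE)) {
  # Expected values, response scale, data used to fit the model
  expectation <- modelbased::estimate_expectation(model)
  # Expected values, link scale, reference grid
  link <- modelbased::estimate_link(model)
  # Predicted values for individual cases, data used to fit the model
  prediction <- modelbased::estimate_prediction(model)
  # Expected values, response scale, reference grid
  relation <- modelbased::estimate_relation(model)

  print(head(expectation))
  print(head(prediction))
  print(head(relation))
}
```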
Data for predictions
If data = NULL, values are estimated using the data used to fit the model. If data = "grid", values are computed using a reference grid spanning the observed range of predictor values with visualisation_matrix(). This can be useful for model visualization. The number of predictor values used for each variable can be controlled with the length argument. data can also be a data frame containing columns with names matching the model frame (see insight::get_data()). This can be used to generate model predictions for specific combinations of predictor values.
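The three forms of the data argument can be sketched as follows. The modelbased calls are skipped if the package is not installed. The hand-built grid is only a rough base R analogue of a reference grid for a single predictor, not the output of visualisation_matrix(), and the object names (new_data, grid) are assumptions for illustration.

```r
model <- lm(mpg ~ wt, data = mtcars)

# Specific predictor values of interest
new_data <- data.frame(wt = c(2, 3, 4))

# Rough analogue of a 5-point reference grid over the observed range of wt
grid <- data.frame(wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = 5))

if (requireNamespace("modelbased", quietly = TRUE)) {
  # data = NULL: the data used to fit the model
  print(modelbased::estimate_expectation(model))
  # data = "grid": a reference grid, with `length` values per predictor
  print(modelbased::estimate_expectation(model, data = "grid", length = 5))
  # data = <data frame>: user-supplied predictor values
  print(modelbased::estimate_expectation(model, data = new_data))
}
```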
Note
These functions are built on top of insight::get_predicted() and correspond to different specifications of its parameters. It may be useful to read its documentation, in particular the description of the predict argument, for additional details on the difference between expected vs. predicted values and link vs. response scales.
Additional control parameters can be used to control results from insight::get_datagrid() (when data = "grid") and from insight::get_predicted() (the function used internally to compute predictions).
For plotting, check the examples in visualisation_recipe(). Also check out the Vignettes and README examples for various examples, tutorials and use cases.
Examples
library(modelbased)
# Linear Models
model <- lm(mpg ~ wt, data = mtcars)
# Get expected values with confidence interval for the data used to fit
# the model (estimate_response() is a deprecated alias; see insight::get_predicted)
estimate_expectation(model)
# Get expected values with confidence interval
pred <- estimate_relation(model)
pred
# Visualisation (see visualisation_recipe())
if (require("see")) {
  plot(pred)
}
# Standardize predictions
pred <- estimate_relation(lm(mpg ~ wt + am, data = mtcars))
z <- standardize(pred, include_response = FALSE)
z
unstandardize(z, include_response = FALSE)
# Logistic Models
model <- glm(vs ~ wt, data = mtcars, family = "binomial")
estimate_expectation(model)
estimate_relation(model)
# Mixed models
if (require("lme4")) {
  model <- lmer(mpg ~ wt + (1 | gear), data = mtcars)
  estimate_expectation(model)
  estimate_relation(model)
}
# Bayesian models
if (require("rstanarm")) {
  model <- suppressWarnings(rstanarm::stan_glm(
    mpg ~ wt,
    data = mtcars, refresh = 0, iter = 200
  ))
  estimate_expectation(model)
  estimate_relation(model)
}