add_epred_draws {tidybayes} | R Documentation |
Add draws from the posterior fit, predictions, or residuals of a model to a data frame
Description
Given a data frame and a model, adds draws from the linear/link-level predictor, the expectation of the posterior predictive, the posterior predictive, or the residuals of a model to the data frame in a long format.
Usage
add_epred_draws(
  newdata,
  object,
  ...,
  value = ".epred",
  ndraws = NULL,
  seed = NULL,
  re_formula = NULL,
  category = ".category",
  dpar = NULL
)

epred_draws(
  object,
  newdata,
  ...,
  value = ".epred",
  ndraws = NULL,
  seed = NULL,
  re_formula = NULL,
  category = ".category",
  dpar = NULL
)

## Default S3 method:
epred_draws(
  object,
  newdata,
  ...,
  value = ".epred",
  seed = NULL,
  category = NULL
)

## S3 method for class 'stanreg'
epred_draws(
  object,
  newdata,
  ...,
  value = ".epred",
  ndraws = NULL,
  seed = NULL,
  re_formula = NULL,
  category = ".category",
  dpar = NULL
)

## S3 method for class 'brmsfit'
epred_draws(
  object,
  newdata,
  ...,
  value = ".epred",
  ndraws = NULL,
  seed = NULL,
  re_formula = NULL,
  category = ".category",
  dpar = NULL
)
add_linpred_draws(
  newdata,
  object,
  ...,
  value = ".linpred",
  ndraws = NULL,
  seed = NULL,
  re_formula = NULL,
  category = ".category",
  dpar = NULL,
  n
)

linpred_draws(
  object,
  newdata,
  ...,
  value = ".linpred",
  ndraws = NULL,
  seed = NULL,
  re_formula = NULL,
  category = ".category",
  dpar = NULL,
  n,
  scale
)

## Default S3 method:
linpred_draws(
  object,
  newdata,
  ...,
  value = ".linpred",
  seed = NULL,
  category = NULL
)

## S3 method for class 'stanreg'
linpred_draws(
  object,
  newdata,
  ...,
  value = ".linpred",
  ndraws = NULL,
  seed = NULL,
  re_formula = NULL,
  category = ".category",
  dpar = NULL
)

## S3 method for class 'brmsfit'
linpred_draws(
  object,
  newdata,
  ...,
  value = ".linpred",
  ndraws = NULL,
  seed = NULL,
  re_formula = NULL,
  category = ".category",
  dpar = NULL
)
add_predicted_draws(
  newdata,
  object,
  ...,
  value = ".prediction",
  ndraws = NULL,
  seed = NULL,
  re_formula = NULL,
  category = ".category",
  n
)

predicted_draws(
  object,
  newdata,
  ...,
  value = ".prediction",
  ndraws = NULL,
  seed = NULL,
  re_formula = NULL,
  category = ".category",
  n,
  prediction
)

## Default S3 method:
predicted_draws(
  object,
  newdata,
  ...,
  value = ".prediction",
  seed = NULL,
  category = ".category"
)

## S3 method for class 'stanreg'
predicted_draws(
  object,
  newdata,
  ...,
  value = ".prediction",
  ndraws = NULL,
  seed = NULL,
  re_formula = NULL,
  category = ".category"
)

## S3 method for class 'brmsfit'
predicted_draws(
  object,
  newdata,
  ...,
  value = ".prediction",
  ndraws = NULL,
  seed = NULL,
  re_formula = NULL,
  category = ".category"
)
add_residual_draws(
  newdata,
  object,
  ...,
  value = ".residual",
  ndraws = NULL,
  seed = NULL,
  re_formula = NULL,
  category = ".category",
  n
)

residual_draws(
  object,
  newdata,
  ...,
  value = ".residual",
  ndraws = NULL,
  seed = NULL,
  re_formula = NULL,
  category = ".category",
  n,
  residual
)

## Default S3 method:
residual_draws(object, newdata, ...)

## S3 method for class 'brmsfit'
residual_draws(
  object,
  newdata,
  ...,
  value = ".residual",
  ndraws = NULL,
  seed = NULL,
  re_formula = NULL,
  category = ".category"
)
Arguments
newdata |
Data frame to generate predictions from. |
object |
A supported Bayesian model fit that can provide fits and predictions. Supported models
are listed in the second section of tidybayes-models: Models Supporting Prediction. While other
functions in this package (like |
... |
Additional arguments passed to the underlying prediction method for the type of model given. |
value |
The name of the output column:
|
ndraws |
The number of draws to return, or |
seed |
A seed to use when subsampling draws (i.e. when |
re_formula |
formula containing group-level effects to be considered in the prediction.
If |
category |
For some ordinal, multinomial, and multivariate models (notably, |
dpar |
For |
n |
(Deprecated). Use |
scale |
(Deprecated). Use the appropriate function ( |
prediction, residual |
(Deprecated). Use |
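To illustrate what subsampling with ndraws and seed amounts to, here is a minimal base R sketch. It shows the general idea only and is not the package's actual implementation: the helper subsample_draws() and the total of 4000 draws are hypothetical, and the sketch assumes that a NULL ndraws means every draw is kept.

```r
# Hypothetical helper (not part of tidybayes): pick `ndraws` draw indices
# out of `n_total`, reproducibly when `seed` is given.
# Assumption: a NULL `ndraws` keeps every draw.
subsample_draws <- function(n_total, ndraws = NULL, seed = NULL) {
  if (is.null(ndraws)) return(seq_len(n_total))
  if (!is.null(seed)) set.seed(seed)
  sort(sample.int(n_total, ndraws))
}

# Two calls with the same seed select the same subset of draws
a <- subsample_draws(4000, ndraws = 100, seed = 123)
b <- subsample_draws(4000, ndraws = 100, seed = 123)
identical(a, b)

# Without ndraws, all draws are retained
length(subsample_draws(4000))
```

Fixing the seed is what makes a subsampled result repeatable across runs; omitting ndraws avoids the question entirely because no subsampling takes place.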
Details
Consider a model like:
\begin{array}{rcl}
y &\sim& \textrm{SomeDist}(\theta_1, \theta_2)\\
f_1(\theta_1) &=& \alpha_1 + \beta_1 x\\
f_2(\theta_2) &=& \alpha_2 + \beta_2 x
\end{array}
This model has:
- an outcome variable, y
- a response distribution, \textrm{SomeDist}, having parameters \theta_1 (with link function f_1) and \theta_2 (with link function f_2)
- a single predictor, x
- coefficients \alpha_1, \beta_1, \alpha_2, and \beta_2
We fit this model to some observed data, y_\textrm{obs}, and predictors, x_\textrm{obs}. Given new values of predictors, x_\textrm{new}, supplied in the data frame newdata, the functions for posterior draws are defined as follows:
- add_predicted_draws() adds draws from the posterior predictive distribution, p(y_\textrm{new} | x_\textrm{new}, y_\textrm{obs}), to the data. It corresponds to rstanarm::posterior_predict() or brms::posterior_predict().
- add_epred_draws() adds draws from the expectation of the posterior predictive distribution, aka the conditional expectation, E(y_\textrm{new} | x_\textrm{new}, y_\textrm{obs}), to the data. It corresponds to rstanarm::posterior_epred() or brms::posterior_epred(). Not all models support this function.
- add_linpred_draws() adds draws from the posterior linear predictors to the data. It corresponds to rstanarm::posterior_linpred() or brms::posterior_linpred(). Depending on the model type and additional parameters passed, this may be:
  - The untransformed linear predictor, e.g. p(f_1(\theta_1) | x_\textrm{new}, y_\textrm{obs}) = p(\alpha_1 + \beta_1 x_\textrm{new} | x_\textrm{new}, y_\textrm{obs}). This is returned by add_linpred_draws(transform = FALSE) for brms and rstanarm models. It is analogous to type = "link" in predict.glm().
  - The inverse-link transformed linear predictor, e.g. p(\theta_1 | x_\textrm{new}, y_\textrm{obs}) = p(f_1^{-1}(\alpha_1 + \beta_1 x_\textrm{new}) | x_\textrm{new}, y_\textrm{obs}). This is returned by add_linpred_draws(transform = TRUE) for brms and rstanarm models. It is analogous to type = "response" in predict.glm().

  NOTE: add_linpred_draws(transform = TRUE) and add_epred_draws() may be equivalent but are not guaranteed to be. They are equivalent when the expectation of the response distribution is equal to its first parameter, i.e. when E(y) = \theta_1. Many distributions have this property (e.g. Normal distributions, Bernoulli distributions), but not all. If you want the expectation of the posterior predictive, it is best to use add_epred_draws() if available, and if not available, verify this property holds prior to using add_linpred_draws().
- add_residual_draws() adds draws from residuals, p(y_\textrm{obs} - y_\textrm{new} | x_\textrm{new}, y_\textrm{obs}), to the data. It corresponds to brms::residuals.brmsfit().
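The distinction between these quantities can be illustrated without fitting a model. The following is a minimal base R sketch, not tidybayes code. It assumes a lognormal response whose first parameter (the log-scale location) uses an identity link, and it fabricates stand-ins for posterior draws of the coefficients; the names alpha_draws, beta_draws, sigma_draws, x_new, y_obs and all numeric values are hypothetical and chosen only for illustration.

```r
set.seed(1)
n_draws <- 4000

# Hypothetical stand-ins for posterior draws of the coefficients;
# in practice these would come from a fitted model.
alpha_draws <- rnorm(n_draws, mean = 1.0, sd = 0.05)
beta_draws  <- rnorm(n_draws, mean = 0.5, sd = 0.02)
sigma_draws <- abs(rnorm(n_draws, mean = 0.8, sd = 0.05))

x_new <- 2  # hypothetical new predictor value
y_obs <- 9  # hypothetical observed outcome at x_new

# Linear predictor: alpha_1 + beta_1 * x_new (the log-scale location, theta_1)
linpred <- alpha_draws + beta_draws * x_new

# With an identity link, the inverse-link transform leaves it unchanged
linpred_transformed <- linpred

# Expectation of a lognormal response: exp(mu + sigma^2 / 2), not mu itself
epred <- exp(linpred + sigma_draws^2 / 2)

# Posterior predictive draws: one simulated outcome per coefficient draw
prediction <- rlnorm(n_draws, meanlog = linpred, sdlog = sigma_draws)

# Residual draws: y_obs - y_new
residual <- y_obs - prediction

# Compare the averages of the three quantities
round(c(linpred    = mean(linpred_transformed),
        epred      = mean(epred),
        prediction = mean(prediction)), 2)
```

In this sketch the transformed linear predictor and the expectation differ substantially, because E(y) = \theta_1 does not hold for a lognormal response. This is the situation the NOTE above warns about.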
The corresponding functions without add_ as a prefix are alternate spellings with the opposite order of the first two arguments: e.g. add_predicted_draws(newdata, object) versus predicted_draws(object, newdata). This facilitates use in data processing pipelines that start either with a data frame or a model.
Given equal choice between the two, the spellings prefixed with add_ are preferred.
Value
A data frame (actually, a tibble) with a .row column (a factor grouping rows from the input newdata), a .chain column (the chain each draw came from, or NA if the model does not provide chain information), an .iteration column (the iteration the draw came from, or NA if the model does not provide iteration information), and a .draw column (a unique index corresponding to each draw from the distribution). In addition, each function includes a column whose name is given by the value argument: the default is ".epred" for epred_draws, ".linpred" for linpred_draws, ".prediction" for predicted_draws, and ".residual" for residual_draws. For convenience, the resulting data frame comes grouped by the original input rows.
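As a rough illustration of this long format, the base R sketch below builds a small mock data frame with the same column layout and then summarises the draws for each input row. Everything in it is invented for illustration: the values do not come from any model, the choice of 2 input rows with 500 draws each is arbitrary, and .row is stored as a plain integer for simplicity.

```r
set.seed(2)
n_rows  <- 2    # rows in a hypothetical newdata
n_draws <- 500  # hypothetical number of draws per row

# Mock long-format output: one line per (input row, draw) pair
draws <- data.frame(
  .row       = rep(seq_len(n_rows), each = n_draws),
  .chain     = NA_integer_,
  .iteration = NA_integer_,
  .draw      = rep(seq_len(n_draws), times = n_rows),
  .epred     = rnorm(n_rows * n_draws, mean = rep(c(20, 25), each = n_draws))
)

# Summarise the draws within each input row: median and a 95% interval
summary_by_row <- do.call(rbind, lapply(split(draws, draws$.row), function(d) {
  data.frame(
    .row   = d$.row[1],
    median = median(d$.epred),
    lower  = unname(quantile(d$.epred, 0.025)),
    upper  = unname(quantile(d$.epred, 0.975))
  )
}))

summary_by_row
```

Because each input row is repeated once per draw, any per-row summary has to group on .row first, which is why the returned data frame arriving already grouped by input row is convenient.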
Author(s)
Matthew Kay
See Also
add_draws() for the variant of these functions for use with packages that do not have explicit support for these functions yet. See spread_draws() for manipulating posteriors directly.
Examples
## Not run:
library(ggplot2)
library(dplyr)
library(brms)
library(modelr)
library(tidybayes)  # provides add_epred_draws(), add_predicted_draws(), stat_lineribbon()

theme_set(theme_light())

m_mpg = brm(
  mpg ~ hp * cyl,
  data = mtcars,
  # 1 chain / few iterations just so example runs quickly
  # do not use in practice
  chains = 1, iter = 500
)

# draw 100 lines from the posterior means and overplot them
mtcars %>%
  group_by(cyl) %>%
  data_grid(hp = seq_range(hp, n = 101)) %>%
  # NOTE: only use ndraws here when making spaghetti plots; for
  # plotting intervals it is always best to use all draws (omit ndraws)
  add_epred_draws(m_mpg, ndraws = 100) %>%
  ggplot(aes(x = hp, y = mpg, color = ordered(cyl))) +
  geom_line(aes(y = .epred, group = paste(cyl, .draw)), alpha = 0.25) +
  geom_point(data = mtcars)

# plot posterior predictive intervals
mtcars %>%
  group_by(cyl) %>%
  data_grid(hp = seq_range(hp, n = 101)) %>%
  add_predicted_draws(m_mpg) %>%
  ggplot(aes(x = hp, y = mpg, color = ordered(cyl))) +
  stat_lineribbon(aes(y = .prediction), .width = c(.99, .95, .8, .5), alpha = 0.25) +
  geom_point(data = mtcars) +
  scale_fill_brewer(palette = "Greys")
## End(Not run)