| gather_draws {tidybayes} | R Documentation |
Extract draws of variables in a Bayesian model fit into a tidy data format
Description
Extract draws from a Bayesian model for one or more variables (possibly with named dimensions) into one of two types of long-format data frames.
Usage
gather_draws(
  model,
  ...,
  regex = FALSE,
  sep = "[, ]",
  ndraws = NULL,
  seed = NULL,
  n
)
spread_draws(
  model,
  ...,
  regex = FALSE,
  sep = "[, ]",
  ndraws = NULL,
  seed = NULL,
  n
)
Arguments
model |
A supported Bayesian model fit. Tidybayes supports a variety of model objects; for a full list of supported models, see tidybayes-models. |
... |
Expressions in the form of variable_name[dimension_1, dimension_2, ...] | wide_dimension. See Details. |
regex |
If TRUE, variable names are treated as regular expressions (see Details). Default FALSE. |
sep |
Separator used to separate dimensions in variable names, as a regular expression. |
ndraws |
The number of draws to return, or NULL to return all draws. |
seed |
A seed to use when subsampling draws (i.e. when ndraws is not NULL). |
n |
(Deprecated). Use ndraws. |
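As an illustration of subsampling with ndraws and seed, here is a minimal sketch. The mcmc.list built with coda is a hypothetical stand-in for a real model fit, and the variable names in it are invented for the example.

```r
library(tidybayes)

# Hypothetical stand-in for a model fit: 10 draws of b[i,v], i in 1:2, v in 1:3
set.seed(1)
var_names <- c("b[1,1]", "b[2,1]", "b[1,2]", "b[2,2]", "b[1,3]", "b[2,3]")
draws <- matrix(rnorm(10 * length(var_names)), nrow = 10,
                dimnames = list(NULL, var_names))
model <- coda::mcmc.list(coda::mcmc(draws))

# Keep a random subset of 5 draws; reusing the seed selects the same subset again
s1 <- spread_draws(model, b[i,v], ndraws = 5, seed = 123)
s2 <- spread_draws(model, b[i,v], ndraws = 5, seed = 123)

length(unique(s1$.draw))   # number of distinct draws kept
identical(s1$b, s2$b)      # whether both calls returned the same values
```

Fixing the seed is mainly useful when a subsample feeds a plot or report that should be reproducible.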
Details
Imagine a JAGS or Stan fit named model. The model may contain a variable named
b[i,v] (in the JAGS or Stan language) with dimension i in 1:100 and
dimension v in 1:3. However, the default format for draws returned from
JAGS or Stan in R does not reflect this indexing structure; instead,
the draws have multiple columns with names like "b[1,1]", "b[2,1]", etc.
spread_draws and gather_draws provide a straightforward
syntax to translate these columns back into properly-indexed variables in two different
tidy data frame formats, optionally recovering dimension types (e.g. factor levels) as they do so.
spread_draws and gather_draws return data frames already grouped by
all dimensions used on the variables you specify.
The difference between the two is that spread_draws spreads the names of variables
in the model across the data frame as column names, whereas gather_draws
gathers variables into a single column named ".variable" and places their values into a
column named ".value". To use naming schemes from other packages (such as broom), consider passing
results through functions like to_broom_names() or to_ggmcmc_names().
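The contrast between the two formats can be sketched on a toy fit. The mcmc.list below is a hypothetical stand-in for a real model, with made-up variables a[i] and b[i,v]; it is not taken from any particular analysis.

```r
library(tidybayes)

# Hypothetical stand-in for a model fit: 10 draws of a[i] and b[i,v],
# with i in 1:2 and v in 1:3
set.seed(1)
var_names <- c("a[1]", "a[2]",
               "b[1,1]", "b[2,1]", "b[1,2]", "b[2,2]", "b[1,3]", "b[2,3]")
draws <- matrix(rnorm(10 * length(var_names)), nrow = 10,
                dimnames = list(NULL, var_names))
model <- coda::mcmc.list(coda::mcmc(draws))

# Wide: one column per variable ("a" and "b")
wide <- spread_draws(model, a[i], b[i,v])

# Long: variable names in ".variable", their values in ".value"
long <- gather_draws(model, a[i], b[i,v])

names(wide)
names(long)
unique(long$.variable)
```

The wide form tends to suit computations that combine variables within a draw, while the long form is convenient for summarizing or faceting by variable.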
For example, spread_draws(model, a[i], b[i,v]) might return a grouped
data frame (grouped by i and v), with:
column ".chain": the chain number. NA if not applicable to the model type; this is typically only applicable to MCMC algorithms.
column ".iteration": the iteration number. Guaranteed to be unique within-chain only. NA if not applicable to the model type; this is typically only applicable to MCMC algorithms.
column ".draw": a unique number for each draw from the posterior. Order is not guaranteed to be meaningful.
column "i": value in 1:100
column "v": value in 1:3
column "a": value of "a[i]" for draw ".draw"
column "b": value of "b[i,v]" for draw ".draw"
gather_draws(model, a[i], b[i,v]) on the same model would return a grouped
data frame (grouped by i and v), with:
column ".chain": the chain number
column ".iteration": the iteration number
column ".draw": the draw number
column "i": value in 1:100
column "v": value in 1:3, or NA if ".variable" is "a".
column ".variable": value in c("a", "b").
column ".value": value of "a[i]" (when ".variable" is "a") or "b[i,v]" (when ".variable" is "b") for draw ".draw"
spread_draws and gather_draws can use type information
applied to the model object by recover_types() to convert columns
back into their original types. This is particularly helpful if some of the dimensions in
your model were originally factors. For example, if the v dimension
in the original data frame data was a factor with levels c("a","b","c"),
then we could use recover_types before spread_draws:
model %>% recover_types(data) %>% spread_draws(b[i,v])
Which would return the same data frame as above, except the "v" column
would be a value in c("a","b","c") instead of 1:3.
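A minimal sketch of this type recovery follows. Both the data frame and the mcmc.list are hypothetical stand-ins invented for the example; only recover_types() and spread_draws() come from tidybayes.

```r
library(dplyr)
library(tidybayes)

# Hypothetical original data: the v dimension was a factor with three levels
data <- data.frame(v = factor(c("a", "b", "c")))

# Hypothetical stand-in for a model fit: 10 draws of b[i,v], i in 1:2, v in 1:3
set.seed(1)
var_names <- c("b[1,1]", "b[2,1]", "b[1,2]", "b[2,2]", "b[1,3]", "b[2,3]")
draws <- matrix(rnorm(10 * length(var_names)), nrow = 10,
                dimnames = list(NULL, var_names))
model <- coda::mcmc.list(coda::mcmc(draws))

# Attach type information first, then extract draws
typed <- model %>%
  recover_types(data) %>%
  spread_draws(b[i,v])

levels(typed$v)   # v is now a factor; i has no prototype in data and stays numeric
```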
For variables that do not share the same subscripts (or share
some but not all subscripts), we can supply their specifications separately.
For example, if we have a variable d[i] with the same i subscript
as b[i,v], and a variable x with no subscripts, we could do this:
spread_draws(model, x, d[i], b[i,v])
Which is roughly equivalent to this:
spread_draws(model, x) %>%
  inner_join(spread_draws(model, d[i])) %>%
  inner_join(spread_draws(model, b[i,v])) %>%
  group_by(i,v)
Similarly, this:
gather_draws(model, x, d[i], b[i,v])
Is roughly equivalent to this:
bind_rows(
  gather_draws(model, x),
  gather_draws(model, d[i]),
  gather_draws(model, b[i,v])
)
The c and cbind functions can be used to combine multiple variable names that have
the same dimensions. For example, if we have several variables with the same
subscripts i and v, we could do either of these:
spread_draws(model, c(w, x, y, z)[i,v])
spread_draws(model, cbind(w, x, y, z)[i,v]) # equivalent
Each of which is roughly equivalent to this:
spread_draws(model, w[i,v], x[i,v], y[i,v], z[i,v])
Besides being more compact, the c()-style syntax is currently also
faster (though that may change).
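To make the equivalence concrete, here is a small sketch comparing the two spellings on a toy fit. The mcmc.list and its variables w[i] and x[i] are hypothetical and exist only for this illustration.

```r
library(tidybayes)

# Hypothetical stand-in for a model fit: 10 draws of w[i] and x[i], i in 1:2
set.seed(1)
var_names <- c("w[1]", "w[2]", "x[1]", "x[2]")
draws <- matrix(rnorm(10 * length(var_names)), nrow = 10,
                dimnames = list(NULL, var_names))
model <- coda::mcmc.list(coda::mcmc(draws))

# Compact form: both variables share the subscript i
combined <- spread_draws(model, c(w, x)[i])

# Expanded form: one specification per variable
separate <- spread_draws(model, w[i], x[i])

names(combined)
names(separate)
```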
Dimensions can be omitted from the resulting data frame by leaving their names
blank; e.g. spread_draws(model, b[,v]) will omit the first dimension of
b from the output. This is useful if a dimension is known to contain all
the same value in a given model.
The shorthand .. can be used to specify one dimension that should be put
into a wide format. The resulting column names are the base variable name, plus a dot
("."), plus the value of that dimension. For example,
spread_draws(model, b[i,..]) would return a grouped data frame
(grouped by i), with:
column ".chain": the chain number
column ".iteration": the iteration number
column ".draw": the draw number
column "i": value in 1:100
column "b.1": value of "b[i,1]" for draw ".draw"
column "b.2": value of "b[i,2]" for draw ".draw"
column "b.3": value of "b[i,3]" for draw ".draw"
An optional clause in the form | wide_dimension can also be used to put
the data frame into a wide format based on wide_dimension. For example, this:
spread_draws(model, b[i,v] | v)
is roughly equivalent to this:
spread_draws(model, b[i,v]) %>% spread(v,b)
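The two wide-format syntaxes can be compared side by side in a short sketch. The mcmc.list here is a hypothetical stand-in for a model fit, and no type information has been attached to it, so the | form names its columns after the raw index values.

```r
library(tidybayes)

# Hypothetical stand-in for a model fit: 10 draws of b[i,v], i in 1:2, v in 1:3
set.seed(1)
var_names <- c("b[1,1]", "b[2,1]", "b[1,2]", "b[2,2]", "b[1,3]", "b[2,3]")
draws <- matrix(rnorm(10 * length(var_names)), nrow = 10,
                dimnames = list(NULL, var_names))
model <- coda::mcmc.list(coda::mcmc(draws))

# ".." syntax: columns are named from the variable name plus the index value
wide_dots <- spread_draws(model, b[i,..])

# "|" syntax: columns are named from the values of the wide dimension
wide_bar <- spread_draws(model, b[i,v] | v)

names(wide_dots)
names(wide_bar)
```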
The main difference between the | syntax and the
.. syntax is that the | syntax respects prototypes applied to
dimensions with recover_types(), and thus can be used to get
columns with nicer names. For example:
model %>% recover_types(data) %>% spread_draws(b[i,v] | v)
would return a grouped data frame
(grouped by i), with:
column ".chain": the chain number
column ".iteration": the iteration number
column ".draw": the draw number
column "i": value in 1:100
column "a": value of "b[i,1]" for draw ".draw"
column "b": value of "b[i,2]" for draw ".draw"
column "c": value of "b[i,3]" for draw ".draw"
The shorthand . can be used to specify columns that should be nested
into vectors, matrices, or n-dimensional arrays (depending on how many dimensions
are specified with .).
For example, spread_draws(model, a[.], b[.,.]) might return a
data frame, with:
column ".chain": the chain number.
column ".iteration": the iteration number.
column ".draw": a unique number for each draw from the posterior.
column "a": a list column of vectors.
column "b": a list column of matrices.
Ragged arrays are turned into non-ragged arrays with
missing entries given the value NA.
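As a rough sketch of the nested format, the following extracts one row per draw from a toy fit. The mcmc.list and its variables a[i] and b[i,v] are hypothetical stand-ins created for the example.

```r
library(tidybayes)

# Hypothetical stand-in for a model fit: 10 draws of a[i] and b[i,v],
# with i in 1:2 and v in 1:3
set.seed(1)
var_names <- c("a[1]", "a[2]",
               "b[1,1]", "b[2,1]", "b[1,2]", "b[2,2]", "b[1,3]", "b[2,3]")
draws <- matrix(rnorm(10 * length(var_names)), nrow = 10,
                dimnames = list(NULL, var_names))
model <- coda::mcmc.list(coda::mcmc(draws))

# Each "." nests one dimension, so every row holds a whole vector or matrix
nested <- spread_draws(model, a[.], b[.,.])

nrow(nested)      # one row per draw
nested$a[[1]]     # all elements of a for the first draw
nested$b[[1]]     # all elements of b for the first draw
```

This layout can be handy when each draw is passed as a whole to a function that expects a vector or matrix argument.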
Finally, variable names can be regular expressions by setting regex = TRUE; e.g.:
spread_draws(model, `b_.*`[i], regex = TRUE)
Would return a tidy data frame with variables starting with b_ and having one dimension.
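A brief sketch of regular-expression matching on a toy fit is given here. The mcmc.list and the variable names b_x, b_y, and sigma are hypothetical, chosen so that one variable does not match the pattern.

```r
library(tidybayes)

# Hypothetical stand-in for a model fit: b_x[i] and b_y[i] (i in 1:2) plus a scalar sigma
set.seed(1)
var_names <- c("b_x[1]", "b_x[2]", "b_y[1]", "b_y[2]", "sigma")
draws <- matrix(rnorm(10 * length(var_names)), nrow = 10,
                dimnames = list(NULL, var_names))
model <- coda::mcmc.list(coda::mcmc(draws))

# Backticks let the pattern be written as a name; only one-dimensional
# variables whose names match b_.* are returned
matched <- spread_draws(model, `b_.*`[i], regex = TRUE)

names(matched)
```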
Value
A data frame.
Author(s)
Matthew Kay
See Also
spread_rvars(), recover_types(), compose_data().
Examples
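library(dplyr)
library(ggplot2)
library(tidybayes)

data(RankCorr, package = "ggdist")

# one variable with two dimensions
RankCorr %>%
  spread_draws(b[i, j])

# several variables sharing the i dimension, one column per variable
RankCorr %>%
  spread_draws(b[i, j], tau[i], u_tau[i])

# the same variables in long format (.variable / .value)
RankCorr %>%
  gather_draws(b[i, j], tau[i], u_tau[i])

# summarize each variable with its median and a quantile interval
RankCorr %>%
  gather_draws(tau[i], typical_r) %>%
  median_qi()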