R: Run the same 'brms' model on multiple datasets

brm_multiple {brms}

R Documentation

Run the same brms model on multiple datasets

Description

Run the same brms model on multiple datasets and then combine the results into one fitted model object. This is useful in particular for multiple missing value imputation, where the same model is fitted on multiple imputed data sets. Models can be run in parallel using the future package.

Usage

brm_multiple(
  formula,
  data,
  family = gaussian(),
  prior = NULL,
  data2 = NULL,
  autocor = NULL,
  cov_ranef = NULL,
  sample_prior = c("no", "yes", "only"),
  sparse = NULL,
  knots = NULL,
  stanvars = NULL,
  stan_funs = NULL,
  silent = 1,
  recompile = FALSE,
  combine = TRUE,
  fit = NA,
  algorithm = getOption("brms.algorithm", "sampling"),
  seed = NA,
  file = NULL,
  file_compress = TRUE,
  file_refit = getOption("brms.file_refit", "never"),
  ...
)

Arguments

`formula`	An object of class `formula`, `brmsformula`, or `mvbrmsformula` (or one that can be coerced to one of those classes): a symbolic description of the model to be fitted. The details of model specification are explained in `brmsformula`.
`data`	A list of data.frames each of which will be used to fit a separate model. Alternatively, a `mids` object from the mice package.
`family`	A description of the response distribution and link function to be used in the model. This can be a family function, a call to a family function, or a character string naming the family. Every family function has a `link` argument that specifies the link function to be applied to the response variable. If not specified, default links are used. For details of supported families see `brmsfamily`. By default, a linear `gaussian` model is applied. In multivariate models, `family` might also be a list of families.
`prior`	One or more `brmsprior` objects created by `set_prior` or related functions and combined using the `c` method or the `+` operator. See also `default_prior` for more help.
`data2`	A list of named lists each of which will be used to fit a separate model. Each of the named lists contains objects representing data which cannot be passed via argument `data` (see `brm` for examples). The length of the outer list should match the length of the list passed to the `data` argument.
`autocor`	(Deprecated) An optional `cor_brms` object describing the correlation structure within the response variable (i.e., the 'autocorrelation'). See the documentation of `cor_brms` for a description of the available correlation structures. Defaults to `NULL`, corresponding to no correlations. In multivariate models, `autocor` might also be a list of autocorrelation structures. It is now recommended to specify autocorrelation terms directly within `formula`. See `brmsformula` for more details.
`cov_ranef`	(Deprecated) A list of matrices that are proportional to the (within) covariance structure of the group-level effects. The names of the matrices should correspond to columns in `data` that are used as grouping factors. All levels of the grouping factor should appear as rownames of the corresponding matrix. This argument can be used, among other things, to model pedigrees and phylogenetic effects. It is now recommended to specify those matrices in the formula interface using the `gr` and related functions. See `vignette("brms_phylogenetics")` for more details.
`sample_prior`	Indicates whether draws from the priors should be obtained in addition to the posterior draws. Options are `"no"` (the default), `"yes"`, and `"only"`. Among other things, these draws can be used to calculate Bayes factors for point hypotheses via `hypothesis`. Please note that improper priors are not sampled, including the default improper priors used by `brm`. See `set_prior` on how to set (proper) priors. Please also note that prior draws for the overall intercept are not obtained by default for technical reasons. See `brmsformula` for how to obtain prior draws for the intercept. If `sample_prior` is set to `"only"`, draws are taken solely from the priors, ignoring the likelihood, which makes it possible, among other things, to generate draws from the prior predictive distribution. In this case, all parameters must have proper priors.
`sparse`	(Deprecated) Logical; indicates whether the population-level design matrices should be treated as sparse (defaults to `FALSE`). For design matrices with many zeros, this can considerably reduce required memory. Sampling speed is currently not improved or even slightly decreased. It is now recommended to use the `sparse` argument of `brmsformula` and related functions.
`knots`	Optional list containing user specified knot values to be used for basis construction of smoothing terms. See `gamm` for more details.
`stanvars`	An optional `stanvars` object generated by function `stanvar` to define additional variables for use in Stan's program blocks.
`stan_funs`	(Deprecated) An optional character string containing self-defined Stan functions, which will be included in the functions block of the generated Stan code. It is now recommended to use the `stanvars` argument for this purpose instead.
`silent`	Verbosity level between `0` and `2`. If `1` (the default), most of the informational messages of compiler and sampler are suppressed. If `2`, even more messages are suppressed. The actual sampling progress is still printed. Set `refresh = 0` to turn this off as well. If using `backend = "rstan"` you can also set `open_progress = FALSE` to prevent opening additional progress bars.
`recompile`	Logical, indicating whether the Stan model should be recompiled for every imputed data set. Defaults to `FALSE`. If `NULL`, `brm_multiple` tries to determine internally whether recompilation is necessary, for example because data-dependent priors have changed. Using the default of no recompilation should be fine in most cases.
`combine`	Logical; Indicates if the fitted models should be combined into a single fitted model object via `combine_models`. Defaults to `TRUE`.
`fit`	An instance of S3 class `brmsfit_multiple` derived from a previous fit; defaults to `NA`. If `fit` is of class `brmsfit_multiple`, the compiled model associated with the fitted result is re-used and all arguments modifying the model code or data are ignored. Using this argument directly is not recommended; call the `update` method instead.
`algorithm`	Character string naming the estimation approach to use. Options are `"sampling"` for MCMC (the default), `"meanfield"` for variational inference with independent normal distributions, `"fullrank"` for variational inference with a multivariate normal distribution, or `"fixed_param"` for sampling from fixed parameter values. Can be set globally for the current R session via the `"brms.algorithm"` option (see `options`).
`seed`	The seed for random number generation to make results reproducible. If `NA` (the default), Stan will set the seed randomly.
`file`	Either `NULL` or a character string. In the latter case, the fitted model object is saved via `saveRDS` in a file named after the string supplied in `file`. The `.rds` extension is added automatically. If the file already exists, `brm_multiple` will load and return the saved model object instead of refitting the model. Unless you specify the `file_refit` argument as well, the existing file won't be overwritten; you have to remove the file manually in order to refit and save the model under an existing file name. The file name is stored in the `brmsfit` object for later usage.
`file_compress`	Logical or a character string, specifying one of the compression algorithms supported by `saveRDS`. If the `file` argument is provided, this compression will be used when saving the fitted model object.
`file_refit`	Modifies when the fit stored via the `file` argument is re-used. Can be set globally for the current R session via the `"brms.file_refit"` option (see `options`). For `"never"` (default) the fit is always loaded if it exists and fitting is skipped. For `"always"` the model is always refitted. If set to `"on_change"`, brms will refit the model if model, data or algorithm as passed to Stan differ from what is stored in the file. This also covers changes in priors, `sample_prior`, `stanvars`, covariance structure, etc. If you believe there was a false positive, you can use `brmsfit_needs_refit` to see why refit is deemed necessary. Refit will not be triggered for changes in additional parameters of the fit (e.g., initial values, number of iterations, control arguments, ...). A known limitation is that a refit will be triggered if within-chain parallelization is switched on/off.
`...`	Further arguments passed to `brm`.
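
As a minimal sketch of the list form of `data`, the following builds several completed copies of a toy data set and passes them as a list of data.frames. The toy data, the helper `impute_once`, and the switch `run_fit` are hypothetical and exist only for illustration; the hot-deck style fill-in is a crude stand-in for a real imputation method such as the one provided by mice.

```r
set.seed(1)

# Toy data with a few missing values in x (made up for illustration).
n <- 30
toy <- data.frame(x = rnorm(n))
toy$y <- 1 + 0.5 * toy$x + rnorm(n)
toy$x[c(3, 11, 25)] <- NA

# Hypothetical helper: fill each missing x with a random draw from the
# observed x values. This is NOT a proper imputation model.
impute_once <- function(d) {
  miss <- is.na(d$x)
  d$x[miss] <- sample(d$x[!miss], sum(miss), replace = TRUE)
  d
}

# One completed data.frame per imputation; this list is what `data` expects.
data_list <- replicate(5, impute_once(toy), simplify = FALSE)

# Hypothetical switch: set to TRUE to actually compile and sample.
run_fit <- FALSE
if (run_fit && requireNamespace("brms", quietly = TRUE)) {
  # `chains` is passed on to brm via `...`; `seed` makes the run reproducible.
  fit <- brms::brm_multiple(y ~ x, data = data_list, chains = 1, seed = 1234)
  print(summary(fit))
}
```

If `data2` is used as well, its outer list would need one entry per element of `data_list`, as described for that argument.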

Details

The combined model may issue false positive convergence warnings, because the MCMC chains corresponding to different datasets do not necessarily overlap, even if each of the original models converged. To find out whether each of the original models converged, inspect `fit$rhats`, where `fit` denotes the output of `brm_multiple`.
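
The per-dataset check can be sketched as follows. The sketch assumes that `fit$rhats` is a data frame with one row per dataset and one column per parameter; the helper `flag_nonconverged`, the threshold of 1.05, and the mock values are illustrative assumptions, not part of brms.

```r
# Hypothetical helper: return the row indices (datasets) whose largest
# Rhat value exceeds the chosen threshold.
flag_nonconverged <- function(rhats, threshold = 1.05) {
  max_rhat <- apply(as.matrix(rhats), 1, max, na.rm = TRUE)
  unname(which(max_rhat > threshold))
}

# Mock stand-in for fit$rhats (values made up for illustration):
# three datasets, three parameters.
mock_rhats <- data.frame(
  b_Intercept = c(1.00, 1.01, 1.20),
  b_age       = c(1.00, 1.02, 1.01),
  sigma       = c(1.01, 1.00, 1.00)
)

flagged <- flag_nonconverged(mock_rhats)
flagged
```

With a real fit, the call would be `flag_nonconverged(fit$rhats)`; any flagged dataset is a candidate for closer inspection or refitting with more iterations.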

Value

If `combine = TRUE`, a `brmsfit_multiple` object, which inherits from class `brmsfit` and behaves essentially the same. If `combine = FALSE`, a list of `brmsfit` objects.
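
A rough sketch of working with the `combine = FALSE` return value is given below. It assumes that mice and brms are installed; the switch `run_fit` is hypothetical and only prevents the slow model fit from running by accident. Note that combining the list manually via `combine_models` is shown for illustration and may not reproduce every element that `brm_multiple` attaches when `combine = TRUE`.

```r
# Hypothetical switch: set to TRUE to actually compile and sample.
run_fit <- FALSE

if (run_fit &&
    requireNamespace("brms", quietly = TRUE) &&
    requireNamespace("mice", quietly = TRUE)) {
  imp <- mice::mice(mice::nhanes2, printFlag = FALSE)

  # Keep the individual fits instead of one combined object.
  fits <- brms::brm_multiple(
    bmi ~ age + hyp + chl,
    data = imp, chains = 1, combine = FALSE
  )

  # Each list element is an ordinary brmsfit for one imputed dataset.
  stopifnot(is.list(fits), all(vapply(fits, brms::is.brmsfit, logical(1))))
  print(summary(fits[[1]]))

  # Combine afterwards; the data check is disabled because the imputed
  # datasets differ from each other by construction.
  combined <- brms::combine_models(mlist = fits, check_data = FALSE)
  print(summary(combined))
}
```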

Author(s)

Paul-Christian Buerkner paul.buerkner@gmail.com

Examples

## Not run: 
library(mice)
imp <- mice(nhanes2)

# fit the model using mice and lm
fit_imp1 <- with(imp, lm(bmi ~ age + hyp + chl))
summary(pool(fit_imp1))

# fit the model using brms
fit_imp2 <- brm_multiple(bmi ~ age + hyp + chl, data = imp, chains = 1)
summary(fit_imp2)
plot(fit_imp2, variable = "^b_", regex = TRUE)
# investigate convergence of the original models
fit_imp2$rhats

# use the future package for parallelization
library(future)
plan(multisession, workers = 4)
fit_imp3 <- brm_multiple(bmi ~ age + hyp + chl, data = imp, chains = 1)
summary(fit_imp3)

## End(Not run)

[Package brms version 2.21.0 Index]