predict_draws.bgmfit {bsitar}    R Documentation

Predicted values from the posterior predictive distribution

Description

predict_draws() is a wrapper around the brms::predict.brmsfit() function that returns predicted values (and their summary) from the posterior predictive distribution. See brms::predict.brmsfit() for details.

Usage

## S3 method for class 'bgmfit'
predict_draws(
  model,
  newdata = NULL,
  resp = NULL,
  ndraws = NULL,
  draw_ids = NULL,
  re_formula = NA,
  allow_new_levels = FALSE,
  sample_new_levels = "uncertainty",
  incl_autocor = TRUE,
  numeric_cov_at = NULL,
  levels_id = NULL,
  avg_reffects = NULL,
  aux_variables = NULL,
  ipts = 10,
  deriv = 0,
  deriv_model = TRUE,
  summary = TRUE,
  robust = FALSE,
  probs = c(0.025, 0.975),
  xrange = NULL,
  xrange_search = NULL,
  parms_eval = FALSE,
  parms_method = "getPeak",
  idata_method = NULL,
  verbose = FALSE,
  fullframe = NULL,
  dummy_to_factor = NULL,
  expose_function = FALSE,
  usesavedfuns = NULL,
  clearenvfuns = NULL,
  envir = NULL,
  ...
)

predict_draws(model, ...)

Arguments

model

An object of class bgmfit.

newdata

An optional data frame to be used in estimation. If NULL (default), the data are retrieved from the model fit.

resp

A character string (default NULL) to specify the response variable when processing posterior draws for univariate_by and multivariate models. See bsitar() for details on univariate_by and multivariate models.

ndraws

A positive integer indicating the number of posterior draws to be used in estimation. If NULL (default), all draws are used.

draw_ids

An integer indicating the specific posterior draw(s) to be used in estimation (default NULL).

re_formula

Option to indicate whether or not to include the individual/group-level effects in the estimation. When NA (default), the individual-level effects are excluded and therefore population average growth parameters are computed. When NULL, individual-level effects are included in the computation and hence the growth parameter estimates returned are individual-specific. In both situations (i.e., NA or NULL), continuous and factor covariate(s) are appropriately included in the estimation. By default, continuous covariates are set to their means (see numeric_cov_at for details), whereas factor covariates are left unaltered, thereby allowing estimation of covariate-specific population average and individual-specific growth parameters.

allow_new_levels

A flag indicating if new levels of group-level effects are allowed (defaults to FALSE). Only relevant if newdata is provided.

sample_new_levels

Indicates how to sample new levels for grouping factors specified in re_formula. This argument is only relevant if newdata is provided and allow_new_levels is set to TRUE. If "uncertainty" (default), each posterior sample for a new level is drawn from the posterior draws of a randomly chosen existing level. Each posterior sample for a new level may be drawn from a different existing level such that the resulting set of new posterior draws represents the variation across existing levels. If "gaussian", new levels are sampled from the (multivariate) normal distribution implied by the group-level standard deviations and correlations. This option may be useful for conducting Bayesian power analysis or predicting new levels in situations where relatively few levels were observed in the old data. If "old_levels", new levels are sampled directly from the existing levels, where a new level is assigned all of the posterior draws of the same (randomly chosen) existing level.

incl_autocor

A flag indicating if correlation structures originally specified via autocor should be included in the predictions. Defaults to TRUE.

numeric_cov_at

An optional argument (a named list) to specify the value of continuous covariate(s). The default NULL sets the continuous covariate(s) at their means. Alternatively, a named list can be supplied to set these values manually. For example, numeric_cov_at = list(xx = 2) will set the continuous covariate variable 'xx' at 2. The argument numeric_cov_at is ignored when no continuous covariate is included in the model.

levels_id

An optional argument to specify the ids for a hierarchical model (default NULL). It is used only when the model is fit to data with three or more levels of hierarchy. For a two-level model, levels_id is automatically inferred from the model fit. For a model with three or more levels, levels_id is also inferred from the model fit, but under the assumption that the hierarchy is specified from the lowest to the uppermost level, i.e., id followed by study, where id is nested within study. Note that it is not guaranteed that levels_id is sorted correctly, and therefore it is better to set it manually when fitting a model with three or more levels of hierarchy.

avg_reffects

An optional argument (default NULL) to calculate (marginal/average) curves and growth parameters such as APGV and PGV. If specified, it must be a named list with the elements over (typically a level 1 predictor, such as age), feby (fixed effects, typically a factor variable), and reby (typically NULL, indicating that parameters are integrated over the random effects), for example avg_reffects = list(feby = 'study', reby = NULL, over = 'age').

aux_variables

An optional argument to specify the variable(s) that can be passed to the ipts argument (see below). This is useful when fitting location scale models and measurement error models. An indication that aux_variables is needed is when post-processing functions throw an error such as variable 'x' not found either 'data' or 'data2'.

ipts

An integer to set the length of the predictor variable to get a smooth velocity curve. NULL returns the original values, whereas an integer such as ipts = 10 (default) interpolates the predictor. It is important to note that this interpolation does not alter the range of the predictor when calculating the population average and/or the individual-specific growth curves.
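
The idea of interpolating the predictor without changing its range can be sketched in base R as follows. This is only an illustration under simple assumptions and not the code used by the package (see idata_method for the actual interpolation methods); the ages and the name age_grid are hypothetical.

```r
# Hypothetical observed ages for one individual (not taken from any bsitar data)
age <- c(8.1, 9.0, 10.2, 11.5, 12.9, 14.0)

# Number of interpolation points, mirroring the default ipts = 10
ipts <- 10

# Evenly spaced grid spanning exactly the observed range of the predictor
age_grid <- seq(min(age), max(age), length.out = ipts)

# The grid is denser than the data, but its range is unchanged
range(age)
range(age_grid)
```

A finer grid gives a smoother velocity curve because the curve is evaluated at more points between the same minimum and maximum.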

deriv

An integer to indicate whether to estimate the distance curve or its derivative (i.e., the velocity curve). Use deriv = 0 (default) for the distance curve and deriv = 1 for the velocity curve.

deriv_model

A logical to specify whether to estimate the velocity curve from the derivative function or by differentiating the distance curve. The argument deriv_model is set to TRUE for those functions which need the velocity curve, such as growthparameters() and plot_curves(), and NULL for functions which explicitly use the distance curve (i.e., fitted values), such as loo_validation() and plot_ppc().

summary

A logical indicating whether only the estimate should be computed (TRUE, default), or estimate along with SE and CI should be returned (FALSE). Setting summary as FALSE will increase the computation time.

robust

A logical to specify the summary options. If FALSE (the default), the mean is used as the measure of central tendency and the standard deviation as the measure of variability. If TRUE, the median and the median absolute deviation (MAD) are used instead. Ignored if summary is FALSE.

probs

The percentiles to be computed by the quantile function. Only used if summary is TRUE.
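
What robust and probs control can be sketched in base R on a matrix of draws (rows are posterior draws, columns are observations). The helper summarise_draws() and the simulated draws are hypothetical and only mimic the kind of summary described above; they are not part of bsitar or brms.

```r
set.seed(1)

# Simulated draws: 4000 draws for 3 observations centred at 120, 130 and 140
draws <- matrix(rnorm(4000 * 3, mean = c(120, 130, 140), sd = 2),
                nrow = 4000, ncol = 3, byrow = TRUE)

# Hypothetical helper: column-wise summary of a draws matrix
# (assumes 'probs' has at least two elements)
summarise_draws <- function(draws, robust = FALSE, probs = c(0.025, 0.975)) {
  # Central tendency: mean (robust = FALSE) or median (robust = TRUE)
  center <- if (robust) apply(draws, 2, median) else colMeans(draws)
  # Variability: standard deviation or median absolute deviation
  spread <- if (robust) apply(draws, 2, mad) else apply(draws, 2, sd)
  # Percentiles requested via 'probs', one row per observation
  q <- t(apply(draws, 2, quantile, probs = probs))
  out <- cbind(Estimate = center, Est.Error = spread, q)
  colnames(out)[-(1:2)] <- paste0("Q", probs * 100)
  out
}

res_mean   <- summarise_draws(draws, robust = FALSE)
res_robust <- summarise_draws(draws, robust = TRUE, probs = c(0.05, 0.95))
res_mean
res_robust
```

With roughly symmetric draws the two summaries are close; they differ more when the draws are skewed or contain outliers.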

xrange

An integer to set the predictor range (i.e., age) when executing the interpolation via ipts. The default NULL sets the individual-specific predictor range, whereas xrange = 1 sets an identical range for individuals within the same higher grouping variable (e.g., study), and xrange = 2 sets an identical range across the entire sample. Lastly, a pair of numeric values can be supplied, e.g., xrange = c(6, 20), to set the range within those values.

xrange_search

A vector of length two, or a character string 'range', to set the range of the predictor variable (x) within which growth parameters are searched. This is useful when there is more than one peak and the user wants to summarize the peak within a given range of the x variable. Default xrange_search = NULL.

parms_eval

A logical to specify whether or not to get growth parameters on the fly. This is for internal use only and mainly needed for compatibility across internal functions.

parms_method

A character string to specify the method used when evaluating parms_eval. The default is getPeak, which uses the sitar::getPeak() function from the sitar package. The alternative option is findpeaks, which uses the pracma::findpeaks() function from the pracma package. This is for internal use only and mainly needed for compatibility across internal functions.

idata_method

A character string to indicate the interpolation method. The number of interpolation points is set by the ipts argument. Options available for idata_method are method 1 (specified as 'm1') and method 2 (specified as 'm2'). Method 1 ('m1') is adapted from the iapvbs package and is documented here https://rdrr.io/github/Zhiqiangcao/iapvbs/src/R/exdata.R whereas method 2 ('m2') is based on the JMbayes package as documented here https://github.com/drizopoulos/JMbayes/blob/master/R/dynPred_lme.R. The 'm1' method works by internally constructing the data frame based on the model configuration, whereas the 'm2' method uses the exact data frame used in the model fit, which can be accessed via fit$data. If idata_method = NULL (default), method 'm2' is automatically set. Note that method 'm1' might fail in some cases when the model involves covariates, particularly when the model is fit as univariate_by. Therefore, it is advised to switch to method 'm2' in case 'm1' results in an error.

verbose

An optional argument (logical, default FALSE) to indicate whether to print information collected during setting up the object(s).

fullframe

A logical to indicate whether to return a fullframe object in which newdata is bound to the summary estimates. Note that fullframe cannot be combined with summary = FALSE. Furthermore, fullframe can only be used when idata_method = 'm2'. A particular use case is when fitting a univariate_by model. The fullframe argument is mainly for internal use.

dummy_to_factor

A named list (default NULL) that is used to convert dummy variables into a factor variable. The named elements are factor.dummy, factor.name, and factor.level. The factor.dummy is a vector of character strings naming the dummy variables that need to be converted to a factor variable, whereas factor.name is a single character string that is used to name the newly created factor variable. The factor.level is used to name the levels of the newly created factor. When factor.name is NULL, the factor name is internally set as 'factor.var'. If factor.level is NULL, the names of the factor levels are taken from factor.dummy, i.e., the factor levels are assigned the same names as factor.dummy. Note that when factor.level is not NULL, its length must be the same as the length of factor.dummy.
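
The structure of the dummy_to_factor list, and the kind of conversion it describes, can be sketched in base R as follows. The data frame and its column names (classA, classB, classC) are hypothetical, the conversion shown is not the package's internal code, and it assumes that exactly one dummy variable equals 1 in each row.

```r
# Hypothetical data with three mutually exclusive dummy variables
dat <- data.frame(
  classA = c(1, 0, 0, 1),
  classB = c(0, 1, 0, 0),
  classC = c(0, 0, 1, 0)
)

# Named list with the three documented elements
dummy_to_factor <- list(
  factor.dummy = c("classA", "classB", "classC"),
  factor.name  = "class",
  factor.level = c("A", "B", "C")  # same length as factor.dummy
)

# For each row, find which dummy column is switched on
idx <- max.col(as.matrix(dat[, dummy_to_factor$factor.dummy]),
               ties.method = "first")

# Build the factor using the requested name and level labels
dat[[dummy_to_factor$factor.name]] <- factor(
  dummy_to_factor$factor.level[idx],
  levels = dummy_to_factor$factor.level
)

dat
```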

expose_function

An optional logical argument to indicate whether to expose Stan functions (default FALSE). Note that if the user has already exposed Stan functions during the model fit by setting expose_function = TRUE in bsitar(), then those exposed functions are saved and can be used during post-processing of the posterior draws. Therefore, expose_function is set to FALSE by default in all post-processing functions except optimize_model(). For optimize_model(), the default setting is expose_function = NULL. The reason is that each optimized model has a different Stan function, which therefore needs to be re-exposed and saved. The setting expose_function = NULL implies that the value of expose_function is taken from the original model fit. Note that expose_function must be set to TRUE when adding fit criteria and/or bayes_R2 during model optimization.

usesavedfuns

A logical (default NULL) to indicate whether to use the already exposed and saved Stan functions. Depending on whether the user has exposed Stan functions via the expose_function argument in the bsitar() call, usesavedfuns is automatically set to TRUE (if expose_function = TRUE) or FALSE (if expose_function = FALSE). Therefore, setting usesavedfuns manually to TRUE/FALSE is rarely needed. This argument is for internal purposes only and is mainly used when testing the functions; users should not set it, as doing so might lead to unreliable estimates.

clearenvfuns

A logical to indicate whether to clear the exposed function from the environment (TRUE) or not (FALSE). If NULL (default), then clearenvfuns is set as TRUE when usesavedfuns is TRUE, and FALSE if usesavedfuns is FALSE.

envir

Environment used for function evaluation. The default is NULL, which sets parent.frame() as the default environment. Note that since most of the post-processing functions are based on brms, the functions needed for evaluation should be in the .GlobalEnv. Therefore, it is strongly recommended to set envir = globalenv() (or envir = .GlobalEnv). This is particularly true for derivatives such as the velocity curve.

...

Additional arguments passed to the brms::predict.brmsfit() function. Please see brms::predict.brmsfit() for details on various options available.

Details

The predict_draws() function computes predicted values from the posterior predictive distribution. The brms::predict.brmsfit() function from the brms package can be used to get the predicted (distance) values when the outcome (e.g., height) is untransformed. However, when the outcome is log or square root transformed, brms::predict.brmsfit() returns the predicted curve on the log or square root scale, whereas predict_draws() returns the predicted values on the original scale. Furthermore, predict_draws() also computes the first derivative (velocity), again on the original scale after making the required back-transformation. Except for these differences, both functions (i.e., brms::predict.brmsfit() and predict_draws()) work in the same manner. In other words, the user can specify all the options available in brms::predict.brmsfit().
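
The back-transformation of the distance and velocity curves described above follows from the chain rule. If y = log(h), then h = exp(y) and dh/dx = exp(y) * dy/dx; if y = sqrt(h), then h = y^2 and dh/dx = 2 * y * dy/dx. The base R sketch below checks this on a toy linear growth curve. The curve and all object names are hypothetical, and the code illustrates the arithmetic only, not the package's implementation.

```r
# Toy growth curve on the original scale: height grows by 5 units per year
x      <- seq(8, 16, length.out = 50)
height <- 100 + 5 * (x - 8)

# Log-transformed outcome: curve and derivative on the log scale
y_log  <- log(height)
dy_log <- 5 / height

# Back-transform distance and velocity to the original scale
height_from_log   <- exp(y_log)
velocity_from_log <- exp(y_log) * dy_log

# Square-root-transformed outcome: curve and derivative on the sqrt scale
y_sqrt  <- sqrt(height)
dy_sqrt <- 5 / (2 * sqrt(height))

# Back-transform distance and velocity to the original scale
height_from_sqrt   <- y_sqrt^2
velocity_from_sqrt <- 2 * y_sqrt * dy_sqrt

# Both routes recover the same constant velocity on the original scale
range(velocity_from_log)
range(velocity_from_sqrt)
```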

Value

An array of predicted response values. See brms::predict.brmsfit() for details.

Author(s)

Satpal Sandhu satpal.sandhu@bristol.ac.uk

See Also

brms::predict.brmsfit()

Examples


# Fit Bayesian SITAR model 

# To avoid model estimation, which takes time, the Bayesian SITAR model fit to 
# the 'berkeley_exdata' has been saved as an example fit ('berkeley_exfit').
# See 'bsitar' function for details on 'berkeley_exdata' and 'berkeley_exfit'.

# Check and confirm whether model fit object 'berkeley_exfit' exists
berkeley_exfit <- getNsObject(berkeley_exfit)

model <- berkeley_exfit

# Population average distance curve
predict_draws(model, deriv = 0, re_formula = NA)


# Individual-specific distance curves
predict_draws(model, deriv = 0, re_formula = NULL)

# Population average velocity curve
predict_draws(model, deriv = 1, re_formula = NA)

# Individual-specific velocity curves
predict_draws(model, deriv = 1, re_formula = NULL)
 


[Package bsitar version 0.2.1 Index]