marginal_comparison.bgmfit {bsitar}R Documentation

Compare growth curves

Description

The marginal_comparison() function estimates and compare growth curves such as distance and velocity. This function is a wrapper around the marginaleffects::comparisons() and marginaleffects::avg_comparisons(). The marginaleffects::comparisons() computes unit-level (conditional) estimates whereas marginaleffects::avg_comparisons() return average (marginal) estimates. A detailed explanation is available here.

Usage

## S3 method for class 'bgmfit'
marginal_comparison(
  model,
  resp = NULL,
  ndraws = NULL,
  draw_ids = NULL,
  newdata = NULL,
  datagrid = NULL,
  re_formula = NA,
  allow_new_levels = FALSE,
  sample_new_levels = "gaussian",
  xrange = 1,
  digits = 2,
  numeric_cov_at = NULL,
  aux_variables = NULL,
  levels_id = NULL,
  avg_reffects = NULL,
  idata_method = NULL,
  ipts = NULL,
  seed = 123,
  future = FALSE,
  future_session = "multisession",
  cores = NULL,
  average = FALSE,
  plot = FALSE,
  showlegends = NULL,
  variables = NULL,
  deriv = NULL,
  deriv_model = NULL,
  comparison = "difference",
  type = NULL,
  by = FALSE,
  conf_level = 0.95,
  transform = NULL,
  cross = FALSE,
  wts = NULL,
  hypothesis = NULL,
  equivalence = NULL,
  eps = NULL,
  reformat = NULL,
  estimate_center = NULL,
  estimate_interval = NULL,
  dummy_to_factor = NULL,
  verbose = FALSE,
  expose_function = FALSE,
  usesavedfuns = NULL,
  clearenvfuns = NULL,
  envir = NULL,
  ...
)

marginal_comparison(model, ...)

Arguments

model

An object of class bgmfit.

resp

A character string (default NULL) to specify response variable when processing posterior draws for the univariate_by and multivariate models. See bsitar() for details on univariate_by and multivariate models

ndraws

A positive integer indicating the number of posterior draws to be used in estimation. If NULL (default), all draws are used.

draw_ids

An integer indicating the specific posterior draw(s) to be used in estimation (default NULL).

newdata

An optional data frame to be used in estimation. If NULL (default), the newdata is retrieved from the model.

datagrid

Generate a grid of user-specified values for use in the newdata argument in various functions of the marginaleffects package. This is useful to define where in the predictor space we want to evaluate the quantities of interest. See marginaleffects::datagrid() for details. The default value for the datagrid is NULL implying that no custom grid is constructed. To set a data grid, the argument should be a data.frame constructed by using the marginaleffects::datagrid() function, or else a named list which are internally used for setting up the grid. For the user convenience, we also allow setting an empty list datagrid = list() in which case essential arguments such as model, newdata are taken up from the respective arguments specified elsewhere. Further, the level 1 predictor (such as age) and any covariate included in the model fit (e.g., gender) are also automatically inferred from the model object.

re_formula

Option to indicate whether or not to include the individual/group-level effects in the estimation. When NA (default), the individual-level effects are excluded and therefore population average growth parameters are computed. When NULL, individual-level effects are included in the computation and hence the growth parameters estimates returned are individual-specific. In both situations, (i.e,, NA or NULL), continuous and factor covariate(s) are appropriately included in the estimation. The continuous covariates by default are set to their means (see numeric_cov_at for details) whereas factor covariates are left unaltered thereby allowing estimation of covariate specific population average and individual-specific growth parameter.

allow_new_levels

A flag indicating if new levels of group-level effects are allowed (defaults to FALSE). Only relevant if newdata is provided.

sample_new_levels

Indicates how to sample new levels for grouping factors specified in re_formula. This argument is only relevant if newdata is provided and allow_new_levels is set to TRUE. If "uncertainty" (default), each posterior sample for a new level is drawn from the posterior draws of a randomly chosen existing level. Each posterior sample for a new level may be drawn from a different existing level such that the resulting set of new posterior draws represents the variation across existing levels. If "gaussian", sample new levels from the (multivariate) normal distribution implied by the group-level standard deviations and correlations. This options may be useful for conducting Bayesian power analysis or predicting new levels in situations where relatively few levels where observed in the old_data. If "old_levels", directly sample new levels from the existing levels, where a new level is assigned all of the posterior draws of the same (randomly chosen) existing level.

xrange

An integer to set the predictor range (i.e., age) when executing the interpolation via ipts. The default NULL sets the individual specific predictor range whereas code xrange = 1 sets identical range for individuals within the same higher grouping variable (e.g., study). Code xrange = 2 sets the identical range across the entire sample. Lastly, a paired numeric values can be supplied e.g., xrange = c(6, 20) to set the range within those values.

digits

An integer (default 2) to set the decimal places for the estimates. The digits is passed on to the base::round() function.

numeric_cov_at

An optional (named list) argument to specify the value of continuous covariate(s). The default NULL option set the continuous covariate(s) at their mean. Alternatively, a named list can be supplied to manually set these values. For example, numeric_cov_at = list(xx = 2) will set the continuous covariate varibale 'xx' at 2. The argument numeric_cov_at is ignored when no continuous covariate is included in the model.

aux_variables

An optional argument to specify the variable(s) that can be passed to the ipts argument (see below). This is useful when fitting location scale models and measurement error models. An indication to use aux_variables is when post processing functions throw an error such as variable 'x' not found either 'data' or 'data2'

levels_id

An optional argument to specify the ids for hierarchical model (default NULL). It is used only when model is applied to the data with 3 or more levels of hierarchy. For a two level model, the levels_id is automatically inferred from the model fit. Even for 3 or higher level model, the levels_id is inferred from the model fit but under the assumption that hierarchy is specified from lowest to upper most level i.e, id followed by study where id is nested within the study Note that it is not guaranteed that the levels_id is sorted correctly, and therefore it is better to set it manually when fitting a model with three or more levels of hierarchy.

avg_reffects

An optional argument (default NULL) to calculate (marginal/average) curves and growth parameters such as APGV and PGV. If specified, it must be a named list indicating the over (typically level 1 predictor, such as age), feby (fixed effects, typically a factor variable), and reby (typically NULL indicating that parameters are integrated over the random effects) such as avg_reffects = list(feby = 'study', reby = NULL, over = 'age').

idata_method

A character string to indicate the interpolation method. The number of of interpolation points is set up the ipts argument. Options available for idata_method are method 1 (specified as 'm1') and method 2 (specified as 'm2'). The method 1 ('m1') is adapted from the the iapvbs package and is documented here https://rdrr.io/github/Zhiqiangcao/iapvbs/src/R/exdata.R whereas method 2 ('m2') is based on the JMbayes package as documented here https://github.com/drizopoulos/JMbayes/blob/master/R/dynPred_lme.R. The 'm1' method works by internally constructing the data frame based on the model configuration whereas the method 'm2' uses the exact data frame used in model fit and can be accessed via fit$data. If idata_method = NULL, default, then method 'm2' is automatically set. Note that method 'm1' might fail in some cases when model involves covariates particularly when model is fit as univariate_by. Therefore, it is advised to switch to method 'm2' in case 'm1' results in error.

ipts

An integer to set the length of the predictor variable to get a smooth velocity curve. The NULL will return original values whereas an integer such as ipts = 10 (default) will interpolate the predictor. It is important to note that these interpolations do not alter the range of predictor when calculating population average and/or the individual specific growth curves.

seed

An integer (default 123) that is passed to the estimation method.

future

A logical (default FALSE) to specify whether or not to perform parallel computations. If set to TRUE, the future.apply::future_sapply() function is used to summarize draws.

future_session

A character string to set the session type when future = TRUE. The 'multisession' (default) options sets the multisession whereas the 'multicore' sets the multicore session. Note that option 'multicore' is not supported on Windows systems. For more details, see future.apply::future_sapply().

cores

Number of cores to be used when running the parallel computations (if future = TRUE). On non-Windows systems this argument can be set globally via the mc.cores option. For the default NULL option, the number of cores are set automatically by calling the future::availableCores(). The number of cores used are the maximum number of cores avaialble minus one, i.e., future::availableCores() - 1.

average

A logical to indicate whether to internally call the marginaleffects::comparisons() or the marginaleffects::avg_comparisons() function. If FALSE (default), marginaleffects::comparisons() is called otherwise marginaleffects::avg_comparisons() when average = TRUE.

plot

A logical to specify whether to plot comparisons by calling the marginaleffects::plot_comparisons() function (FALSE) or not (FALSE). If FALSE (default), then marginaleffects::comparisons() or marginaleffects::avg_comparisons() are called to compute predictions (see average for details)

showlegends

An argument to specify whether to show legends (TRUE) or not (FALSE). If NULL (default), then showlegends is internally set to TRUE if re_formula = NA, and FALSE if re_formula = NULL.

variables

For estimating growth parameters in the current use case, the variables is the level 1 predictor such as age/time. The variables is a named list where value is set via the esp argument (default 1e-6). If NULL, the variables is set internally by retrieving the relevant information from the model. Otherwise, user can define it as follows: variables = list('x' = 1e-6) where 'x' is the level 1 predictor. Note that variables = list('age' = 1e-6) is the default behavior for the marginaleffects because velocity is typically calculated by differentiating the distance curve via dydx approach, and therefore argument deriv is automatically set as 0 and deriv_model as FALSE. If user want to estimate parameters based on the model based first derivative, then argument deriv must be set as 1 and internally argument variables is defined as variables = list('age' = 0) i.e, original level 1 predictor variable, 'x'. It is important to consider that if default behavior is used i.e, deriv = 0 and variables = list('x' = 1e-6), then user can not pass additional arguments to the variables argument. On the other hand, alternative approach i.e, deriv = 0 and variables = list('x' = 0), additional options can be passed to the marginaleffects::comparisons() and marginaleffects::avg_comparisons() functions.

deriv

A numeric to specify whether to estimate parameters based on the differentiation of the distance curve or the model based first derivative. Please see argument variables for more details.

deriv_model

A logical to specify whether to estimate velocity curve from the derivative function, or the differentiation of the distance curve. The argument deriv_model is set to TRUE for those functions which need velocity curve such as growthparameters() and plot_curves(), and NULL for functions which explicitly use the distance curve (i.e., fitted values) such as loo_validation() and plot_ppc().

comparison

For estimating growth parameters in the current use case, options allowed for the comparison are 'difference' and 'differenceavg'. Note that comparison is a placeholder and is only used to setup the the internal function that estimates 'parameter' via sitar::getPeak(), sitar::getTakeoff() and sitar::getTrough() functions to estimate various growth parameters. Options 'difference' and 'differenceavg' are internally restructured according to the user specified hypothesis argument.

type

string indicates the type (scale) of the predictions used to compute contrasts or slopes. This can differ based on the model type, but will typically be a string such as: "response", "link", "probs", or "zero". When an unsupported string is entered, the model-specific list of acceptable values is returned in an error message. When type is NULL, the first entry in the error message is used by default.

by

Aggregate unit-level estimates (aka, marginalize, average over). Valid inputs:

  • FALSE: return the original unit-level estimates.

  • TRUE: aggregate estimates for each term.

  • Character vector of column names in newdata or in the data frame produced by calling the function without the by argument.

  • Data frame with a by column of group labels, and merging columns shared by newdata or the data frame produced by calling the same function without the by argument.

  • See examples below.

  • For more complex aggregations, you can use the FUN argument of the hypotheses() function. See that function's documentation and the Hypothesis Test vignettes on the marginaleffects website.

conf_level

numeric value between 0 and 1. Confidence level to use to build a confidence interval.

transform

string or function. Transformation applied to unit-level estimates and confidence intervals just before the function returns results. Functions must accept a vector and return a vector of the same length. Support string shortcuts: "exp", "ln"

cross
  • FALSE: Contrasts represent the change in adjusted predictions when one predictor changes and all other variables are held constant.

  • TRUE: Contrasts represent the changes in adjusted predictions when all the predictors specified in the variables argument are manipulated simultaneously (a "cross-contrast").

wts

string or numeric: weights to use when computing average contrasts or slopes. These weights only affect the averaging in ⁠avg_*()⁠ or with the by argument, and not the unit-level estimates themselves. Internally, estimates and weights are passed to the weighted.mean() function.

  • string: column name of the weights variable in newdata. When supplying a column name to wts, it is recommended to supply the original data (including the weights variable) explicitly to newdata.

  • numeric: vector of length equal to the number of rows in the original data or in newdata (if supplied).

hypothesis

specify a hypothesis test or custom contrast using a numeric value, vector, or matrix, a string, or a string formula.

  • Numeric:

    • Single value: the null hypothesis used in the computation of Z and p (before applying transform).

    • Vector: Weights to compute a linear combination of (custom contrast between) estimates. Length equal to the number of rows generated by the same function call, but without the hypothesis argument.

    • Matrix: Each column is a vector of weights, as describe above, used to compute a distinct linear combination of (contrast between) estimates. The column names of the matrix are used as labels in the output.

  • String formula to specify linear or non-linear hypothesis tests. If the term column uniquely identifies rows, terms can be used in the formula. Otherwise, use b1, b2, etc. to identify the position of each parameter. The ⁠b*⁠ wildcard can be used to test hypotheses on all estimates. Examples:

    • hp = drat

    • hp + drat = 12

    • b1 + b2 + b3 = 0

    • ⁠b* / b1 = 1⁠

  • String:

    • "pairwise": pairwise differences between estimates in each row.

    • "reference": differences between the estimates in each row and the estimate in the first row.

    • "sequential": difference between an estimate and the estimate in the next row.

    • "revpairwise", "revreference", "revsequential": inverse of the corresponding hypotheses, as described above.

  • See the Examples section below and the vignette: https://marginaleffects.com/vignettes/hypothesis.html

equivalence

Numeric vector of length 2: bounds used for the two-one-sided test (TOST) of equivalence, and for the non-inferiority and non-superiority tests. See Details section below.

eps

NULL or numeric value which determines the step size to use when calculating numerical derivatives: (f(x+eps)-f(x))/eps. When eps is NULL, the step size is 0.0001 multiplied by the difference between the maximum and minimum values of the variable with respect to which we are taking the derivative. Changing eps may be necessary to avoid numerical problems in certain models.

reformat

A logical (default TRUE) to reformat the output returned by the marginaleffects as a data.frame with column names re-defined as follows: conf.low as Q2.5, and conf.high as Q97.5 (assuming that conf_int = 0.95). Also, following columns are dropped from the data frame: term, contrast, tmp_idx, predicted_lo, predicted_hi, predicted.

estimate_center

A character string (default NULL) to specify whether to center estimate as 'mean' or as 'median'. Note that estimate_center is used to set the global options as follows:
options("marginaleffects_posterior_center" = "mean"), or
options("marginaleffects_posterior_center" = "median")
The pre-specified global options are restored on exit via the base::on.exit().

estimate_interval

A character string (default NULL) to specify whether to compute credible intervals as equal-tailed intervals, 'eti' or highest density intervals, 'hdi'. Note that estimate_interval is used to set the global options as follows:
options("marginaleffects_posterior_interval" = "eti"), or
options("marginaleffects_posterior_interval" = "hdi")
The pre-specified global options are restored on exit via the base::on.exit().

dummy_to_factor

A named list (default NULL) that is used to convert dummy variables into a factor variable. The named elements are factor.dummy, factor.name, and factor.level. The factor.dummy is a vector of character strings that need to be converted to a factor variable whereas the factor.name is a single character string that is used to name the newly created factor variable. The factor.level is used to name the levels of newly created factor. When factor.name is NULL, then the factor name is internally set as 'factor.var'. If factor.level is NULL, then names of factor levels are take from the factor.dummy i.e., the factor levels are assigned same name as factor.dummy. Note that when factor.level is not NULL, its length must be same as the length of the factor.dummy.

verbose

An optional argument (logical, default FALSE) to indicate whether to print information collected during setting up the object(s).

expose_function

An optional logical argument to indicate whether to expose Stan functions (default FALSE). Note that if user has already exposed Stan functions during model fit by setting expose_function = TRUE in the bsitar(), then those exposed functions are saved and can be used during post processing of the posterior draws and therefore expose_function is by default set as FALSE in all post processing functions except optimize_model(). For optimize_model(), the default setting is expose_function = NULL. The reason is that each optimized model has different Stan function and therefore it need to be re exposed and saved. The expose_function = NULL implies that the setting for expose_function is taken from the original model fit. Note that expose_function must be set to TRUE when adding fit criteria and/or bayes_R2 during model optimization.

usesavedfuns

A logical (default NULL) to indicate whether to use the already exposed and saved Stan functions. Depending on whether the user have exposed Stan functions within the bsitar() call via expose_functions argument in the bsitar(), the usesavedfuns is automatically set to TRUE (if expose_functions = TRUE) or FALSE (if expose_functions = FALSE). Therefore, manual setting of usesavedfuns as TRUE/FALSE is rarely needed. This is for internal purposes only and mainly used during the testing of the functions and therefore should not be used by users as it might lead to unreliable estimates.

clearenvfuns

A logical to indicate whether to clear the exposed function from the environment (TRUE) or not (FALSE). If NULL (default), then clearenvfuns is set as TRUE when usesavedfuns is TRUE, and FALSE if usesavedfuns is FALSE.

envir

Environment used for function evaluation. The default is NULL which will set parent.frame() as default environment. Note that since most of post processing functions are based on brms, the functions needed for evaluation should be in the .GlobalEnv. Therefore, it is strongly recommended to set envir = globalenv() (or envir = .GlobalEnv). This is particularly true for the derivatives such as velocity curve.

...

Further arguments passed to brms::fitted.brmsfit() and brms::predict() functions.

Value

A data frame objects with estimates and CIs for computed parameter(s)

Author(s)

Satpal Sandhu satpal.sandhu@bristol.ac.uk

References

There are no references for Rd macro ⁠\insertAllCites⁠ on this help page.

See Also

marginaleffects::comparisons() marginaleffects::avg_comparisons() marginaleffects::plot_comparisons()

Examples



# Fit Bayesian SITAR model 

# To avoid mode estimation which takes time, the Bayesian SITAR model fit to 
# the 'berkeley_exdata' has been saved as an example fit ('berkeley_exfit').
# See 'bsitar' function for details on 'berkeley_exdata' and 'berkeley_exfit'.

# Check and confirm whether model fit object 'berkeley_exfit' exists
 berkeley_exfit <- getNsObject(berkeley_exfit)

model <- berkeley_exfit

marginal_comparison(model, parameter = 'apgv', draw_ids = 1)



[Package bsitar version 0.2.1 Index]