R: Generate data from a fitted mediation model

sim_mediation {robmed}

R Documentation

Generate data from a fitted mediation model

Description

Generate data from a fitted mediation model, using the obtained coefficient estimates as the true model coefficients for data generation.

Usage

sim_mediation(object, n, ...)

## S3 method for class 'fit_mediation'
sim_mediation(
  object,
  n = NULL,
  explanatory = c("sim", "boot"),
  errors = c("sim", "boot"),
  num_discrete = 10,
  ...
)

## S3 method for class 'test_mediation'
sim_mediation(object, n = NULL, ...)

rmediation(n, object, ...)

Arguments

`object`	an object inheriting from class `"fit_mediation"` or `"test_mediation"` containing results from (robust) mediation analysis.
`n`	an integer giving the number of observations to be generated. If `NULL` (the default), the number of observations is taken from the data set used in the fitted mediation model from `object`.
`...`	additional arguments to be passed down.
`explanatory`	a character string specifying how to generate the explanatory variables (i.e., the independent variables and additional covariates). Possible values are `"sim"` to draw each explanatory variable independently from a certain distribution (the default), or `"boot"` to bootstrap the explanatory variables from the observed data (i.e., random sampling with replacement). See ‘Details’ for more information on how the data are generated.
`errors`	a character string specifying how to generate the error terms in the linear models for the mediators and the dependent variable. Possible values are `"sim"` to draw the error terms independently from the respective fitted model distribution (the default), or `"boot"` to bootstrap the error terms from the observed residuals in the respective fitted model (i.e., random sampling with replacement). See ‘Details’ for more information on how the data are generated.
`num_discrete`	integer; if the explanatory variables are drawn from distributions (`explanatory = "sim"`), variables that take `num_discrete` or fewer values are considered discrete (the default is 10). In that case, the corresponding variables are drawn from multinomial distributions with the relative frequencies from the observed data. This is only relevant if the mediation model was fitted via regressions and ignored if the mediation model was fitted via the covariance matrix, as the latter method assumes multivariate normality.
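
As a rough illustration of the `num_discrete` rule, the following base R sketch treats a variable as discrete when it takes at most `num_discrete` distinct values and then draws new values according to the observed relative frequencies. The helper `draw_if_discrete()` and the toy vector `x_obs` are hypothetical and are not part of robmed; the package's internal implementation may differ.

```r
# Hypothetical helper (not part of robmed): draw n values of x from its
# observed relative frequencies if x takes at most num_discrete distinct
# values; otherwise return NULL to signal that x is treated as continuous.
draw_if_discrete <- function(x, n, num_discrete = 10) {
  values <- sort(unique(x))
  if (length(values) > num_discrete) return(NULL)
  # relative frequencies of the observed values
  freq <- as.numeric(table(factor(x, levels = values))) / length(x)
  # drawing category indices with these probabilities amounts to one
  # multinomial draw per generated observation
  values[sample.int(length(values), size = n, replace = TRUE, prob = freq)]
}

set.seed(1)
x_obs <- c(0, 0, 0, 1, 1, 2)   # toy variable with three distinct values
x_new <- draw_if_discrete(x_obs, n = 1000)
table(x_new) / length(x_new)   # should be close to 1/2, 1/3, 1/6
```

A variable with more than `num_discrete` distinct values, such as `seq_len(20)` with the default threshold, makes the helper return `NULL` in this sketch.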

Details

The data generating process consists of three basic steps:

1. Generate the explanatory variables (i.e., the independent variables and additional covariates).
2. Generate the error terms of the different regression models.
3. Generate the mediators and the dependent variable from the respective regression models, using the coefficient estimates from the fitted mediation model as the true model coefficients.
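
The three steps can be sketched in base R for a simple mediation model. All coefficient values, distribution parameters, and variable names below are assumptions chosen for illustration; in `sim_mediation()` they would come from the fitted model in `object`.

```r
set.seed(42)
n <- 200

# Assumed "true" coefficients (placeholders for estimates from a fitted model)
i_m <- 1.0; a <- 0.5                  # mediator equation
i_y <- 2.0; b <- 0.8; direct <- 0.3   # outcome equation

# Step 1: generate the explanatory variable
x <- rnorm(n, mean = 3, sd = 1)

# Step 2: generate the error terms of the two regression models
e_m <- rnorm(n, sd = 0.5)
e_y <- rnorm(n, sd = 0.7)

# Step 3: generate the mediator and the dependent variable
m <- i_m + a * x + e_m
y <- i_y + b * m + direct * x + e_y

sim_data <- data.frame(X = x, M = m, Y = y)
head(sim_data)
```

In this sketch the true indirect effect of the generated data is `a * b`, the product of the assumed coefficients.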

If explanatory = "sim", the explanatory variables are simulated as follows. For each variable, a regression on a constant term is performed, using the same estimator and assumed error distribution as in the fitted mediation model from object. Typically, the assumed error distribution is normal, but it can also be a skew-normal, t, or skew-t distribution, or a selection of the best-fitting error distribution. Using the obtained location estimate and parameter estimates of the assumed error distribution, values are drawn from this error distribution and added to the location estimate. It is important to note that all explanatory variables are simulated independently from each other, hence there are no correlations between the explanatory variables.

In order to generate correlated explanatory variables, it is recommended to bootstrap the explanatory variables from the observed data by setting explanatory = "boot".
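
The difference between the two options can be illustrated without robmed. The sketch below uses two correlated columns of the built-in `mtcars` data as stand-ins for explanatory variables, and a normal distribution as the assumed error distribution; both choices are assumptions made for this example only.

```r
set.seed(123)
obs <- mtcars[, c("wt", "hp")]   # two correlated variables from a built-in data set
n <- 500

# Analogous to explanatory = "sim": each variable is drawn independently
# (here from a normal distribution with the sample mean and standard deviation)
sim_indep <- data.frame(
  wt = rnorm(n, mean = mean(obs$wt), sd = sd(obs$wt)),
  hp = rnorm(n, mean = mean(obs$hp), sd = sd(obs$hp))
)

# Analogous to explanatory = "boot": whole rows are resampled with replacement,
# so the dependence between the variables is carried over
rows <- sample.int(nrow(obs), size = n, replace = TRUE)
sim_boot <- obs[rows, ]

cor(obs$wt, obs$hp)               # correlation in the observed data
cor(sim_indep$wt, sim_indep$hp)   # near zero
cor(sim_boot$wt, sim_boot$hp)     # near the observed correlation
```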

If errors = "sim", the error terms of the different regression models are drawn from the assumed error distribution in the fitted mediation model from object, using the respective parameter estimates. Typically, the assumed error distribution is normal, but it can also be a skew-normal, t, or skew-t distribution, or a selection of the best-fitting error distribution.

If errors = "boot", bootstrapping the error terms from the observed residuals is done independently for the different regression models and, if also explanatory = "boot", independently from bootstrapping the explanatory variables.
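
A minimal base R sketch of this independence, assuming ordinary least-squares fits via `lm()` and toy data (robmed's estimators and internals may differ): each regression model and the explanatory variable get their own resampling index vector.

```r
set.seed(7)
n_obs <- 100
x <- rnorm(n_obs)
m <- 1 + 0.5 * x + rnorm(n_obs)
y <- 2 + 0.8 * m + 0.3 * x + rnorm(n_obs)

# Fitted regression models for the mediator and the dependent variable
fit_m <- lm(m ~ x)
fit_y <- lm(y ~ m + x)
res_m <- unname(residuals(fit_m))
res_y <- unname(residuals(fit_y))

n <- 50
# Separate index vectors: the residuals of the two models are resampled
# independently of each other ...
e_m <- res_m[sample.int(n_obs, size = n, replace = TRUE)]
e_y <- res_y[sample.int(n_obs, size = n, replace = TRUE)]
# ... and independently of the bootstrapped explanatory variable
x_new <- x[sample.int(n_obs, size = n, replace = TRUE)]

# Generate mediator and dependent variable from the fitted coefficients
m_new <- coef(fit_m)[["(Intercept)"]] + coef(fit_m)[["x"]] * x_new + e_m
y_new <- coef(fit_y)[["(Intercept)"]] + coef(fit_y)[["m"]] * m_new +
  coef(fit_y)[["x"]] * x_new + e_y
sim_data <- data.frame(x = x_new, m = m_new, y = y_new)
head(sim_data)
```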

The "boot_test_mediation" method for results of a bootstrap test always uses the regression coefficient estimates obtained on the original data for data generation, not the bootstrap estimates. Keep in mind that all bootstrap estimates are the means of the respective bootstrap replicates. If the bootstrap estimates of the regression coefficients were used to generate the data, the true values of the indirect effects for the generated data (i.e., the products of the corresponding bootstrap coefficient estimates) would not be equal to the reported bootstrap estimates of the indirect effects in object, which could lead to confusion. For the estimates on the original data, it of course holds that the estimates of indirect effects are the products of the corresponding coefficient estimates.
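
The reason is that a mean of products generally differs from the product of the means. The following base R sketch illustrates this with a hand-rolled bootstrap of OLS coefficients on toy data; the data, the number of replicates, and the use of `lm()` are assumptions for illustration and do not reproduce robmed's bootstrap procedure.

```r
set.seed(2024)
n_obs <- 100
x <- rnorm(n_obs)
m <- 0.5 * x + rnorm(n_obs)
y <- 0.8 * m + 0.3 * x + rnorm(n_obs)
dat <- data.frame(x, m, y)

n_rep <- 500
boot_a <- boot_b <- numeric(n_rep)
for (r in seq_len(n_rep)) {
  d <- dat[sample.int(n_obs, replace = TRUE), ]
  boot_a[r] <- coef(lm(m ~ x, data = d))[["x"]]       # path a
  boot_b[r] <- coef(lm(y ~ m + x, data = d))[["m"]]   # path b
}

# Bootstrap estimate of the indirect effect: mean of the replicate products
mean(boot_a * boot_b)
# Product of the bootstrap estimates of the two coefficients
mean(boot_a) * mean(boot_b)
# The gap is the covariance of the replicates (with divisor n_rep)
mean(boot_a * boot_b) - mean(boot_a) * mean(boot_b)
cov(boot_a, boot_b) * (n_rep - 1) / n_rep
```

The last two lines print the same number, so the two quantities coincide only if the replicates of the two coefficients are exactly uncorrelated.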

Value

A data frame with n observations containing simulated data for the variables of the fitted mediation model.

Mediation models

The following mediation models are implemented. In the regression equations below, the i_j are intercepts and the e_j are random error terms.

Simple mediation model: The mediation model in its simplest form is given by the equations

M = i_1 + aX + e_1,

Y = i_2 + bM + cX + e_2,

Y = i_3 + c'X + e_3,

where Y denotes the dependent variable, X the independent variable, and M the hypothesized mediator. The main parameter of interest is the product of coefficients ab, called the indirect effect. The coefficients c and c' are called the direct and total effect, respectively.
Parallel multiple mediator model: The simple mediation model can be extended with multiple mediators M_1, \dots, M_k in the following way:

M_1 = i_1 + a_1 X + e_1,

\vdots

M_k = i_k + a_k X + e_k,

Y = i_{k+1} + b_1 M_1 + \dots + b_k M_k + c X + e_{k+1},

Y = i_{k+2} + c' X + e_{k+2}.

The main parameters of interest are the individual indirect effects a_1 b_1, \dots, a_k b_k.
Serial multiple mediator model: It differs from the parallel multiple mediator model in that it allows the hypothesized mediators M_1, \dots, M_k to influence each other in a sequential manner. It is given by the equations

M_1 = i_1 + a_1 X + e_1,

M_2 = i_2 + d_{21} M_1 + a_2 X + e_2,

\vdots

M_k = i_k + d_{k1} M_1 + \dots + d_{k,k-1} M_{k-1} + a_k X + e_k,

Y = i_{k+1} + b_1 M_1 + \dots + b_k M_k + c X + e_{k+1},

Y = i_{k+2} + c' X + e_{k+2}.

The serial multiple mediator model quickly grows in complexity as the number of mediators increases, due to the combinatorial increase in indirect paths through the mediators. It is therefore only implemented for two and three mediators to maintain a focus on easily interpretable models. For two serial mediators, the three indirect effects a_1 b_1, a_2 b_2, and a_1 d_{21} b_2 are the main parameters of interest. For three serial mediators, there are already seven indirect effects: a_1 b_1, a_2 b_2, a_3 b_3, a_1 d_{21} b_2, a_1 d_{31} b_3, a_2 d_{32} b_3, and a_1 d_{21} d_{32} b_3.
Multiple independent variables to be mediated: The simple mediation model can also be extended by allowing multiple independent variables X_1, \dots, X_l instead of multiple mediators. It is defined by the equations

M = i_1 + a_1 X_1 + \dots + a_l X_l + e_1,

Y = i_2 + b M + c_1 X_1 + \dots + c_l X_l + e_2,

Y = i_3 + c_1' X_1 + \dots + c_l' X_l + e_3.

The indirect effects a_1 b, \dots, a_l b are the main parameters of interest. Note that an important special case of this model occurs when a categorical independent variable is represented by a group of dummy variables.
Control variables: To isolate the effects of the independent variables of interest from other factors, control variables can be added to all regression equations of a mediation model. Note that there is no intrinsic difference between independent variables of interest and control variables in terms of the model or its estimation. The difference is purely conceptual in nature: for the control variables, the estimates of the direct and indirect paths are not of particular interest to the researcher. Control variables can therefore be specified separately from the independent variables of interest. Only for the latter, results for the indirect effects are included in the output.
More complex models: Some of the models described above can be combined. For instance, parallel and serial multiple mediator models support multiple independent variables of interest and control variables.
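
To make the serial model with two mediators concrete, the following base R sketch generates data from its equations with assumed coefficient values, computes the three indirect effects, and checks that the total effect decomposes into the direct effect plus the sum of the indirect effects. That decomposition holds exactly in-sample for ordinary least squares, which is what `lm()` uses here; it need not hold exactly for robust estimators. All numbers and variable names are illustrative assumptions.

```r
set.seed(11)
n <- 300

# Assumed true coefficients of a serial model with two mediators
a1 <- 0.5; a2 <- 0.4; d21 <- 0.6
b1 <- 0.3; b2 <- 0.7; direct <- 0.2

# Data generated from the model equations
x  <- rnorm(n)
m1 <- 1 + a1 * x + rnorm(n)
m2 <- 1 + d21 * m1 + a2 * x + rnorm(n)
y  <- 1 + b1 * m1 + b2 * m2 + direct * x + rnorm(n)

# True indirect effects implied by the coefficients: 0.15, 0.28, 0.21
true_indirect <- c("X->M1->Y"     = a1 * b1,
                   "X->M2->Y"     = a2 * b2,
                   "X->M1->M2->Y" = a1 * d21 * b2)

# OLS fits of the regression equations
fit_m1    <- lm(m1 ~ x)
fit_m2    <- lm(m2 ~ m1 + x)
fit_y     <- lm(y ~ m1 + m2 + x)
fit_total <- lm(y ~ x)

# Estimated indirect effects as products of the coefficient estimates
est_indirect <- c(
  "X->M1->Y"     = coef(fit_m1)[["x"]] * coef(fit_y)[["m1"]],
  "X->M2->Y"     = coef(fit_m2)[["x"]] * coef(fit_y)[["m2"]],
  "X->M1->M2->Y" = coef(fit_m1)[["x"]] * coef(fit_m2)[["m1"]] * coef(fit_y)[["m2"]]
)
rbind(true = true_indirect, estimated = est_indirect)

# Total effect c' equals direct effect c plus the sum of the indirect effects
all.equal(coef(fit_total)[["x"]], coef(fit_y)[["x"]] + sum(est_indirect))
```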

Note

Function sim_mediation() takes the object containing results from mediation analysis as its first argument so that it can easily be used with the pipe operator (R's built-in |> or magrittr's %>%).

Function rmediation() is a wrapper that conforms with the naming convention for functions that generate data, as well as the convention of such functions to take the number of observations as the first argument.
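
The two calling conventions can be compared side by side. This sketch assumes that robmed is installed and uses only the signatures shown under 'Usage'; the native pipe `|>` additionally requires R 4.1.0 or later.

```r
library("robmed")   # assumes the package is installed
data("BSG2014")

fit <- fit_mediation(BSG2014,
                     x = "ValueDiversity",
                     y = "TeamCommitment",
                     m = "TaskConflict")

# object first: convenient with the pipe operator
sim_piped <- fit |> sim_mediation(n = 50)

# number of observations first: same order as rnorm(), runif(), etc.
sim_r <- rmediation(50, fit)

dim(sim_piped)
dim(sim_r)
```

Since data generation is random, the two data frames contain different values unless the same seed is set before each call.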

Author(s)

Andreas Alfons

Examples

data("BSG2014")

## simple mediation
# fit the mediation model
fit_simple <- fit_mediation(BSG2014,
                            x = "ValueDiversity",
                            y = "TeamCommitment",
                            m = "TaskConflict")
# simulate data from the fitted mediation model
sim_simple <- sim_mediation(fit_simple, n = 100)
head(sim_simple)

## serial multiple mediators
# fit the mediation model
fit_serial <- fit_mediation(BSG2014,
                            x = "ValueDiversity",
                            y = "TeamScore",
                            m = c("TaskConflict",
                                  "TeamCommitment"),
                            model = "serial")
# simulate data from the fitted mediation model
sim_serial <- sim_mediation(fit_serial, n = 100)
head(sim_serial)

## parallel multiple mediators and control variables
# fit the mediation model
fit_parallel <- fit_mediation(BSG2014,
                              x = "SharedLeadership",
                              y = "TeamPerformance",
                              m = c("ProceduralJustice",
                                    "InteractionalJustice"),
                              covariates = c("AgeDiversity",
                                             "GenderDiversity"),
                              model = "parallel")
# simulate data from the fitted mediation model
# (here the explanatory variables are bootstrapped
# to maintain the correlations between them)
sim_parallel <- sim_mediation(fit_parallel, n = 100,
                              explanatory = "boot")
head(sim_parallel)

[Package robmed version 1.0.2 Index]