sim_mediation {robmed} | R Documentation |
Generate data from a fitted mediation model
Description
Generate data from a fitted mediation model, using the obtained coefficient estimates as the true model coefficients for data generation.
Usage
sim_mediation(object, n, ...)
## S3 method for class 'fit_mediation'
sim_mediation(
object,
n = NULL,
explanatory = c("sim", "boot"),
errors = c("sim", "boot"),
num_discrete = 10,
...
)
## S3 method for class 'test_mediation'
sim_mediation(object, n = NULL, ...)
rmediation(n, object, ...)
Arguments
object |
an object inheriting from class |
n |
an integer giving the number of observations to be generated. If
|
... |
additional arguments to be passed down. |
explanatory |
a character string specifying how to generate the
explanatory variables (i.e., the independent variables and additional
covariates). Possible values are |
errors |
a character string specifying how to generate the error terms
in the linear models for the mediators and the dependent variable. Possible
values are |
num_discrete |
integer; if the explanatory variables are drawn
from distributions ( |
Details
The data generating process consists of three basic steps:
Generate the explanatory variables (i.e., the independent variables and additional covariates).
Generate the error terms of the different regression models.
Generate the mediators and the dependent variable from the respective regression models, using the coefficient estimates from the fitted mediation model as the true model coefficients.
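The three steps above can be sketched in base R for a simple mediation model with normal errors. This is only an illustration, not the package's implementation: all coefficient values, error scales, and variable names below are hypothetical stand-ins for estimates that would come from a fitted model.

```r
# Minimal sketch of the three data generation steps, assuming a simple
# mediation model with normal errors. All numeric values are hypothetical.
set.seed(20240101)
n <- 200

# hypothetical "true" coefficients (not taken from any real fit)
i_1 <- 0.5; a <- 0.4                    # mediator equation
i_2 <- 1.0; b <- 0.3; c_direct <- 0.2   # equation of the dependent variable
sigma_1 <- 1.0; sigma_2 <- 0.8          # error scales

# Step 1: generate the explanatory variable
X <- rnorm(n, mean = 2, sd = 1.5)

# Step 2: generate the error terms of the regression models
e_1 <- rnorm(n, sd = sigma_1)
e_2 <- rnorm(n, sd = sigma_2)

# Step 3: generate the mediator and the dependent variable
M <- i_1 + a * X + e_1
Y <- i_2 + b * M + c_direct * X + e_2

sim_data <- data.frame(X = X, M = M, Y = Y)
head(sim_data)
```

In this sketch the true indirect effect of the generated data is a * b = 0.12 by construction, which is what makes such data useful for simulation studies.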
If explanatory = "sim", the explanatory variables are simulated as
follows. For each variable, a regression on a constant term is performed,
using the same estimator and assumed error distribution as in the fitted
mediation model from object. Typically, the assumed error distribution is
normal, but it can also be a skew-normal, t, or skew-t distribution, or a
selection of the best-fitting error distribution. Using the obtained
location estimate and parameter estimates of the assumed error
distribution, values are drawn from this error distribution and added to
the location estimate. It is important to note that all explanatory
variables are simulated independently from each other, hence there are no
correlations between the explanatory variables. In order to generate
correlated explanatory variables, it is recommended to bootstrap the
explanatory variables from the observed data by setting
explanatory = "boot".
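The consequence of simulating the explanatory variables independently can be illustrated with a small base R sketch. It is an analogue only: the data set `observed` and its variables are hypothetical, and the sample mean and standard deviation stand in for whatever estimator and error distribution the fitted model actually uses.

```r
set.seed(1)

# hypothetical observed data with two correlated explanatory variables
n_obs <- 500
x1 <- rnorm(n_obs)
x2 <- 0.8 * x1 + rnorm(n_obs, sd = 0.6)
observed <- data.frame(x1 = x1, x2 = x2)

n <- 500

# analogue of explanatory = "sim": each variable is drawn independently
# from a normal distribution with its own location and scale
sim_x <- data.frame(
  x1 = rnorm(n, mean = mean(observed$x1), sd = sd(observed$x1)),
  x2 = rnorm(n, mean = mean(observed$x2), sd = sd(observed$x2))
)

# analogue of explanatory = "boot": entire rows are resampled, so the
# joint distribution of the variables is retained
rows <- sample(n_obs, n, replace = TRUE)
boot_x <- observed[rows, , drop = FALSE]

cor_observed <- cor(observed$x1, observed$x2)
cor_sim <- cor(sim_x$x1, sim_x$x2)
cor_boot <- cor(boot_x$x1, boot_x$x2)
round(c(observed = cor_observed, sim = cor_sim, boot = cor_boot), 2)
```

The independently simulated variables should show a correlation near zero, whereas the bootstrapped rows should stay close to the correlation in the observed data.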
If errors = "sim", the error terms of the different regression models
are drawn from the assumed error distribution in the fitted mediation model
from object, using the respective parameter estimates. Typically, the
assumed error distribution is normal, but it can also be a skew-normal, t,
or skew-t distribution, or a selection of the best-fitting error
distribution.
If errors = "boot", bootstrapping the error terms from the observed
residuals is done independently for the different regression models and,
if also explanatory = "boot", independently from bootstrapping the
explanatory variables.
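What independent bootstrapping means here can be sketched with ordinary least squares fits from lm() in base R. This is an assumption-laden illustration rather than the package's code: the toy data, the object names, and the use of lm() in place of the fitted mediation model are all hypothetical.

```r
set.seed(2)

# hypothetical observed data and least squares fits of the two regressions
n_obs <- 150
X <- rnorm(n_obs)
M <- 0.5 * X + rnorm(n_obs)
Y <- 0.3 * M + 0.2 * X + rnorm(n_obs)
fit_m <- lm(M ~ X)
fit_y <- lm(Y ~ M + X)

n <- 100

# separate index vectors: the explanatory variable and the residuals of
# each regression model are resampled independently of each other
idx_x  <- sample(n_obs, n, replace = TRUE)
idx_e1 <- sample(n_obs, n, replace = TRUE)
idx_e2 <- sample(n_obs, n, replace = TRUE)

X_new <- X[idx_x]
e_1 <- unname(residuals(fit_m))[idx_e1]
e_2 <- unname(residuals(fit_y))[idx_e2]

# coefficient estimates act as the true coefficients for data generation
coef_m <- coef(fit_m)
coef_y <- coef(fit_y)
M_new <- coef_m[["(Intercept)"]] + coef_m[["X"]] * X_new + e_1
Y_new <- coef_y[["(Intercept)"]] + coef_y[["M"]] * M_new +
  coef_y[["X"]] * X_new + e_2

sim_data <- data.frame(X = X_new, M = M_new, Y = Y_new)
head(sim_data)
```

Using one index vector per component is the design choice that makes the resampled errors independent of the resampled explanatory variables, in line with the model assumptions.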
The "boot_test_mediation" method for results of a bootstrap test
always uses the regression coefficient estimates obtained on the original
data for data generation, not the bootstrap estimates. Keep in mind that
all bootstrap estimates are the means of the respective bootstrap
replicates. If the bootstrap estimates of the regression coefficients were
used to generate the data, the true values of the indirect effects for the
generated data (i.e., the products of the corresponding bootstrap
coefficient estimates) would not be equal to the reported bootstrap
estimates of the indirect effects in object, which could lead to
confusion. For the estimates on the original data, it of course holds that
the estimates of indirect effects are the products of the corresponding
coefficient estimates.
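The reason the two quantities differ is that a mean of products is generally not the product of the means. The following base R sketch illustrates this with a plain nonparametric bootstrap of least squares fits. The toy data, the number of replicates `R`, and the use of lm() are assumptions for illustration and do not reproduce the package's bootstrap procedure.

```r
set.seed(3)

# hypothetical data following a simple mediation model
n_obs <- 100
X <- rnorm(n_obs)
M <- 0.5 * X + rnorm(n_obs)
Y <- 0.4 * M + 0.1 * X + rnorm(n_obs)
dat <- data.frame(X, M, Y)

# bootstrap replicates of a, b, and the indirect effect ab
R <- 500
boot_coef <- t(replicate(R, {
  d <- dat[sample(n_obs, replace = TRUE), ]
  a <- coef(lm(M ~ X, data = d))[["X"]]
  b <- coef(lm(Y ~ M + X, data = d))[["M"]]
  c(a = a, b = b, ab = a * b)
}))

# bootstrap estimates are the means over the replicates
boot_means <- colMeans(boot_coef)
product_of_means <- boot_means[["a"]] * boot_means[["b"]]
mean_of_products <- boot_means[["ab"]]

# the gap is the covariance of the replicates of a and b (divisor R)
gap <- mean_of_products - product_of_means
c(product_of_means = product_of_means,
  mean_of_products = mean_of_products,
  gap = gap)
```

The gap is typically small but not zero, which is why generating data from the bootstrap coefficient estimates would yield true indirect effects that differ from the reported bootstrap estimates.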
Value
A data frame with n observations containing simulated data
for the variables of the fitted mediation model.
Mediation models
The following mediation models are implemented. In the regression equations
below, the i_j are intercepts and the e_j are random error terms.
Simple mediation model: The mediation model in its simplest form is given by the equations
M = i_1 + aX + e_1,
Y = i_2 + bM + cX + e_2,
Y = i_3 + c'X + e_3,
where Y denotes the dependent variable, X the independent variable, and M the hypothesized mediator. The main parameter of interest is the product of coefficients ab, called the indirect effect. The coefficients c and c' are called the direct and total effect, respectively.
Parallel multiple mediator model: The simple mediation model can be extended with multiple mediators M_1, \dots, M_k in the following way:
M_1 = i_1 + a_1 X + e_1,
\vdots
M_k = i_k + a_k X + e_k,
Y = i_{k+1} + b_1 M_1 + \dots + b_k M_k + c X + e_{k+1},
Y = i_{k+2} + c' X + e_{k+2}.
The main parameters of interest are the individual indirect effects a_1 b_1, \dots, a_k b_k.
Serial multiple mediator model: It differs from the parallel multiple mediator model in that it allows the hypothesized mediators M_1, \dots, M_k to influence each other in a sequential manner. It is given by the equations
M_1 = i_1 + a_1 X + e_1,
M_2 = i_2 + d_{21} M_1 + a_2 X + e_2,
\vdots
M_k = i_k + d_{k1} M_1 + \dots + d_{k,k-1} M_{k-1} + a_k X + e_k,
Y = i_{k+1} + b_1 M_1 + \dots + b_k M_k + c X + e_{k+1},
Y = i_{k+2} + c' X + e_{k+2}.
The serial multiple mediator model quickly grows in complexity with increasing number of mediators due to the combinatorial increase in indirect paths through the mediators. It is therefore only implemented for two and three mediators to maintain a focus on easily interpretable models. For two serial mediators, the three indirect effects a_1 b_1, a_2 b_2, and a_1 d_{21} b_2 are the main parameters of interest. For three serial mediators, there are already seven indirect effects: a_1 b_1, a_2 b_2, a_3 b_3, a_1 d_{21} b_2, a_1 d_{31} b_3, a_2 d_{32} b_3, and a_1 d_{21} d_{32} b_3.
Multiple independent variables to be mediated: The simple mediation model can also be extended by allowing multiple independent variables X_1, \dots, X_l instead of multiple mediators. It is defined by the equations
M = i_1 + a_1 X_1 + \dots + a_l X_l + e_1,
Y = i_2 + b M + c_1 X_1 + \dots + c_l X_l + e_2,
Y = i_3 + c_1' X_1 + \dots + c_l' X_l + e_3.
The indirect effects a_1 b, \dots, a_l b are the main parameters of interest. Note that an important special case of this model occurs when a categorical independent variable is represented by a group of dummy variables.
Control variables: To isolate the effects of the independent variables of interest from other factors, control variables can be added to all regression equations of a mediation model. Note that there is no intrinsic difference between independent variables of interest and control variables in terms of the model or its estimation. The difference is purely conceptual in nature: for the control variables, the estimates of the direct and indirect paths are not of particular interest to the researcher. Control variables can therefore be specified separately from the independent variables of interest. Only for the latter, results for the indirect effects are included in the output.
More complex models: Some of the models described above can be combined, for instance parallel and serial multiple mediator models support multiple independent variables of interest and control variables.
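To make the indirect effects of the serial model concrete, the following base R sketch fits the equations for two serial mediators by ordinary least squares and computes the three indirect effects. The toy data and object names are hypothetical, and lm() is used purely for illustration. For least squares fits on the same data, the total effect equals the direct effect plus the sum of the indirect effects exactly. This need not hold exactly for other estimators.

```r
set.seed(4)

# hypothetical data from a serial model with two mediators
n <- 300
X  <- rnorm(n)
M1 <- 0.5 * X + rnorm(n)
M2 <- 0.4 * M1 + 0.3 * X + rnorm(n)
Y  <- 0.2 * M1 + 0.6 * M2 + 0.1 * X + rnorm(n)

# one least squares fit per regression equation
fit_m1    <- lm(M1 ~ X)
fit_m2    <- lm(M2 ~ M1 + X)
fit_y     <- lm(Y ~ M1 + M2 + X)
fit_total <- lm(Y ~ X)

# extract the path coefficients
a_1  <- coef(fit_m1)[["X"]]
d_21 <- coef(fit_m2)[["M1"]]
a_2  <- coef(fit_m2)[["X"]]
b_1  <- coef(fit_y)[["M1"]]
b_2  <- coef(fit_y)[["M2"]]
c_direct <- coef(fit_y)[["X"]]
c_total  <- coef(fit_total)[["X"]]

# the three indirect effects for two serial mediators
indirect <- c(a1_b1     = a_1 * b_1,
              a2_b2     = a_2 * b_2,
              a1_d21_b2 = a_1 * d_21 * b_2)
indirect

# total effect = direct effect + sum of indirect effects (least squares)
all.equal(c_total, c_direct + sum(indirect))
```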
Note
Function sim_mediation() takes the object containing results from
mediation analysis as its first argument so that it can easily be used with
the pipe operator (R's built-in |> or magrittr's %>%).
Function rmediation() is a wrapper conforming with the naming
convention for functions that generate data, as well as the convention of
those functions to take the number of observations as the first argument.
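The two argument orders can be illustrated with a pair of toy functions in base R. The functions sim_toy() and rtoy() below are hypothetical and only mimic the design: one takes the fitted object first so that it chains in a pipe, the other is a thin wrapper taking the number of observations first, as rnorm() does. The native pipe |> assumes R 4.1.0 or later.

```r
# hypothetical generator: fitted object first, so it works in a pipe
sim_toy <- function(object, n) {
  mu <- coef(object)[["(Intercept)"]]
  data.frame(y = rnorm(n, mean = mu, sd = sigma(object)))
}

# hypothetical wrapper: number of observations first, like rnorm()
rtoy <- function(n, object) sim_toy(object, n = n)

set.seed(5)
fit <- lm(y ~ 1, data = data.frame(y = rnorm(50, mean = 3)))

# the fitted object flows into the first argument of the generator
piped <- fit |> sim_toy(n = 10)

# the wrapper follows the r<name>(n, ...) convention
wrapped <- rtoy(10, fit)
```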
Author(s)
Andreas Alfons
See Also
fit_mediation(), test_mediation()
Examples
# load the package that provides the functions and the example data
library("robmed")
data("BSG2014")
## simple mediation
# fit the mediation model
fit_simple <- fit_mediation(BSG2014,
                            x = "ValueDiversity",
                            y = "TeamCommitment",
                            m = "TaskConflict")
# simulate data from the fitted mediation model
sim_simple <- sim_mediation(fit_simple, n = 100)
head(sim_simple)
## serial multiple mediators
# fit the mediation model
fit_serial <- fit_mediation(BSG2014,
                            x = "ValueDiversity",
                            y = "TeamScore",
                            m = c("TaskConflict",
                                  "TeamCommitment"),
                            model = "serial")
# simulate data from the fitted mediation model
sim_serial <- sim_mediation(fit_serial, n = 100)
head(sim_serial)
## parallel multiple mediators and control variables
# fit the mediation model
fit_parallel <- fit_mediation(BSG2014,
                              x = "SharedLeadership",
                              y = "TeamPerformance",
                              m = c("ProceduralJustice",
                                    "InteractionalJustice"),
                              covariates = c("AgeDiversity",
                                             "GenderDiversity"),
                              model = "parallel")
# simulate data from the fitted mediation model
# (here the explanatory variables are bootstrapped
# to maintain the correlations between them)
sim_parallel <- sim_mediation(fit_parallel, n = 100,
                              explanatory = "boot")
head(sim_parallel)