gen_D {mvGPS} | R Documentation |
Generate Bivariate Multivariate Exposure
Description
Generate exposure from a bivariate normal distribution confounded by a set of
variables C = (C1, C2).
Usage
gen_D(
method,
n,
rho_cond,
s_d1_cond,
s_d2_cond,
k,
C_mu,
C_cov,
C_var,
C_sigma = NULL,
d1_beta,
d2_beta,
seed = NULL
)
Arguments
method |
character value identifying which method to use when generating
bivariate exposure. Options include "matrix_normal", "uni_cond", and "vector_normal".
See details for a brief explanation of each method. |
n |
integer value for the total number of units |
rho_cond |
scalar value in [0, 1] identifying the conditional correlation of the exposures given the covariates |
s_d1_cond |
scalar value for conditional standard deviation of D1 |
s_d2_cond |
scalar value for conditional standard deviation of D2 |
k |
integer value determining number of covariates to generate in C |
C_mu |
numeric vector of mean values for covariates. Must be same length as k |
C_cov |
scalar value representing constant correlation between covariates |
C_var |
scalar value representing constant variance of covariates |
C_sigma |
numeric matrix representing the covariance matrix of covariates.
Default is NULL and will use C_cov and C_var arguments instead |
d1_beta |
numeric vector of length k giving the effect of each covariate on D1 |
d2_beta |
numeric vector of length k giving the effect of each covariate on D2 |
seed |
integer value setting the seed of the random number generator to produce repeatable results. Set to NULL by default |
Details
Generating Confounders
We assume that there are a total of k confounders that are generated
from a multivariate normal distribution with equicorrelation covariance, i.e.,
\Sigma_{C}=\phi(\mathbf{1}\mathbf{1}^{T}-\mathbf{I})+\mathbf{I}\sigma^{2}_{C},
where \mathbf{1} is the column vector with all entries equal to 1,
\mathbf{I} is the identity matrix, \sigma^{2}_{C} is a constant
variance for all confounders, and \phi is the covariance of
any two confounders. Therefore, our random confounders C follow the distribution
\mathbf{C}\sim N_{k}(\boldsymbol{\mu}_{C}, \Sigma_{C}).
We draw a total of n samples from this multivariate normal distribution
using mvrnorm.
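The construction of \Sigma_{C} and the draw of the confounders can be sketched as follows. This is a minimal illustration, not the package's internal code: it assumes the MASS package is available for mvrnorm, and the values of k, C_mu, C_cov, C_var, and n are taken from the Examples section purely for illustration.

```r
library(MASS)  # provides mvrnorm()

k     <- 3          # number of confounders
n     <- 200        # number of units
C_mu  <- rep(0, k)  # mean vector of the confounders
C_cov <- 0.1        # phi: common off-diagonal covariance
C_var <- 1          # sigma^2_C: common variance

# Sigma_C = phi * (1 1^T - I) + I * sigma^2_C
C_sigma <- C_cov * (matrix(1, k, k) - diag(k)) + diag(k) * C_var

set.seed(2020)
C <- mvrnorm(n = n, mu = C_mu, Sigma = C_sigma)  # n x k matrix of confounders
dim(C)
```

The diagonal of C_sigma holds the common variance and every off-diagonal entry holds the common covariance, which is what the formula above describes.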
Generating Bivariate Exposure
The first step when generating the bivariate exposure is to specify the
effects of the confounders C. We control this for each exposure
using the arguments d1_beta and d2_beta such that
E[D_{1}\mid \mathbf{C}]=\boldsymbol{\beta}^{T}_{D1}\mathbf{C}
and
E[D_{2}\mid \mathbf{C}]=\boldsymbol{\beta}^{T}_{D2}\mathbf{C}.
Note that by specifying d1_beta and d2_beta separately, the
user can control the amount of overlap in the confounders for each exposure,
and how many of the variables in C are truly related to the exposures.
For instance, the exposures have identical confounding effects when
d1_beta = d2_beta, and they have completely separate confounding when
no covariate has a non-zero coefficient in both d1_beta and d2_beta.
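As a small illustration of how the coefficient vectors determine the conditional means and the overlap in confounding, consider the sketch below. The confounder matrix here is a stand-in drawn from independent standard normals, and the names beta, cond_mean, and shared are hypothetical helpers introduced only for this example.

```r
d1_beta <- c(0.5, 1, 0)     # D1 depends on C1 and C2
d2_beta <- c(0, 0.3, 0.75)  # D2 depends on C2 and C3

# Stack the coefficients as a 2 x k matrix, one row per exposure
beta <- rbind(D1 = d1_beta, D2 = d2_beta)

# Stand-in confounders (5 units, 3 covariates); any n x k matrix works here
set.seed(1)
C <- matrix(rnorm(5 * 3), nrow = 5, ncol = 3)

# Row i holds (E[D1 | C_i], E[D2 | C_i])
cond_mean <- C %*% t(beta)

# Covariates that confound both exposures (non-zero in both vectors)
shared <- which(d1_beta != 0 & d2_beta != 0)
shared
```

With these coefficients only the second covariate confounds both exposures, so the overlap is partial.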
To generate the bivariate conditional distribution of exposures given the set
of confounders C we have the following three methods:
- "matrix_normal"
- "uni_cond"
- "vector_normal"
"matrix_normal" uses the function rmatnorm
to generate all n samples as
\mathbf{D}\mid\mathbf{C}\sim N_{n \times 2}(\boldsymbol{\beta}\mathbf{C}, \mathbf{I}_{n}, \Omega)
where \boldsymbol{\beta} stacks \boldsymbol{\beta}^{T}_{D1} and \boldsymbol{\beta}^{T}_{D2} as its two rows, and \Omega
is the conditional covariance matrix of the exposures, determined by rho_cond, s_d1_cond, and s_d2_cond.
"vector_normal" simply vectorizes the matrix_normal method above to generate
a vector of length 2n.
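Because the row covariance in the matrix normal above is the identity, each unit's pair of exposures is an independent bivariate normal draw with covariance \Omega. The sketch below uses that fact with base R only; it is a stand-in for illustration and does not call rmatnorm or reproduce the package's internals. The confounders are again simulated placeholders, and M, Z, and d_vec are names introduced for this example.

```r
set.seed(1)
n <- 5000
C <- matrix(rnorm(n * 3), nrow = n, ncol = 3)     # placeholder confounders
beta  <- rbind(c(0.5, 1, 0), c(0, 0.3, 0.75))     # 2 x k coefficient matrix
Omega <- matrix(c(4, 0.8, 0.8, 4), nrow = 2)      # conditional covariance

M <- C %*% t(beta)                     # n x 2 matrix of conditional means
Z <- matrix(rnorm(n * 2), nrow = n)    # independent standard normal noise

# chol() returns upper-triangular R with t(R) %*% R == Omega,
# so each row of Z %*% R has covariance Omega
D <- M + Z %*% chol(Omega)

# The vectorized view stacks the two columns into one vector of length 2n
d_vec <- as.vector(D)
length(d_vec)
```

The vectorized form corresponds to a single multivariate normal of dimension 2n, which requires a 2n x 2n covariance matrix and helps explain why that route is slower for large n.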
"uni_cond" specifies the bivariate exposure using univariate conditional factorization, which in the case of bivariate normal results in two univariate normal expressions.
In general, we suggest using the univariate conditional, "uni_cond", method when generating exposures as it is substantially faster than both the matrix normal and vector normal approaches.
Note that the options use regular expression matching and can be specified uniquely using either "m", "u", or "v".
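The univariate conditional factorization relies on a standard property of the bivariate normal: D1 given C is normal, and D2 given D1 and C is normal with a shifted mean and reduced variance. A minimal base R sketch follows, assuming simulated placeholder confounders; it illustrates the factorization and is not the package's own implementation. The names mu1, mu2, s1, and s2 are introduced for this example.

```r
set.seed(1)
n <- 10000
C <- matrix(rnorm(n * 3), nrow = n, ncol = 3)  # placeholder confounders

rho_cond <- 0.2   # conditional correlation of D1 and D2 given C
s1 <- 2           # conditional standard deviation of D1
s2 <- 2           # conditional standard deviation of D2

mu1 <- as.vector(C %*% c(0.5, 1, 0))      # E[D1 | C]
mu2 <- as.vector(C %*% c(0, 0.3, 0.75))   # E[D2 | C]

# Step 1: draw D1 given C
D1 <- rnorm(n, mean = mu1, sd = s1)

# Step 2: draw D2 given D1 and C
D2 <- rnorm(n,
            mean = mu2 + rho_cond * (s2 / s1) * (D1 - mu1),
            sd   = s2 * sqrt(1 - rho_cond^2))

# Correlation of the residuals should be close to rho_cond
cor(D1 - mu1, D2 - mu2)
```

Only two calls to rnorm are needed, with no matrix decomposition, which is consistent with the speed advantage noted above.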
Marginal Covariance of Exposures
As described above, the exposures are drawn conditional on the set C,
so the marginal covariance of exposures is defined as
\Sigma_{D}= \boldsymbol{\beta}\Sigma_{C}\boldsymbol{\beta}^{T}+\Omega.
In our function we return the true marginal covariance \Sigma_{D} as well
as the true marginal correlation \rho_{D}.
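The marginal covariance formula can be evaluated directly for the parameter values used in the Examples section. The sketch below is a hand computation in base R under those values; it is meant to show how \Sigma_{D} and \rho_{D} follow from the inputs, and the variable names are chosen here to mirror the arguments and return values.

```r
k <- 3
d1_beta <- c(0.5, 1, 0)
d2_beta <- c(0, 0.3, 0.75)
beta <- rbind(d1_beta, d2_beta)                    # 2 x k

# Equicorrelation covariance of confounders (C_cov = 0.1, C_var = 1)
C_sigma <- 0.1 * (matrix(1, k, k) - diag(k)) + diag(k) * 1

# Conditional covariance Omega from rho_cond, s_d1_cond, s_d2_cond
rho_cond <- 0.2; s_d1_cond <- 2; s_d2_cond <- 2
off_diag <- rho_cond * s_d1_cond * s_d2_cond
Omega <- matrix(c(s_d1_cond^2, off_diag, off_diag, s_d2_cond^2), nrow = 2)

# Sigma_D = beta Sigma_C beta^T + Omega
D_Sigma <- beta %*% C_sigma %*% t(beta) + Omega

# Marginal correlation of the two exposures (about 0.245 for these inputs)
rho <- D_Sigma[1, 2] / sqrt(D_Sigma[1, 1] * D_Sigma[2, 2])
rho
```

For these inputs the marginal correlation is larger than rho_cond = 0.2 because the shared confounding adds positive covariance between the exposures.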
Value
- D: n x 2 numeric matrix of the sampled values for the exposures given the set C
- C: n x k numeric matrix of the sampled values for the confounding set C
- D_Sigma: 2 x 2 numeric matrix of the true marginal covariance of exposures
- rho: numeric scalar representing the true marginal correlation of exposures
Examples
#generate bivariate exposures. D1 confounded by C1 and C2. D2 by C2 and C3
#uses univariate conditional normal to draw samples
sim_dt <- gen_D(method="u", n=200, rho_cond=0.2, s_d1_cond=2, s_d2_cond=2, k=3,
C_mu=rep(0, 3), C_cov=0.1, C_var=1, d1_beta=c(0.5, 1, 0), d2_beta=c(0, 0.3, 0.75), seed=06112020)
D <- sim_dt$D
C <- sim_dt$C
#observed correlation should be close to true marginal value
cor(D); sim_dt$rho
#Use vector normal method instead of univariate method to draw samples
sim_dt <- gen_D(method="v", n=200, rho_cond=0.2, s_d1_cond=2, s_d2_cond=2, k=3,
C_mu=rep(0, 3), C_cov=0.1, C_var=1, d1_beta=c(0.5, 1, 0), d2_beta=c(0, 0.3, 0.75), seed=06112020)