gen_D {mvGPS}    R Documentation

Generate Bivariate Multivariate Exposure


Generate exposure from a bivariate normal distribution confounded by a set of variables C=(C1, C2).


gen_D(
  method,
  n,
  rho_cond,
  s_d1_cond,
  s_d2_cond,
  k,
  C_mu,
  C_cov,
  C_var,
  C_sigma = NULL,
  d1_beta,
  d2_beta,
  seed = NULL
)



method: character value identifying which method to use when generating the bivariate exposure. Options include "matrix_normal", "uni_cond", and "vector_normal". See details for a brief explanation of each method. "uni_cond" is the fastest.

n: integer value giving the total number of units

rho_cond: scalar value in [0, 1] giving the conditional correlation of the exposures given the covariates

s_d1_cond: scalar value for the conditional standard deviation of D1

s_d2_cond: scalar value for the conditional standard deviation of D2

k: integer value determining the number of covariates to generate in C

C_mu: numeric vector of mean values for the covariates. Must have length k

C_cov: scalar value representing the constant correlation between covariates

C_var: scalar value representing the constant variance of the covariates

C_sigma: numeric matrix representing the covariance matrix of the covariates. Default is NULL, in which case C_cov and C_var are used instead.

d1_beta: numeric vector of length k defining the mean of D1 with respect to the covariates

d2_beta: numeric vector of length k defining the mean of D2 with respect to the covariates

seed: integer value setting the seed of the random generator to produce repeatable results. Set to NULL by default


Generating Confounders

We assume that there are a total of k confounders that are generated from a multivariate normal distribution with equicorrelation covariance, i.e.,

\Sigma_{C}=\sigma^{2}_{C}\left(\phi(\mathbf{1}\mathbf{1}^{T}-\mathbf{I})+\mathbf{I}\right),

where \mathbf{1} is the column vector with all entries equal to 1, \mathbf{I} is the identity matrix, \sigma^{2}_{C} is a constant variance for all confounders (C_var), and \phi is the correlation of any two confounders (C_cov). Therefore, our random confounders C follow the distribution

\mathbf{C}\sim N_{k}(\boldsymbol{\mu}_{C}, \Sigma_{C}).

We draw a total of n samples from this multivariate normal distribution using mvrnorm.
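
As an illustration of this step, the sketch below builds an equicorrelation covariance matrix and draws the confounders with mvrnorm from the MASS package. It assumes the covariance has the form sigma2_C * (phi * (1 1^T - I) + I), with phi acting as the common correlation. The object names (mu_C, sigma2_C, phi) are illustrative, and this is not the package's internal code.

```r
library(MASS)  # provides mvrnorm

set.seed(1)
n <- 200           # number of units
k <- 3             # number of confounders
mu_C <- rep(0, k)  # confounder means
sigma2_C <- 1      # common variance (assumed role of C_var)
phi <- 0.1         # common correlation (assumed role of C_cov)

# Equicorrelation covariance: sigma2_C on the diagonal, sigma2_C * phi elsewhere
ones <- matrix(1, nrow = k, ncol = 1)
Sigma_C <- sigma2_C * (phi * (ones %*% t(ones) - diag(k)) + diag(k))

# n draws from N_k(mu_C, Sigma_C); rows are units, columns are confounders
C <- mvrnorm(n = n, mu = mu_C, Sigma = Sigma_C)
colnames(C) <- paste0("C", seq_len(k))
dim(C)
```

With n = 200 the sample correlations in cor(C) will only roughly match phi; larger n brings them closer.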

Generating Bivariate Exposure

The first step when generating the bivariate exposure is to specify the effects of the confounders C. We control this for each exposure value using the arguments d1_beta and d2_beta such that

E[D_{1}\mid \mathbf{C}]=\boldsymbol{\beta}^{T}_{D1}\mathbf{C}


E[D_{2}\mid \mathbf{C}]=\boldsymbol{\beta}^{T}_{D2}\mathbf{C}


Note that by specifying d1_beta and d2_beta separately, the user can control the amount of overlap in the confounders for each exposure, and how many of the variables in C are truly related to the exposures. For instance, the exposures have identical confounding effects when d1_beta=d2_beta, and they have separate confounding when no confounder has a non-zero coefficient in both d1_beta and d2_beta.
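
To make the overlap concrete, the following sketch computes both conditional means for a small confounder matrix and lists the confounders shared by the two exposures. The coefficient values follow the pattern where D1 depends on C1 and C2 while D2 depends on C2 and C3. The matrix C is simulated here only for illustration, and the object names mu_D and shared are hypothetical.

```r
set.seed(1)
n <- 5
C <- matrix(rnorm(n * 3), nrow = n, ncol = 3,
            dimnames = list(NULL, c("C1", "C2", "C3")))

d1_beta <- c(0.5, 1, 0)     # D1 depends on C1 and C2
d2_beta <- c(0, 0.3, 0.75)  # D2 depends on C2 and C3

# Conditional means E[D1 | C] and E[D2 | C], one row per unit
mu_D <- C %*% cbind(D1 = d1_beta, D2 = d2_beta)

# Confounders with a non-zero coefficient for both exposures (the overlap)
shared <- colnames(C)[d1_beta != 0 & d2_beta != 0]
shared
```

Setting a coefficient to zero removes that confounder from the corresponding exposure, so here only C2 confounds both D1 and D2.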

To generate the bivariate conditional distribution of exposures given the set of confounders C we have the following three methods:

"matrix_normal" uses the function rmatnorm to generate all n samples as

\mathbf{D}\mid\mathbf{C}\sim N_{n \times 2}(\boldsymbol{\beta}\mathbf{C}, \mathbf{I}_{n}, \Omega)

where \boldsymbol{\beta} stacks the row vectors \boldsymbol{\beta}^{T}_{D1} and \boldsymbol{\beta}^{T}_{D2} into a 2 \times k matrix, and \Omega is the conditional covariance matrix of the two exposures, determined by rho_cond, s_d1_cond, and s_d2_cond.

"vector_normal" simply vectorizes the matrix_normal method above to generate a vector of length n \times 2.

"uni_cond" specifies the bivariate exposure using univariate conditional factorization, which in the case of bivariate normal results in two univariate normal expressions.

In general, we suggest using the univariate conditional, "uni_cond", method when generating exposures as it is substantially faster than both the matrix normal and vector normal approaches.
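
The univariate conditional factorization can be sketched as follows. For a bivariate normal with conditional standard deviations s1 and s2 and conditional correlation rho, D1 given C is normal with mean mu1 and standard deviation s1, and D2 given D1 and C is normal with mean mu2 + rho * (s2 / s1) * (D1 - mu1) and standard deviation s2 * sqrt(1 - rho^2). The code below is a minimal illustration of those standard formulas with hypothetical object names; it is not taken from the package source.

```r
set.seed(1)
n <- 10000
rho <- 0.2; s1 <- 2; s2 <- 2

# Stand-in confounders and conditional means for each exposure
C <- matrix(rnorm(n * 3), nrow = n, ncol = 3)
mu1 <- as.vector(C %*% c(0.5, 1, 0))
mu2 <- as.vector(C %*% c(0, 0.3, 0.75))

# Step 1: draw D1 from its conditional distribution given C
D1 <- rnorm(n, mean = mu1, sd = s1)

# Step 2: draw D2 given D1 and C
D2 <- rnorm(n,
            mean = mu2 + rho * (s2 / s1) * (D1 - mu1),
            sd = s2 * sqrt(1 - rho^2))

# After removing the conditional means, the correlation should be near rho
cor(D1 - mu1, D2 - mu2)
```

Only two calls to rnorm are needed, which is one plausible reason this route is quick compared with drawing from a matrix or vector normal.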

Note that the options use regular expression matching and can be specified uniquely using either "m", "u", or "v".

Marginal Covariance of Exposures

As described above, the exposures are drawn conditional on the set C, so the marginal covariance of exposures is defined as

\Sigma_{D}= \boldsymbol{\beta}\Sigma_{C}\boldsymbol{\beta}^{T}+\Omega.

In our function we return the true marginal covariance \Sigma_{D} as well as the true marginal correlation \rho_{D}.
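
A short sketch of this calculation, using the parameter values from the example call on this page. It assumes \boldsymbol{\beta} is the 2 x k matrix with rows d1_beta and d2_beta, \Sigma_{C} is the equicorrelation matrix with variance C_var and correlation C_cov, and \Omega is built from rho_cond, s_d1_cond, and s_d2_cond in the usual way. The object names are illustrative and are not the names of the function's return fields.

```r
k <- 3
C_var <- 1; C_cov <- 0.1
Sigma_C <- C_var * (C_cov * (matrix(1, k, k) - diag(k)) + diag(k))

# Rows hold the confounder effects on D1 and D2
beta <- rbind(D1 = c(0.5, 1, 0), D2 = c(0, 0.3, 0.75))

# Conditional covariance of the exposures given C
rho_cond <- 0.2; s_d1_cond <- 2; s_d2_cond <- 2
off_diag <- rho_cond * s_d1_cond * s_d2_cond
Omega <- matrix(c(s_d1_cond^2, off_diag,
                  off_diag, s_d2_cond^2), nrow = 2)

# Marginal covariance and correlation of the exposures
Sigma_D <- beta %*% Sigma_C %*% t(beta) + Omega
rho_D <- cov2cor(Sigma_D)[1, 2]
Sigma_D
rho_D
```

Because the confounders add shared variation to both exposures, the marginal correlation here differs from the conditional value rho_cond.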



library(mvGPS)

# generate bivariate exposures: D1 is confounded by C1 and C2, D2 by C2 and C3
# uses the univariate conditional normal method to draw samples
sim_dt <- gen_D(method="u", n=200, rho_cond=0.2, s_d1_cond=2, s_d2_cond=2, k=3,
C_mu=rep(0, 3), C_cov=0.1, C_var=1, d1_beta=c(0.5, 1, 0), d2_beta=c(0, 0.3, 0.75), seed=06112020)
D <- sim_dt$D
C <- sim_dt$C

#observed correlation should be close to true marginal value
cor(D); sim_dt$rho

#Use vector normal method instead of univariate method to draw samples
sim_dt <- gen_D(method="v", n=200, rho_cond=0.2, s_d1_cond=2, s_d2_cond=2, k=3,
C_mu=rep(0, 3), C_cov=0.1, C_var=1, d1_beta=c(0.5, 1, 0), d2_beta=c(0, 0.3, 0.75), seed=06112020)

[Package mvGPS version 1.2.2 Index]