R: Generate Bivariate Multivariate Exposure

gen_D {mvGPS}

R Documentation

Generate Bivariate Multivariate Exposure

Description

Generate exposure from a bivariate normal distribution confounded by a set of variables C=(C1, C2).

Usage

gen_D(
  method,
  n,
  rho_cond,
  s_d1_cond,
  s_d2_cond,
  k,
  C_mu,
  C_cov,
  C_var,
  C_sigma = NULL,
  d1_beta,
  d2_beta,
  seed = NULL
)

Arguments

`method`	character value identifying which method to use when generating the bivariate exposure. Options are "matrix_normal", "uni_cond", and "vector_normal". See Details for a brief explanation of each method. `uni_cond` is the fastest.
`n`	integer value total number of units
`rho_cond`	scalar value in [0, 1] identifying the conditional correlation of the exposures given the covariates
`s_d1_cond`	scalar value for conditional standard deviation of `D1`
`s_d2_cond`	scalar value for conditional standard deviation of `D2`
`k`	integer value determining number of covariates to generate in `C`.
`C_mu`	numeric vector of mean values for covariates. Must have length `k`
`C_cov`	scalar value representing constant correlation between covariates
`C_var`	scalar value representing constant variance of covariates
`C_sigma`	numeric matrix representing the covariance matrix of covariates. Default is NULL, in which case the matrix is built from `C_cov` and `C_var`.
`d1_beta`	numeric vector of length `k` defining the mean of `D1` with respect to the covariates
`d2_beta`	numeric vector of length `k` defining the mean of `D2` with respect to the covariates
`seed`	integer value setting the seed of the random number generator to produce repeatable results. Set to NULL by default

Details

Generating Confounders

We assume that there are a total of k confounders that are generated from a multivariate normal distribution with equicorrelation covariance, i.e.,

\Sigma_{C}=\phi(\mathbf{1}\mathbf{1}^{T}-\mathbf{I})+\mathbf{I}\sigma^{2}_{C},

where \mathbf{1} is the column vector with all entries equal to 1, \mathbf{I} is the identity matrix, \sigma^{2}_{C} is a constant variance for all confounders, and \phi is the covariance of any two confounders. Therefore, our random confounders C follow the distribution

\mathbf{C}\sim N_{k}(\boldsymbol{\mu}_{C}, \Sigma_{C}).

We draw a total of n samples from this multivariate normal distribution using mvrnorm.
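The construction above can be sketched directly in R. This is a minimal illustration rather than the package's internal code: the variable names mirror the arguments of gen_D, the values are chosen only for the example, and `mvrnorm` is taken from the MASS package.

```r
library(MASS)  # provides mvrnorm

# Illustrative values (assumptions for this sketch)
n     <- 200
k     <- 3
C_mu  <- rep(0, k)
C_cov <- 0.1   # phi: common covariance between any two confounders
C_var <- 1     # sigma^2_C: common variance of each confounder

# Sigma_C = phi * (1 1^T - I) + I * sigma^2_C
ones    <- matrix(1, k, k)
Sigma_C <- C_cov * (ones - diag(k)) + diag(k) * C_var

# Draw n samples of the k confounders
set.seed(1)
C <- mvrnorm(n, mu = C_mu, Sigma = Sigma_C)
```

An equicorrelation matrix of this form is positive definite only when \phi lies strictly between -\sigma^{2}_{C}/(k-1) and \sigma^{2}_{C}, so values of the common covariance outside that range cannot be sampled from.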

Generating Bivariate Exposure

The first step when generating the bivariate exposure is to specify the effects of the confounders C. We control this for each exposure value using the arguments d1_beta and d2_beta such that

E[D_{1}\mid \mathbf{C}]=\boldsymbol{\beta}^{T}_{D1}\mathbf{C}

and

E[D_{2}\mid \mathbf{C}]=\boldsymbol{\beta}^{T}_{D2}\mathbf{C}

Note that by specifying d1_beta and d2_beta separately, the user can control the amount of overlap in the confounders for each exposure and how many of the variables in C are truly related to the exposures. For instance, setting d1_beta=d2_beta gives both exposures identical confounding effects, whereas the exposures have separate confounding if no covariate has a non-zero coefficient in both d1_beta and d2_beta.
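As a small illustration of how the two coefficient vectors determine the overlap in confounding, the sketch below uses the coefficients from the Examples section and a hypothetical two-row confounder matrix; the helper names `shared`, `only_d1`, and `only_d2` are introduced here for the example only.

```r
d1_beta <- c(0.5, 1, 0)      # D1 depends on C1 and C2
d2_beta <- c(0, 0.3, 0.75)   # D2 depends on C2 and C3

# Which covariates confound both exposures, or only one of them
shared  <- which(d1_beta != 0 & d2_beta != 0)
only_d1 <- which(d1_beta != 0 & d2_beta == 0)
only_d2 <- which(d1_beta == 0 & d2_beta != 0)

# Hypothetical confounder values for two units (rows) and three covariates
C <- rbind(c(1, 2, 3),
           c(0, -1, 1))

# Conditional means E[D1 | C] and E[D2 | C], one value per unit
mean_d1 <- drop(C %*% d1_beta)
mean_d2 <- drop(C %*% d2_beta)
```

Here only C2 is shared, so the exposures are partially but not identically confounded.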

To generate the bivariate conditional distribution of exposures given the set of confounders C we have the following three methods:

"matrix_normal"
"uni_cond"
"vector_normal"

"matrix_normal" uses the function rmatnorm to generate all n samples as

\mathbf{D}\mid\mathbf{C}\sim N_{n \times 2}(\boldsymbol{\beta}\mathbf{C}, \mathbf{I}_{n}, \Omega)

where \boldsymbol{\beta} is the 2 \times k matrix whose rows are \boldsymbol{\beta}^{T}_{D1} and \boldsymbol{\beta}^{T}_{D2}, and \Omega is the 2 \times 2 conditional covariance matrix of the exposures, determined by rho_cond, s_d1_cond, and s_d2_cond.

"vector_normal" simply vectorizes the matrix_normal method above to generate a vector of length n \times 2.
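The relationship between the matrix normal and vector normal formulations can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: it does not call rmatnorm, the confounders are stand-in independent normals, and the 2 x k coefficient matrix `beta` is built from the coefficients in the Examples section. With identity row covariance, a matrix normal draw has independent rows with covariance \Omega, and its column-stacked vectorization is multivariate normal with covariance \Omega \otimes \mathbf{I}_{n}.

```r
library(MASS)  # provides mvrnorm

set.seed(1)
n <- 4
k <- 3
rho_cond <- 0.2; s_d1_cond <- 2; s_d2_cond <- 2

# Stand-in confounders (independent standard normals, for brevity)
C    <- matrix(rnorm(n * k), n, k)
beta <- rbind(c(0.5, 1, 0), c(0, 0.3, 0.75))  # rows: d1_beta, d2_beta
M    <- C %*% t(beta)                         # n x 2 matrix of conditional means

# Conditional covariance of (D1, D2) given C
off   <- rho_cond * s_d1_cond * s_d2_cond
Omega <- matrix(c(s_d1_cond^2, off,
                  off, s_d2_cond^2), 2, 2)

# Matrix normal form: each row is an independent N_2(mean_i, Omega) draw
Z     <- matrix(rnorm(n * 2), n, 2)
D_mat <- M + Z %*% chol(Omega)

# Vector normal form: one draw of length 2n, reshaped back to n x 2
Sigma_vec <- kronecker(Omega, diag(n))
d_vec     <- mvrnorm(1, mu = as.vector(M), Sigma = Sigma_vec)
D_vec     <- matrix(d_vec, n, 2)
```

The vector form requires a 2n x 2n covariance matrix, which is one plausible reason it scales poorly as n grows.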

"uni_cond" specifies the bivariate exposure using univariate conditional factorization, which in the case of bivariate normal results in two univariate normal expressions.
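The univariate conditional factorization can be sketched with two calls to `rnorm`. The sketch relies on the standard bivariate normal identity: D1 given C is normal with mean \mu_{1} and standard deviation s_{1}, and D2 given D1 and C is normal with mean \mu_{2} + \rho (s_{2}/s_{1})(D1 - \mu_{1}) and standard deviation s_{2}\sqrt{1-\rho^{2}}. It is an assumption that gen_D follows exactly this ordering, and the confounders below are stand-in independent normals.

```r
set.seed(1)
n <- 20000
rho_cond <- 0.2; s_d1_cond <- 2; s_d2_cond <- 2
d1_beta <- c(0.5, 1, 0)
d2_beta <- c(0, 0.3, 0.75)

# Stand-in confounders and the resulting conditional means
C   <- matrix(rnorm(n * 3), n, 3)
mu1 <- drop(C %*% d1_beta)
mu2 <- drop(C %*% d2_beta)

# Step 1: D1 | C
D1 <- rnorm(n, mean = mu1, sd = s_d1_cond)

# Step 2: D2 | D1, C
cond_mean <- mu2 + rho_cond * (s_d2_cond / s_d1_cond) * (D1 - mu1)
cond_sd   <- s_d2_cond * sqrt(1 - rho_cond^2)
D2 <- rnorm(n, mean = cond_mean, sd = cond_sd)

D <- cbind(D1, D2)

# After removing the conditional means, the residual correlation
# should be close to rho_cond
resid_cor <- cor(D1 - mu1, D2 - mu2)
```

Because each step is a vectorized univariate draw, no n-dimensional covariance matrix is ever formed.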

In general, we suggest using the univariate conditional, "uni_cond", method when generating exposures as it is substantially faster than both the matrix normal and vector normal approaches.

Note that the options use regular expression matching and can be specified uniquely using either "m", "u", or "v".

Marginal Covariance of Exposures

As described above, the exposures are drawn conditional on the set C, so the marginal covariance of the exposures is defined as

\Sigma_{D}= \boldsymbol{\beta}\Sigma_{C}\boldsymbol{\beta}^{T}+\Omega.

In our function we return the true marginal covariance \Sigma_{D} as well as the true marginal correlation \rho_{D}.
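The marginal covariance formula can be evaluated by hand for the parameter values used in the Examples section. This sketch recomputes the quantities from the formulas in this section and does not call gen_D; the object names `Sigma_C`, `Omega`, `Sigma_D`, and `rho_D` are chosen here for illustration.

```r
k <- 3
C_cov <- 0.1; C_var <- 1
rho_cond <- 0.2; s_d1_cond <- 2; s_d2_cond <- 2

# Equicorrelation covariance of the confounders
Sigma_C <- C_cov * (matrix(1, k, k) - diag(k)) + diag(k) * C_var

# 2 x k coefficient matrix: rows are d1_beta and d2_beta
beta <- rbind(c(0.5, 1, 0), c(0, 0.3, 0.75))

# Conditional covariance of the exposures given C
off   <- rho_cond * s_d1_cond * s_d2_cond
Omega <- matrix(c(s_d1_cond^2, off,
                  off, s_d2_cond^2), 2, 2)

# Marginal covariance and correlation of the exposures
Sigma_D <- beta %*% Sigma_C %*% t(beta) + Omega
rho_D   <- Sigma_D[1, 2] / sqrt(Sigma_D[1, 1] * Sigma_D[2, 2])
```

With these values the marginal correlation is roughly 0.245, higher than the conditional correlation of 0.2, because the shared confounder C2 and the correlation among confounders add covariance between the exposures.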

Value

D: nx2 numeric matrix of the sample values for the exposures given the set C
C: nxk numeric matrix of the sampled values for the confounding set C
D_Sigma: 2x2 numeric matrix of the true marginal covariance of exposures
rho: numeric scalar representing the true marginal correlation of exposures

Examples

#generate bivariate exposures. D1 confounded by C1 and C2. D2 by C2 and C3
#uses univariate conditional normal to draw samples
sim_dt <- gen_D(method="u", n=200, rho_cond=0.2, s_d1_cond=2, s_d2_cond=2, k=3,
C_mu=rep(0, 3), C_cov=0.1, C_var=1, d1_beta=c(0.5, 1, 0), d2_beta=c(0, 0.3, 0.75), seed=06112020)
D <- sim_dt$D
C <- sim_dt$C

#observed correlation should be close to true marginal value
cor(D); sim_dt$rho


#Use vector normal method instead of univariate method to draw samples
sim_dt <- gen_D(method="v", n=200, rho_cond=0.2, s_d1_cond=2, s_d2_cond=2, k=3,
C_mu=rep(0, 3), C_cov=0.1, C_var=1, d1_beta=c(0.5, 1, 0), d2_beta=c(0, 0.3, 0.75), seed=06112020)

[Package mvGPS version 1.2.2 Index]