data.gen {gesso} | R Documentation |
Data Generation
Description
Generates a genotype data matrix G (sample_size by p), a vector of environmental measurements E, and an outcome vector Y of size sample_size. Simulates training, validation, and test datasets.
Usage
data.gen(sample_size = 100, p = 20, n_g_non_zero = 15, n_gxe_non_zero = 10,
family = "gaussian", mode = "strong_hierarchical",
normalize = FALSE, normalize_response = FALSE,
seed = 1, pG = 0.2, pE = 0.3,
n_confounders = NULL)
Arguments
sample_size |
sample size of the data |
p |
total number of main effects |
n_g_non_zero |
number of non-zero main effects to generate |
n_gxe_non_zero |
number of non-zero interaction effects to generate |
family |
"gaussian" for a continuous outcome Y, "binomial" for a binary 0/1 outcome |
mode |
either "strong_hierarchical", "hierarchical", or "anti_hierarchical". In the strong hierarchical mode the hierarchical structure is maintained (if beta_g = 0 then beta_gxe = 0) and, in addition, |beta_g| >= |beta_gxe|. In the hierarchical mode the hierarchical structure is maintained, but |beta_g| < |beta_gxe|. In the anti_hierarchical mode the hierarchical structure is violated (beta_g = 0 while beta_gxe != 0). |
normalize |
|
normalize_response |
|
pG |
genotype prevalence, a value from 0 to 1 |
pE |
environment prevalence, value from 0 to 1 |
seed |
random seed |
n_confounders |
number of confounders to generate, either |
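
The three settings of mode differ in how the main-effect and interaction coefficients relate to each other. The base R sketch below checks the stated conditions on hand-written coefficient vectors. The vectors and the helper functions keeps_hierarchy and keeps_strong_hierarchy are hypothetical names introduced for illustration; this is not how data.gen draws its coefficients.

```r
# Hypothetical coefficient vectors for p = 4 predictors (illustration only;
# these values are assumptions, not output of data.gen).
beta_g_strong   <- c(3, 2, 0, 0)
beta_gxe_strong <- c(1.5, 0, 0, 0)

beta_g_hier   <- c(1.5, 2, 0, 0)
beta_gxe_hier <- c(3, 0, 0, 0)

beta_g_anti   <- c(0, 0, 2, 0)
beta_gxe_anti <- c(3, 1.5, 0, 0)

# Hierarchy holds when every zero main effect has a zero interaction.
keeps_hierarchy <- function(beta_g, beta_gxe) {
  all(beta_gxe[beta_g == 0] == 0)
}

# Strong hierarchy additionally requires |beta_g| >= |beta_gxe| elementwise.
keeps_strong_hierarchy <- function(beta_g, beta_gxe) {
  keeps_hierarchy(beta_g, beta_gxe) && all(abs(beta_g) >= abs(beta_gxe))
}

keeps_strong_hierarchy(beta_g_strong, beta_gxe_strong)  # TRUE
keeps_hierarchy(beta_g_hier, beta_gxe_hier)             # TRUE
keeps_strong_hierarchy(beta_g_hier, beta_gxe_hier)      # FALSE
keeps_hierarchy(beta_g_anti, beta_gxe_anti)             # FALSE
```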
Value
A list of simulated datasets and generating coefficients
G_train , G_valid , G_test |
generated genotype matrices |
E_train , E_valid , E_test |
generated vectors of environmental values |
Y_train , Y_valid , Y_test |
generated outcome vectors |
C_train , C_valid , C_test |
generated confounder matrices |
GxE_train , GxE_valid , GxE_test |
generated GxE matrices |
Beta_G |
main effect coefficients vector |
Beta_GxE |
interaction coefficients vector |
beta_0 |
intercept coefficient value |
beta_E |
environment coefficient value |
Beta_C |
confounder coefficient values |
index_beta_non_zero , index_beta_gxe_non_zero , index_beta_zero , index_beta_gxe_zero |
internal data generation variables: indices of the non-zero and zero main-effect and interaction coefficients |
n_g_non_zero |
number of non-zero main effects generated |
n_gxe_non_zero |
number of non-zero interactions generated |
n_total_non_zero |
total number of non-zero variables |
SNR_g |
signal-to-noise ratio for the main effects |
SNR_gxe |
signal-to-noise ratio for the interactions |
family , p , sample_size , mode , seed |
input simulation parameters |
Examples
# Simulate data with 100 samples and 100 main effects
data = data.gen(sample_size=100, p=100)
# Extract the training split
G = data$G_train; GxE = data$GxE_train
E = data$E_train; Y = data$Y_train
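
A slightly longer sketch of inspecting the returned list, using only the element names documented under Value. The requireNamespace guard and the has_gesso variable are additions for illustration, so that the code runs even where gesso is not installed. No output is shown because it depends on the installed version.

```r
# Guarded so the sketch runs even when gesso is not installed.
has_gesso <- requireNamespace("gesso", quietly = TRUE)
if (has_gesso) {
  data <- gesso::data.gen(sample_size = 100, p = 100, seed = 1)

  # Per the Description, G is sample_size by p, and E and Y have
  # length sample_size.
  print(dim(data$G_train))
  print(length(data$E_train))
  print(length(data$Y_train))

  # The generating coefficients are returned alongside the data.
  print(sum(data$Beta_G != 0))    # count of non-zero main effects
  print(sum(data$Beta_GxE != 0))  # count of non-zero interactions
  print(data$n_g_non_zero)
  print(data$n_gxe_non_zero)
}
```

Comparing the counts of non-zero entries in Beta_G and Beta_GxE with n_g_non_zero and n_gxe_non_zero is a quick sanity check on a simulated dataset before it is passed to a fitting routine.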