generate_phenodata {CJAMP}    R Documentation

Functions to generate phenotype data.


Functions to generate quantitative (normally distributed) or binary phenotypes based on provided genetic data, for specified effect sizes. The functions generate_phenodata_1_simple and generate_phenodata_1 generate one phenotype Y conditional on single nucleotide variants (SNVs) and two covariates. generate_phenodata_2_bvn and generate_phenodata_2_copula generate two phenotypes Y_1, Y_2, whose dependence is given by Kendall's tau, conditional on the provided SNVs and two covariates.


generate_phenodata_1_simple(genodata = NULL, type = "quantitative",
  b = 0, a = c(0, 0.5, 0.5))

generate_phenodata_1(genodata = NULL, type = "quantitative", b = 0.6,
  a = c(0, 0.5, 0.5), MAF_cutoff = 1, prop_causal = 0.1,
  direction = "a")

generate_phenodata_2_bvn(genodata = NULL, tau = NULL, b1 = 0,
  b2 = 0, a1 = c(0, 0.5, 0.5), a2 = c(0, 0.5, 0.5))

generate_phenodata_2_copula(genodata = NULL, phi = NULL, tau = 0.5,
  b1 = 0.6, b2 = 0.6, a1 = c(0, 0.5, 0.5), a2 = c(0, 0.5, 0.5),
  MAF_cutoff = 1, prop_causal = 0.1, direction = "a")



genodata: Numeric input vector or dataframe containing the genetic variant(s) in columns and one row per individual. Must be in allelic coding 0, 1, 2.


type: String with value "quantitative" or "binary" specifying whether normally distributed or binary phenotypes are generated.


b: Numeric value or vector specifying the genetic effect size(s) of the provided SNVs (genodata) in the data generation.


a: Numeric vector specifying the effect sizes of the covariates X_1, X_2 in the data generation.


MAF_cutoff: Numeric value specifying a minor allele frequency cutoff that determines among which SNVs the causal SNVs are sampled for the phenotype generation.


prop_causal: Numeric value specifying the desired proportion of causal SNVs among all SNVs (e.g. 0.1 for 10%).


direction: String with value "a", "b", or "c" specifying whether all causal SNVs have a positive effect on the phenotypes ("a"), 20% of the causal SNVs have a negative and 80% a positive effect ("b"), or 50% of the causal SNVs have a negative and 50% a positive effect ("c").


tau: Numeric value specifying Kendall's tau, which determines the dependence between the two generated phenotypes.


b1: Numeric value or vector specifying the genetic effect size(s) of the provided SNVs (genodata) on the first phenotype in the data generation.


b2: Numeric value or vector specifying the genetic effect size(s) of the provided SNVs (genodata) on the second phenotype in the data generation.


a1: Numeric vector specifying the effect sizes of the covariates X_1, X_2 on the first phenotype in the data generation.


a2: Numeric vector specifying the effect sizes of the covariates X_1, X_2 on the second phenotype in the data generation.


phi: Numeric value specifying the Clayton copula parameter φ, which determines the dependence between the two generated phenotypes (an alternative to tau).
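For the Clayton copula, the parameter φ > 0 and Kendall's tau are linked by the standard relation τ = φ / (φ + 2), equivalently φ = 2τ / (1 − τ). The base-R sketch below converts between the two; the helper names are hypothetical, and whether CJAMP performs exactly this conversion internally is an assumption.

```r
# Standard Clayton copula relation between phi (> 0) and Kendall's tau:
#   tau = phi / (phi + 2)   <=>   phi = 2 * tau / (1 - tau)
# Hypothetical helper names (assumption); not part of the CJAMP API.
clayton_tau_to_phi <- function(tau) {
  stopifnot(all(tau > 0 & tau < 1))   # sketch restricted to positive dependence
  2 * tau / (1 - tau)
}
clayton_phi_to_tau <- function(phi) {
  stopifnot(all(phi > 0))
  phi / (phi + 2)
}

clayton_tau_to_phi(c(0.2, 0.5))   # 0.5 and 2
clayton_phi_to_tau(2)             # 0.5
```

For example, tau = 0.5 corresponds to φ = 2 and tau = 0.2 to φ = 0.5.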


In more detail, the function generate_phenodata_1_simple generates a quantitative or binary phenotype Y with n observations, conditional on the specified SNVs with the given effect sizes and on one binary and one standard normally distributed covariate with the specified effect sizes. The number of observations n is the number of individuals in the provided genetic data (genodata).
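A minimal base-R sketch of a single-phenotype model of this kind is given below. It is not the CJAMP implementation: the function name is hypothetical, and treating a[1] as an intercept, drawing X1 from Bernoulli(0.5), using N(0, 1) residual error, and using a logistic link for binary phenotypes are all assumptions made for illustration.

```r
# Hypothetical sketch (not CJAMP code) of one generated phenotype.
# ASSUMPTIONS: a[1] is an intercept, X1 ~ Bernoulli(0.5), N(0, 1) error,
# and a logistic link for binary phenotypes.
sketch_phenodata_1_simple <- function(genodata,
                                      type = c("quantitative", "binary"),
                                      b = 0, a = c(0, 0.5, 0.5)) {
  type <- match.arg(type)
  G <- as.matrix(genodata)                # n x m genotypes coded 0, 1, 2
  n <- nrow(G)                            # n is taken from the genetic data
  b <- rep_len(b, ncol(G))                # a single b applies to every SNV
  X1 <- rbinom(n, size = 1, prob = 0.5)   # binary covariate
  X2 <- rnorm(n)                          # standard normal covariate
  eta <- a[1] + a[2] * X1 + a[3] * X2 + drop(G %*% b)  # linear predictor
  Y <- if (type == "quantitative") {
    eta + rnorm(n)                        # normally distributed phenotype
  } else {
    rbinom(n, size = 1, prob = plogis(eta))  # binary phenotype
  }
  data.frame(Y = Y, X1 = X1, X2 = X2)
}

set.seed(1)
G <- matrix(rbinom(1000 * 20, size = 2, prob = 0.3), ncol = 20)
pheno_q <- sketch_phenodata_1_simple(G, b = seq(0.1, 2, 0.1))
pheno_b <- sketch_phenodata_1_simple(G[, 1], type = "binary")
head(pheno_q)
```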

generate_phenodata_1 extends generate_phenodata_1_simple by additionally allowing the user to select the proportion of causal SNVs, a minor allele frequency cutoff for the causal SNVs, and varying effect directions. As before, n is determined by the provided genetic data.
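The selection of causal SNVs via MAF_cutoff, prop_causal and direction can be sketched in base R as follows. This is an illustration under stated assumptions, not the CJAMP code: the function name is hypothetical, SNVs with MAF strictly below the cutoff are taken as candidates, at least one SNV is made causal, negative effects go to randomly chosen causal SNVs, and the |log10(MAF)| scaling of b is left out.

```r
# Hypothetical sketch (not CJAMP code) of causal SNV selection.
# ASSUMPTIONS: candidates have MAF < MAF_cutoff, at least one SNV is causal,
# negative effects go to randomly chosen causal SNVs, no MAF scaling of b.
sketch_causal_effects <- function(genodata, b = 0.6, MAF_cutoff = 1,
                                  prop_causal = 0.1, direction = "a") {
  G <- as.matrix(genodata)
  m <- ncol(G)
  af <- colMeans(G) / 2                   # frequency of the counted allele
  maf <- pmin(af, 1 - af)                 # minor allele frequency
  candidates <- which(maf < MAF_cutoff)
  n_causal <- min(length(candidates), max(1, round(prop_causal * m)))
  # Index into candidates: sample() on a length-one vector would draw from 1:x
  causal <- candidates[sample.int(length(candidates), n_causal)]
  # Share of causal SNVs with a negative effect for direction "a", "b", "c"
  prop_negative <- switch(direction, a = 0, b = 0.2, c = 0.5,
                          stop("direction must be \"a\", \"b\", or \"c\""))
  signs <- rep(1, n_causal)
  n_negative <- round(prop_negative * n_causal)
  if (n_negative > 0) signs[sample.int(n_causal, n_negative)] <- -1
  effects <- numeric(m)                   # non-causal SNVs get effect 0
  effects[causal] <- signs * rep_len(b, m)[causal]
  effects
}

set.seed(1)
G <- matrix(rbinom(1000 * 20, size = 2, prob = 0.2), ncol = 20)
beta <- sketch_causal_effects(G, b = 0.6, prop_causal = 0.5, direction = "b")
sum(beta != 0)   # 10 causal SNVs
sum(beta < 0)    # 2 of them with a negative effect
```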

The function generate_phenodata_2_bvn generates two quantitative phenotypes Y_1, Y_2 conditional on the provided SNVs and on one binary and one standard normally distributed covariate X_1, X_2. The phenotypes are drawn from the bivariate normal distribution so that they have dependence τ given by Kendall's tau.
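For the bivariate normal distribution, Kendall's tau and the Pearson correlation ρ satisfy τ = (2/π) arcsin(ρ), so a target tau can be reached with ρ = sin(πτ/2). The base-R sketch below uses this relation; it is not the CJAMP code, the function name is hypothetical, and mean1, mean2 are assumed placeholders for the SNV and covariate linear predictors.

```r
# Hypothetical sketch (not CJAMP code): two normal phenotypes whose dependence
# has Kendall's tau. For the bivariate normal, tau = (2 / pi) * asin(rho).
sketch_bvn_pair <- function(n, tau, mean1 = 0, mean2 = 0) {
  rho <- sin(pi * tau / 2)                      # correlation matching tau
  z1 <- rnorm(n)
  z2 <- rho * z1 + sqrt(1 - rho^2) * rnorm(n)   # corr(z1, z2) = rho
  # ASSUMPTION: mean1, mean2 stand in for the SNV and covariate effects
  data.frame(Y1 = mean1 + z1, Y2 = mean2 + z2)
}

set.seed(1)
d <- sketch_bvn_pair(2000, tau = 0.5)
cor(d$Y1, d$Y2, method = "kendall")   # close to 0.5
```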

The function generate_phenodata_2_copula generates two quantitative phenotypes Y_1, Y_2 conditional on the provided SNVs and on one binary and one standard normally distributed covariate X_1, X_2. It uses the Clayton copula, via the function generate_clayton_copula, so that Y_1, Y_2 are marginally normally distributed and have dependence given by Kendall's tau, specified through tau or phi.
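One standard way to sample from a Clayton copula is conditional inversion: draw u and w uniformly and solve C(v | u) = w for v, then map both uniforms to normal margins with qnorm(). The base-R sketch below shows this technique; it is not generate_clayton_copula or any other CJAMP code, the function name is hypothetical, and mean1, mean2 are assumed placeholders for the linear predictors.

```r
# Hypothetical sketch (not CJAMP code, not generate_clayton_copula):
# Clayton copula by conditional inversion, then N(0, 1) margins.
sketch_clayton_pair <- function(n, tau, mean1 = 0, mean2 = 0) {
  stopifnot(tau > 0, tau < 1)             # sketch covers phi > 0 only
  phi <- 2 * tau / (1 - tau)              # Clayton: tau = phi / (phi + 2)
  u <- runif(n)
  w <- runif(n)
  # Solve C(v | u) = w, where C(v | u) is the conditional Clayton cdf
  v <- (u^(-phi) * (w^(-phi / (1 + phi)) - 1) + 1)^(-1 / phi)
  # qnorm() gives normal margins; Kendall's tau is unchanged by this
  # monotone transformation. ASSUMPTION: means stand in for the predictors.
  data.frame(Y1 = mean1 + qnorm(u), Y2 = mean2 + qnorm(v))
}

set.seed(1)
d <- sketch_clayton_pair(2000, tau = 0.5)
cor(d$Y1, d$Y2, method = "kendall")   # close to 0.5
```

Because Kendall's tau depends only on ranks, the transformation to normal margins leaves the copula's dependence untouched.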

In generate_phenodata_1_simple and generate_phenodata_2_bvn, the genetic effect sizes are the specified numeric values b and b1, b2, respectively. In generate_phenodata_1 and generate_phenodata_2_copula, the genetic effect sizes are obtained by multiplying b or b1, b2, respectively, by the absolute value of the log10-transformed minor allele frequency of each SNV, so that rarer variants have larger effect sizes. For example, with b = 0.6, a SNV with minor allele frequency 0.1 receives effect size 0.6, and a SNV with minor allele frequency 0.01 receives effect size 1.2.
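The MAF-based scaling can be written as b_j · |log10(MAF_j)| for SNV j. The base-R sketch below computes it from allelic-coded genotypes; the function name is hypothetical, and computing the MAF from column means and rejecting monomorphic SNVs are assumptions of this sketch rather than documented CJAMP behaviour.

```r
# Hypothetical sketch (not CJAMP code): effect of SNV j is
# b_j * |log10(MAF_j)|, so rarer variants get larger effects.
maf_scaled_effects <- function(genodata, b) {
  G <- as.matrix(genodata)
  af <- colMeans(G) / 2                   # frequency of the counted allele
  maf <- pmin(af, 1 - af)                 # minor allele frequency
  # ASSUMPTION: monomorphic SNVs (MAF = 0) would give an infinite effect
  stopifnot(all(maf > 0))
  rep_len(b, ncol(G)) * abs(log10(maf))
}

# Two SNVs in five individuals: MAF = 1/10 and 3/10
G <- cbind(c(0, 0, 0, 1, 0), c(2, 1, 0, 0, 0))
maf_scaled_effects(G, b = 0.6)   # 0.6 and about 0.314
```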


A dataframe containing n observations of the phenotype Y or phenotypes Y_1, Y_2 and of the covariates X_1, X_2.


# Generate genetic data:
genodata <- generate_genodata(n_SNV = 20, n_ind = 1000)

# One phenotype with generate_phenodata_1_simple:
phenodata1 <- generate_phenodata_1_simple(genodata = genodata[,1],
                                          type = "quantitative", b = 0)
phenodata2 <- generate_phenodata_1_simple(genodata = genodata[,1],
                                          type = "quantitative", b = 2)
phenodata3 <- generate_phenodata_1_simple(genodata = genodata,
                                          type = "quantitative", b = 2)
phenodata4 <- generate_phenodata_1_simple(genodata = genodata,
                                          type = "quantitative",
                                          b = seq(0.1, 2, 0.1))
phenodata5 <- generate_phenodata_1_simple(genodata = genodata[,1],
                                          type = "binary", b = 0)
# One phenotype with generate_phenodata_1 (causal SNVs, MAF cutoff, direction):
phenodata6 <- generate_phenodata_1(genodata = genodata[,1],
                                   type = "quantitative", b = 0,
                                   MAF_cutoff = 1, prop_causal = 0.1,
                                   direction = "a")
phenodata7 <- generate_phenodata_1(genodata = genodata,
                                   type = "quantitative", b = 0.6,
                                   MAF_cutoff = 0.1, prop_causal = 0.05,
                                   direction = "a")
phenodata8 <- generate_phenodata_1(genodata = genodata,
                                   type = "quantitative",
                                   b = seq(0.1, 2, 0.1),
                                   MAF_cutoff = 0.1, prop_causal = 0.05,
                                   direction = "a")
# Two phenotypes from the bivariate normal distribution:
phenodata9 <- generate_phenodata_2_bvn(genodata = genodata[,1],
                                       tau = 0.5, b1 = 0, b2 = 0)
phenodata10 <- generate_phenodata_2_bvn(genodata = genodata,
                                        tau = 0.5, b1 = 0, b2 = 0)
phenodata11 <- generate_phenodata_2_bvn(genodata = genodata,
                                        tau = 0.5, b1 = 1,
                                        b2 = seq(0.1,2,0.1))
phenodata12 <- generate_phenodata_2_bvn(genodata = genodata,
                                        tau = 0.5, b1 = 1, b2 = 2)
# Scatter plot of the two dependent phenotypes:
par(mfrow = c(1, 1))
plot(phenodata12$Y1, phenodata12$Y2)

# Two phenotypes from the Clayton copula:
phenodata13 <- generate_phenodata_2_copula(genodata = genodata[,1],
                                           MAF_cutoff = 1, prop_causal = 1,
                                           tau = 0.5, b1 = 0, b2 = 0)
phenodata14 <- generate_phenodata_2_copula(genodata = genodata,
                                           MAF_cutoff = 1, prop_causal = 0.5,
                                           tau = 0.5, b1 = 0, b2 = 0)
phenodata15 <- generate_phenodata_2_copula(genodata = genodata,
                                           MAF_cutoff = 1, prop_causal = 0.5,
                                           tau = 0.5, b1 = 0, b2 = 0)
phenodata16 <- generate_phenodata_2_copula(genodata = genodata,
                                           MAF_cutoff = 1, prop_causal = 0.5,
                                           tau = 0.2, b1 = 0.3,
                                           b2 = seq(0.1, 2, 0.1))
phenodata17 <- generate_phenodata_2_copula(genodata = genodata,
                                           MAF_cutoff = 1, prop_causal = 0.5,
                                           tau = 0.2, b1 = 0.3, b2 = 0.3)
# Scatter plot of the two dependent phenotypes:
par(mfrow = c(1, 1))
plot(phenodata17$Y1, phenodata17$Y2)

[Package CJAMP version 0.1.1 Index]