generate_phenodata {CJAMP}    R Documentation
Functions to generate phenotype data.
Description
Functions to generate standard normal or binary phenotypes based on provided genetic
data, for specified effect sizes.
The functions generate_phenodata_1_simple and generate_phenodata_1 generate one phenotype Y conditional on single nucleotide variants (SNVs) and two covariates. The functions generate_phenodata_2_bvn and generate_phenodata_2_copula generate two phenotypes Y_1, Y_2 with dependence given by Kendall's tau, conditional on the provided SNVs and two covariates.
Usage
generate_phenodata_1_simple(genodata = NULL, type = "quantitative",
b = 0, a = c(0, 0.5, 0.5))
generate_phenodata_1(genodata = NULL, type = "quantitative", b = 0.6,
a = c(0, 0.5, 0.5), MAF_cutoff = 1, prop_causal = 0.1,
direction = "a")
generate_phenodata_2_bvn(genodata = NULL, tau = NULL, b1 = 0,
b2 = 0, a1 = c(0, 0.5, 0.5), a2 = c(0, 0.5, 0.5))
generate_phenodata_2_copula(genodata = NULL, phi = NULL, tau = 0.5,
b1 = 0.6, b2 = 0.6, a1 = c(0, 0.5, 0.5), a2 = c(0, 0.5, 0.5),
MAF_cutoff = 1, prop_causal = 0.1, direction = "a")
Arguments
genodata | Numeric input vector or data frame containing the genetic variant(s) in columns. Must be in allelic coding 0, 1, 2. |
type | String with value "quantitative" or "binary", specifying whether a quantitative or a binary phenotype is generated. |
b | Numeric value or vector specifying the genetic effect size(s) of the provided SNVs (genodata). |
a | Numeric vector specifying the effect sizes of the covariates. |
MAF_cutoff | Numeric value specifying a minor allele frequency cutoff that determines among which SNVs the causal SNVs are sampled for the phenotype generation. |
prop_causal | Numeric value specifying the desired proportion of causal SNVs among all SNVs (e.g. 0.1 for 10%). |
direction | String with value … specifying the effect directions of the causal SNVs (default "a"). |
tau | Numeric value specifying Kendall's tau, which determines the dependence between the two generated phenotypes. |
b1 | Numeric value or vector specifying the genetic effect size(s) of the provided SNVs (genodata) on the first phenotype Y_1. |
b2 | Numeric value or vector specifying the genetic effect size(s) of the provided SNVs (genodata) on the second phenotype Y_2. |
a1 | Numeric vector specifying the effect sizes of the covariates on the first phenotype Y_1. |
a2 | Numeric vector specifying the effect sizes of the covariates on the second phenotype Y_2. |
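phi | Numeric value specifying the parameter phi of the Clayton copula, as an alternative to tau for specifying the dependence between the two generated phenotypes. |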
Details
In more detail, the function generate_phenodata_1_simple generates a quantitative or binary phenotype Y with n observations, conditional on the specified SNVs with given effect sizes and on one binary and one standard normally distributed covariate with specified effect sizes. The number of observations n is determined by the provided SNV data.
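To make the data-generating idea concrete, the following is a minimal sketch of a one-phenotype generator in the spirit of generate_phenodata_1_simple. The function name simulate_pheno_simple is hypothetical, and the model itself is an assumption: a[1] is treated as an intercept and a[2], a[3] as the effects of X_1 (binary, p = 0.5) and X_2 (standard normal), with N(0, 1) errors for quantitative phenotypes and a logistic link for binary ones. It is not CJAMP's implementation.

```r
# Hypothetical sketch; the model (intercept a[1], covariate effects a[2], a[3],
# N(0, 1) errors, logistic link for binary Y) is an assumption, not CJAMP code.
simulate_pheno_simple <- function(G, b = 0, a = c(0, 0.5, 0.5),
                                  type = "quantitative") {
  G <- as.matrix(G)                      # n x m SNV matrix in coding 0/1/2
  n <- nrow(G)
  b <- rep_len(b, ncol(G))               # recycle a scalar effect to all SNVs
  X1 <- rbinom(n, size = 1, prob = 0.5)  # binary covariate (assumed p = 0.5)
  X2 <- rnorm(n)                         # standard normal covariate
  eta <- a[1] + a[2] * X1 + a[3] * X2 + drop(G %*% b)
  Y <- if (type == "quantitative") {
    eta + rnorm(n)                           # assumed N(0, 1) residual error
  } else {
    rbinom(n, size = 1, prob = plogis(eta))  # assumed logistic link
  }
  data.frame(Y = Y, X1 = X1, X2 = X2)
}

# Usage: 1000 individuals, 5 SNVs with allele frequency 0.2
set.seed(1)
G <- matrix(rbinom(1000 * 5, size = 2, prob = 0.2), ncol = 5)
head(simulate_pheno_simple(G, b = 0.3))
```

As in the documented function, n is taken from the number of rows of the SNV data rather than passed separately.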
generate_phenodata_1 extends generate_phenodata_1_simple and additionally allows the user to select the proportion of causal SNVs, a minor allele frequency cutoff on the causal SNVs, and varying effect directions. Again, n is determined by the provided SNV data.
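The selection of causal SNVs by MAF_cutoff and prop_causal can be sketched as follows. Both helper names are hypothetical, and the rule used here (keep SNVs with MAF below the cutoff, then sample round(prop_causal × number of SNVs) of them, at least one) is an assumption for illustration. The effect directions controlled by direction are not modelled.

```r
# Hypothetical sketch; the selection rule is an assumption, not CJAMP code.
compute_maf_sketch <- function(G) {
  p <- colMeans(as.matrix(G)) / 2  # frequency of the allele counted by 0/1/2
  pmin(p, 1 - p)                   # fold to the minor allele
}

select_causal_snvs <- function(G, MAF_cutoff = 1, prop_causal = 0.1) {
  maf <- compute_maf_sketch(G)
  candidates <- which(maf < MAF_cutoff)            # SNVs eligible to be causal
  n_causal <- max(1, round(prop_causal * ncol(as.matrix(G))))
  n_causal <- min(n_causal, length(candidates))
  # sample.int avoids sample()'s surprising behaviour for length-1 input
  candidates[sample.int(length(candidates), n_causal)]
}

# Usage: 20 SNVs with allele frequencies between 0.01 and 0.4
set.seed(3)
probs <- runif(20, 0.01, 0.4)
G <- sapply(probs, function(p) rbinom(1000, size = 2, prob = p))
causal <- select_causal_snvs(G, MAF_cutoff = 0.1, prop_causal = 0.05)
causal
```

With 20 SNVs and prop_causal = 0.05, at most one causal SNV is drawn, and only from the SNVs whose MAF is below 0.1.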
The function generate_phenodata_2_bvn generates two quantitative phenotypes Y_1, Y_2 from the bivariate normal distribution, conditional on the provided SNVs and on one binary covariate X_1 and one standard normally distributed covariate X_2, so that Y_1 and Y_2 have dependence \tau given by Kendall's tau.
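For the bivariate normal distribution, Kendall's tau and the Pearson correlation are linked by tau = (2 / pi) asin(rho), so a target tau can be reached by setting rho = sin(pi tau / 2). The sketch below uses this relation to draw two correlated standard normal error terms; the helper name bvn_errors_with_tau is hypothetical, and it is an assumption that the dependence in generate_phenodata_2_bvn is imposed on the errors in this way.

```r
# Hypothetical sketch: bivariate normal errors with a target Kendall's tau,
# using rho = sin(pi * tau / 2). Not CJAMP's implementation.
bvn_errors_with_tau <- function(n, tau) {
  rho <- sin(pi * tau / 2)                     # Pearson correlation for this tau
  z1 <- rnorm(n)
  z2 <- rho * z1 + sqrt(1 - rho^2) * rnorm(n)  # corr(z1, z2) = rho
  data.frame(e1 = z1, e2 = z2)
}

set.seed(4)
e <- bvn_errors_with_tau(2000, tau = 0.5)
cor(e$e1, e$e2, method = "kendall")  # empirical tau, close to 0.5
```

Because Kendall's tau depends only on ranks, adding the covariate and SNV terms to each error changes the marginal (unconditional) tau of Y_1, Y_2 but not the conditional dependence.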
The function generate_phenodata_2_copula generates two quantitative phenotypes Y_1, Y_2 from the Clayton copula, conditional on the provided SNVs and on one binary covariate X_1 and one standard normally distributed covariate X_2, so that Y_1, Y_2 are marginally normally distributed and have a dependence specified either through Kendall's tau (tau) or through the copula parameter phi. The copula is generated with the function generate_clayton_copula.
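For the Clayton copula, Kendall's tau and the copula parameter satisfy tau = phi / (phi + 2), hence phi = 2 tau / (1 - tau), which is why either tau or phi can specify the dependence. The sketch below samples the copula by inverting the conditional distribution C(v | u) and then applies qnorm() to obtain standard normal margins. The helpers clayton_phi_from_tau and rclayton_normal are hypothetical and do not reproduce generate_clayton_copula.

```r
# Hypothetical sketch: Clayton copula by conditional inversion, normal margins.
# Requires phi > 0, i.e. 0 < tau < 1. Not CJAMP's generate_clayton_copula().
clayton_phi_from_tau <- function(tau) 2 * tau / (1 - tau)

rclayton_normal <- function(n, phi) {
  u <- runif(n)
  w <- runif(n)
  # Solve dC(u, v)/du = w for v (conditional distribution inverse)
  v <- (u^(-phi) * (w^(-phi / (1 + phi)) - 1) + 1)^(-1 / phi)
  data.frame(e1 = qnorm(u), e2 = qnorm(v))  # marginally N(0, 1)
}

phi <- clayton_phi_from_tau(0.5)  # phi = 2 for tau = 0.5
set.seed(5)
e <- rclayton_normal(2000, phi)
cor(e$e1, e$e2, method = "kendall")  # empirical tau, close to 0.5
```

Since qnorm() is monotone, the normal margins leave Kendall's tau of the copula unchanged; the Clayton copula additionally induces stronger dependence in the lower tail.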
The genetic effect sizes are the specified numeric values b and b1, b2 in the functions generate_phenodata_1_simple and generate_phenodata_2_bvn, respectively. In generate_phenodata_1 and generate_phenodata_2_copula, the genetic effect sizes are computed by multiplying b or b1, b2, respectively, by the absolute value of the log10-transformed minor allele frequencies, so that rarer variants have larger effect sizes.
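As a worked example of the rule above, with b = 0.6 a variant with MAF 0.01 gets effect 0.6 × |log10(0.01)| = 0.6 × 2 = 1.2, while a variant with MAF 0.1 gets 0.6 × 1 = 0.6. The helper name maf_weighted_effects is hypothetical; it applies the stated formula elementwise, so b may also be a vector with one entry per SNV.

```r
# Hypothetical helper applying effect = b * |log10(MAF)| elementwise
maf_weighted_effects <- function(b, maf) b * abs(log10(maf))

maf <- c(0.01, 0.05, 0.1, 0.3)
round(maf_weighted_effects(0.6, maf), 3)
# [1] 1.200 0.781 0.600 0.314
```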
Value
A data frame containing n observations of the phenotype Y or the phenotypes Y_1, Y_2 and of the covariates X_1, X_2.
Examples
library(CJAMP)
# Generate genetic data:
set.seed(10)
genodata <- generate_genodata(n_SNV = 20, n_ind = 1000)
compute_MAF(genodata)
# Generate different phenotype data.
# One phenotype with fixed genetic effect size(s) b:
phenodata1 <- generate_phenodata_1_simple(genodata = genodata[,1],
type = "quantitative", b = 0)
phenodata2 <- generate_phenodata_1_simple(genodata = genodata[,1],
type = "quantitative", b = 2)
phenodata3 <- generate_phenodata_1_simple(genodata = genodata,
type = "quantitative", b = 2)
phenodata4 <- generate_phenodata_1_simple(genodata = genodata,
type = "quantitative",
b = seq(0.1, 2, 0.1))
phenodata5 <- generate_phenodata_1_simple(genodata = genodata[,1],
type = "binary", b = 0)
# One phenotype with MAF-weighted effect sizes and a subset of causal SNVs:
phenodata6 <- generate_phenodata_1(genodata = genodata[,1],
type = "quantitative", b = 0,
MAF_cutoff = 1, prop_causal = 0.1,
direction = "a")
phenodata7 <- generate_phenodata_1(genodata = genodata,
type = "quantitative", b = 0.6,
MAF_cutoff = 0.1, prop_causal = 0.05,
direction = "a")
phenodata8 <- generate_phenodata_1(genodata = genodata,
type = "quantitative",
b = seq(0.1, 2, 0.1),
MAF_cutoff = 0.1, prop_causal = 0.05,
direction = "a")
# Two dependent phenotypes from the bivariate normal distribution:
phenodata9 <- generate_phenodata_2_bvn(genodata = genodata[,1],
tau = 0.5, b1 = 0, b2 = 0)
phenodata10 <- generate_phenodata_2_bvn(genodata = genodata,
tau = 0.5, b1 = 0, b2 = 0)
phenodata11 <- generate_phenodata_2_bvn(genodata = genodata,
tau = 0.5, b1 = 1,
b2 = seq(0.1,2,0.1))
phenodata12 <- generate_phenodata_2_bvn(genodata = genodata,
tau = 0.5, b1 = 1, b2 = 2)
# Plot the marginal distributions of Y1 and Y2 and their joint scatter:
par(mfrow = c(3, 1))
hist(phenodata12$Y1)
hist(phenodata12$Y2)
plot(phenodata12$Y1, phenodata12$Y2)
# Two dependent phenotypes from the Clayton copula:
phenodata13 <- generate_phenodata_2_copula(genodata = genodata[,1],
MAF_cutoff = 1, prop_causal = 1,
tau = 0.5, b1 = 0, b2 = 0)
phenodata14 <- generate_phenodata_2_copula(genodata = genodata,
MAF_cutoff = 1, prop_causal = 0.5,
tau = 0.5, b1 = 0, b2 = 0)
phenodata15 <- generate_phenodata_2_copula(genodata = genodata,
MAF_cutoff = 1, prop_causal = 0.5,
tau = 0.5, b1 = 0, b2 = 0)
phenodata16 <- generate_phenodata_2_copula(genodata = genodata,
MAF_cutoff = 1, prop_causal = 0.5,
tau = 0.2, b1 = 0.3,
b2 = seq(0.1, 2, 0.1))
phenodata17 <- generate_phenodata_2_copula(genodata = genodata,
MAF_cutoff = 1, prop_causal = 0.5,
tau = 0.2, b1 = 0.3, b2 = 0.3)
# Plot the marginal distributions of Y1 and Y2 and their joint scatter:
par(mfrow = c(3, 1))
hist(phenodata17$Y1)
hist(phenodata17$Y2)
plot(phenodata17$Y1, phenodata17$Y2)