simGenoFunc {GWASbyCluster} | R Documentation |
Simulate Genotype Data from a Mixture of 3 Bayesian Hierarchical Models
Description
Simulate Genotype Data from a Mixture of 3 Bayesian Hierarchical Models. The minor allele frequency (MAF) of cases has the same prior as that of controls.
Usage
simGenoFunc(nCases = 100,
nControls = 100,
nSNPs = 1000,
alpha.p = 2,
beta.p = 5,
pi.p = 0.1,
alpha0 = 2,
beta0 = 5,
pi0 = 0.8,
alpha.n = 2,
beta.n = 5,
pi.n = 0.1,
low = 0.02,
upp = 0.5,
verbose = FALSE)
Arguments
nCases |
integer. Number of cases. |
nControls |
integer. Number of controls. |
nSNPs |
integer. Number of SNPs. |
alpha.p |
numeric. The first shape parameter of Beta prior in cluster |
beta.p |
numeric. The second shape parameter of Beta prior in cluster |
pi.p |
numeric. Mixture proportion for cluster |
alpha0 |
numeric. The first shape parameter of Beta prior in cluster |
beta0 |
numeric. The second shape parameter of Beta prior in cluster |
pi0 |
numeric. Mixture proportion for cluster |
alpha.n |
numeric. The first shape parameter of Beta prior in cluster |
beta.n |
numeric. The second shape parameter of Beta prior in cluster |
pi.n |
numeric. Mixture proportion for cluster |
low |
numeric. A small positive value. If a MAF generated from half-flat shape
bivariate prior is smaller than |
upp |
numeric. A positive value. If a MAF generated from half-flat shape
bivariate prior is greater than |
verbose |
logical. Indicating if intermediate results or final results should be output to output screen. |
Details
In this simulation, we generate additive-coded genotypes for 3 clusters of SNPs based on a mixture of 3 Bayesian hierarchical models.
In cluster +
, the minor allele frequency
(MAF) \theta_{x+}
of cases is greater than the MAF \theta_{y+}
of
controls.
In cluster 0
, the MAF \theta_{0}
of cases is equal to
the MAF of controls.
In cluster -
, the MAF \theta_{x-}
of cases is smaller than
the MAF \theta_{y-}
of controls.
The proportions of the 3 clusters of SNPs are \pi_{+}
,
\pi_{0}
, and \pi_{-}
, respectively.
We assume a “half-flat shape” bivariate prior for the MAF in
cluster +
2h_{+}\left(\theta_{x+}\right)h_{+}\left(\theta_{y+}\right)
I\left(\theta_{x+}>\theta_{y+}\right),
where I(a)
is hte indicator function taking value 1
if the event
a
is true, and value 0
otherwise.
The function h_{+}
is the probability density function of the
beta distribution Beta\left(\alpha_{+}, \beta_{+}\right)
.
We assume \theta_{0}
has the beta prior Beta(\alpha_0, \beta_0)
.
We also assume a “half-flat shape” bivariate prior for the MAF in
cluster -
2h_{-}\left(\theta_{x-}\right)h_{-}\left(\theta_{y-}\right)
I\left(\theta_{x-}>\theta_{y-}\right).
The function h_{-}
is the probability density function of the
beta distribution Beta\left(\alpha_{-}, \beta_{-}\right)
.
Given a SNP, we assume Hardy-Weinberg equilibrium holds for its genotypes.
That is, given MAF \theta
, the probabilities of genotypes are
Pr(geno=2) = \theta^2
Pr(geno=1) = 2\theta\left(1-\theta\right)
Pr(geno=0) = \left(1-\theta\right)^2
We also assume the genotypes 0
(wild-type), 1
(heterozygote), and
2
(mutation) follows a multinomial distribution
Multinomial\left\{1, \left[
\theta^2, 2\theta\left(1-\theta\right),
\left(1-\theta\right)^2
\right]\right\}
Note that when we generate MAFs from the half-flat shape bivariate priors,
we might get very small MAFs or get MAFs >0.5
. In these cased,
we then delete this SNP.
So the final number of SNPs generated might be less than the initially-set number of SNPs.
Value
An ExpressionSet object stores genotype data.
Author(s)
Yan Xu <yanxu@uvic.ca>, Li Xing <sfulxing@gmail.com>, Jessica Su <rejas@channing.harvard.edu>, Xuekui Zhang <xuekui@uvic.ca>, Weiliang Qiu <Weiliang.Qiu@gmail.com>
References
Yan X, Xing L, Su J, Zhang X, Qiu W. Model-based clustering for identifying disease-associated SNPs in case-control genome-wide association studies. Scientific Reports 9, Article number: 13686 (2019) https://www.nature.com/articles/s41598-019-50229-6.
Examples
set.seed(2)
esSim = simGenoFunc(
nCases = 100,
nControls = 100,
nSNPs = 500,
alpha.p = 2, beta.p = 5, pi.p = 0.1,
alpha0 = 2, beta0 = 5, pi0 = 0.8,
alpha.n = 2, beta.n = 5, pi.n = 0.1,
low = 0.02, upp = 0.5, verbose = FALSE
)
print(esSim)
pDat = pData(esSim)
print(pDat[1:2,])
print(table(pDat$memSubjs))
fDat = fData(esSim)
print(fDat[1:2,])
print(table(fDat$memGenes))
print(table(fDat$memGenes2))