esSimDiffPriors {GWASbyCluster} | R Documentation |
An ExpressionSet Object Storing Simulated Genotype Data
Description
An ExpressionSet object storing simulated genotype data. The minor allele frequency (MAF) of cases has different prior than that of controls.
Usage
data("esSimDiffPriors")
Details
In this simulation, we generate additive-coded genotypes for 3 clusters of SNPs based on a mixture of 3 Bayesian hierarchical models.
In cluster +
, the minor allele frequency
(MAF) \theta_{x+}
of cases is greater than the MAF \theta_{y+}
of
controls.
In cluster 0
, the MAF \theta_{0}
of cases is equal to
the MAF of controls.
In cluster -
, the MAF \theta_{x-}
of cases is smaller than
the MAF \theta_{y-}
of controls.
The proportions of the 3 clusters of SNPs are \pi_{+}
,
\pi_{0}
, and \pi_{-}
, respectively.
We assume a “half-flat shape” bivariate prior for the MAF in
cluster +
2h_{x+}\left(\theta_{x+}\right)h_{y+}\left(\theta_{y+}\right)
I\left(\theta_{x+}>\theta_{y+}\right),
where I(a)
is hte indicator function taking value 1
if the event
a
is true, and value 0
otherwise.
The function h_{x+}
is the probability density function of the
beta distribution Beta\left(\alpha_{x+}, \beta_{x+}\right)
.
The function h_{y+}
is the probability density function of the
beta distribution Beta\left(\alpha_{y+}, \beta_{y+}\right)
.
We assume \theta_{0}
has the beta prior Beta(\alpha_0, \beta_0)
.
We also assume a “half-flat shape” bivariate prior for the MAF in
cluster -
2h_{x-}\left(\theta_{x-}\right)h_{y-}\left(\theta_{y-}\right)
I\left(\theta_{x-}>\theta_{y-}\right).
The function h_{x-}
is the probability density function of the
beta distribution Beta\left(\alpha_{x-}, \beta_{x-}\right)
.
The function h_{y-}
is the probability density function of the
beta distribution Beta\left(\alpha_{y-}, \beta_{y-}\right)
.
Given a SNP, we assume Hardy-Weinberg equilibrium holds for its genotypes.
That is, given MAF \theta
, the probabilities of genotypes are
Pr(geno=2) = \theta^2
Pr(geno=1) = 2\theta\left(1-\theta\right)
Pr(geno=0) = \left(1-\theta\right)^2
We also assume the genotypes 0
(wild-type), 1
(heterozygote), and
2
(mutation) follows a multinomial distribution
Multinomial\left\{1, \left[
\theta^2, 2\theta\left(1-\theta\right),
\left(1-\theta\right)^2
\right]\right\}
We set the number of cases as 100
, the number of controls as
100
,
and the number of SNPs as 1000
.
The hyperparameters are
\alpha_{x+}=2
, \beta_{x+}=3
,
\alpha_{y+}=2
, \beta_{y+}=8
,
\pi_{+}=0.1
,
\alpha_{0}=2
, \beta_{0}=5
, \pi_{0}=0.8
,
\alpha_{x-}=2
, \beta_{x-}=8
,
\alpha_{y-}=2
, \beta_{y-}=3
,
\pi_{-}=0.1
.
Note that when we generate MAFs from the half-flat shape bivariate priors,
we might get very small MAFs or get MAFs >0.5
. In these cased,
we then delete this SNP.
So the final number of SNPs generated might be less than the initially-set
number 1000
of SNPs.
For the dataset stored in esSim
, there are 838
SNPs.
64
SNPs are in cluster -, 708
SNPs are in cluster 0
,
and 66
SNPs are in cluster +
.
References
Yan X, Xing L, Su J, Zhang X, Qiu W. Model-based clustering for identifying disease-associated SNPs in case-control genome-wide association studies. Scientific Reports 9, Article number: 13686 (2019) https://www.nature.com/articles/s41598-019-50229-6.
Examples
data(esSimDiffPriors)
print(esSimDiffPriors)
pDat=pData(esSimDiffPriors)
print(pDat[1:2,])
print(table(pDat$memSubjs))
fDat=fData(esSimDiffPriors)
print(fDat[1:2,])
print(table(fDat$memGenes))
print(table(fDat$memGenes2))