esSimDiffPriors {GWASbyCluster}    R Documentation

An ExpressionSet Object Storing Simulated Genotype Data

Description

An ExpressionSet object storing simulated genotype data. The minor allele frequency (MAF) of cases has a different prior from that of controls.

Usage

data("esSimDiffPriors")

Details

In this simulation, we generate additive-coded genotypes for 3 clusters of SNPs based on a mixture of 3 Bayesian hierarchical models.

In cluster +, the minor allele frequency (MAF) $\theta_{x+}$ of cases is greater than the MAF $\theta_{y+}$ of controls.

In cluster 0, the MAF $\theta_{0}$ of cases is equal to the MAF of controls.

In cluster -, the MAF $\theta_{x-}$ of cases is smaller than the MAF $\theta_{y-}$ of controls.

The proportions of the 3 clusters of SNPs are $\pi_{+}$, $\pi_{0}$, and $\pi_{-}$, respectively.

We assume a “half-flat shape” bivariate prior for the MAF in cluster +

$$2h_{x+}\left(\theta_{x+}\right)h_{y+}\left(\theta_{y+}\right) I\left(\theta_{x+}>\theta_{y+}\right),$$

where $I(a)$ is the indicator function taking value 1 if the event $a$ is true, and value 0 otherwise. The function $h_{x+}$ is the probability density function of the beta distribution $Beta\left(\alpha_{x+}, \beta_{x+}\right)$. The function $h_{y+}$ is the probability density function of the beta distribution $Beta\left(\alpha_{y+}, \beta_{y+}\right)$.
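One way to draw from a prior of this form is rejection sampling: draw the case and control MAFs independently from their beta distributions and keep only the pairs that satisfy the indicator constraint. The sketch below is an illustration under that assumption, not the package's own code; `rHalfFlatPlus` is a hypothetical helper name, and its defaults are the cluster + hyperparameters listed later in this section. The accepted pairs have a density proportional to the product of the two beta densities on the region where the case MAF exceeds the control MAF.

```r
# Rejection sampler for the cluster "+" prior (illustrative sketch).
# rHalfFlatPlus is a hypothetical helper, not a GWASbyCluster function.
rHalfFlatPlus <- function(n, alpha.x = 2, beta.x = 3, alpha.y = 2, beta.y = 8) {
  theta.x <- numeric(0)
  theta.y <- numeric(0)
  while (length(theta.x) < n) {
    x <- rbeta(n, alpha.x, beta.x)  # candidate case MAFs
    y <- rbeta(n, alpha.y, beta.y)  # candidate control MAFs
    keep <- x > y                   # indicator I(theta_x+ > theta_y+)
    theta.x <- c(theta.x, x[keep])
    theta.y <- c(theta.y, y[keep])
  }
  # Trim any surplus accepted pairs so exactly n are returned
  data.frame(theta.x = theta.x[seq_len(n)], theta.y = theta.y[seq_len(n)])
}

set.seed(1)
draws <- rHalfFlatPlus(5)
print(draws)
```

Every returned row satisfies `theta.x > theta.y`. The same idea applies to cluster - with the inequality reversed.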

We assume $\theta_{0}$ has the beta prior $Beta(\alpha_0, \beta_0)$.

We also assume a “half-flat shape” bivariate prior for the MAF in cluster -

$$2h_{x-}\left(\theta_{x-}\right)h_{y-}\left(\theta_{y-}\right) I\left(\theta_{x-}<\theta_{y-}\right).$$

The function $h_{x-}$ is the probability density function of the beta distribution $Beta\left(\alpha_{x-}, \beta_{x-}\right)$. The function $h_{y-}$ is the probability density function of the beta distribution $Beta\left(\alpha_{y-}, \beta_{y-}\right)$.

Given a SNP, we assume Hardy-Weinberg equilibrium holds for its genotypes. That is, given MAF $\theta$, the probabilities of genotypes are

$$Pr(geno=2) = \theta^2$$

$$Pr(geno=1) = 2\theta\left(1-\theta\right)$$

$$Pr(geno=0) = \left(1-\theta\right)^2$$

We also assume the genotypes 0 (wild-type), 1 (heterozygote), and 2 (mutation) follow a multinomial distribution $Multinomial\left\{1, \left[ \theta^2, 2\theta\left(1-\theta\right), \left(1-\theta\right)^2 \right]\right\}$, where the probabilities are listed for genotypes 2, 1, and 0, respectively.
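The Hardy-Weinberg probabilities above are the probability mass function of a Binomial(2, θ) count of minor alleles, so a genotype can be drawn either from the three-category distribution or with `rbinom`. The following sketch checks that equivalence numerically; the value `theta <- 0.3` is an arbitrary illustration, not a value taken from the dataset.

```r
theta <- 0.3  # illustrative MAF (assumption, not from the dataset)

# HWE probabilities ordered by genotype code 0, 1, 2 (number of minor alleles)
p.hwe <- c((1 - theta)^2, 2 * theta * (1 - theta), theta^2)

# Binomial(2, theta) gives the same probabilities
p.binom <- dbinom(0:2, size = 2, prob = theta)
print(all.equal(p.hwe, p.binom))

# Two equivalent ways to simulate 100 genotypes for one SNP
set.seed(1)
geno.multinom <- sample(0:2, size = 100, replace = TRUE, prob = p.hwe)
geno.binom <- rbinom(100, size = 2, prob = theta)
print(table(geno.multinom))
print(table(geno.binom))
```

The binomial form is convenient when a different MAF is needed per SNP, because `rbinom` accepts a vector of probabilities.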

We set the number of cases as 100, the number of controls as 100, and the number of SNPs as 1000.

The hyperparameters are $\alpha_{x+}=2$, $\beta_{x+}=3$, $\alpha_{y+}=2$, $\beta_{y+}=8$, $\pi_{+}=0.1$,

$\alpha_{0}=2$, $\beta_{0}=5$, $\pi_{0}=0.8$,

$\alpha_{x-}=2$, $\beta_{x-}=8$, $\alpha_{y-}=2$, $\beta_{y-}=3$, $\pi_{-}=0.1$.

Note that when we generate MAFs from the half-flat shape bivariate priors, we might get very small MAFs or MAFs $>0.5$. In these cases, we delete the SNP.

So the final number of SNPs generated might be less than the initially-set number of SNPs (1000).
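The whole procedure described above can be sketched in base R as follows. This is a rough illustration, not the code used to build the dataset: `drawPair` is a hypothetical helper, the seed is arbitrary, and the cutoff `minMAF <- 0.05` is an assumption, since the text does not say how small a MAF must be for the SNP to be deleted.

```r
set.seed(123)
nSNPs <- 1000; nCases <- 100; nControls <- 100
minMAF <- 0.05  # ASSUMPTION: the cutoff for "very small" MAFs is not given

# 1. Cluster membership of each SNP (pi_- = 0.1, pi_0 = 0.8, pi_+ = 0.1)
memGenes <- sample(c("-", "0", "+"), nSNPs, replace = TRUE,
                   prob = c(0.1, 0.8, 0.1))

# 2. Case/control MAFs; drawPair (hypothetical helper) uses rejection sampling
drawPair <- function(ax, bx, ay, by, caseLarger) {
  repeat {
    x <- rbeta(1, ax, bx)  # case MAF
    y <- rbeta(1, ay, by)  # control MAF
    ok <- if (caseLarger) x > y else x < y
    if (ok) return(c(x, y))
  }
}
theta <- matrix(NA_real_, nrow = nSNPs, ncol = 2,
                dimnames = list(NULL, c("case", "control")))
for (i in seq_len(nSNPs)) {
  theta[i, ] <- switch(memGenes[i],
    "+" = drawPair(2, 3, 2, 8, caseLarger = TRUE),
    "-" = drawPair(2, 8, 2, 3, caseLarger = FALSE),
    "0" = rep(rbeta(1, 2, 5), 2))
}

# 3. Delete SNPs whose MAFs are very small or greater than 0.5
keep <- apply(theta, 1, function(p) all(p >= minMAF & p <= 0.5))
theta <- theta[keep, , drop = FALSE]
memGenes <- memGenes[keep]
nKept <- nrow(theta)

# 4. Additive-coded genotypes under HWE (rows = SNPs, columns = subjects)
genoCases <- matrix(rbinom(nKept * nCases, size = 2,
                           prob = rep(theta[, "case"], times = nCases)),
                    nrow = nKept)
genoControls <- matrix(rbinom(nKept * nControls, size = 2,
                              prob = rep(theta[, "control"], times = nControls)),
                       nrow = nKept)
geno <- cbind(genoCases, genoControls)

print(dim(geno))
print(table(memGenes))
```

Because of the filtering in step 3, the number of rows of `geno` is generally smaller than `nSNPs`, and the exact count depends on the seed and on the assumed cutoff.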

For the dataset stored in esSimDiffPriors, there are 838 SNPs: 64 SNPs are in cluster -, 708 SNPs are in cluster 0, and 66 SNPs are in cluster +.

References

Yan X, Xing L, Su J, Zhang X, Qiu W. Model-based clustering for identifying disease-associated SNPs in case-control genome-wide association studies. Scientific Reports 9, Article number: 13686 (2019) https://www.nature.com/articles/s41598-019-50229-6.

Examples

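# Load the package and Biobase, which provides pData() and fData()
library(GWASbyCluster)
library(Biobase)

data(esSimDiffPriors)
print(esSimDiffPriors)

# Subject-level (phenotype) data and subject memberships
pDat <- pData(esSimDiffPriors)
print(pDat[1:2, ])
print(table(pDat$memSubjs))

# SNP-level (feature) data and SNP cluster memberships
fDat <- fData(esSimDiffPriors)
print(fDat[1:2, ])
print(table(fDat$memGenes))
print(table(fDat$memGenes2))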

[Package GWASbyCluster version 0.1.7 Index]