simGenoFuncDiffPriors {GWASbyCluster}R Documentation

Simulate Genotype Data from a Mixture of 3 Bayesian Hierarchical Models

Description

Simulate Genotype Data from a Mixture of 3 Bayesian Hierarchical Models. The minor allele frequency (MAF) of cases has different priors than that of controls.

Usage

simGenoFuncDiffPriors(
    nCases = 100, 
    nControls = 100, 
    nSNPs = 1000, 
    alpha.p.ca = 2, 
    beta.p.ca = 3, 
    alpha.p.co = 2, 
    beta.p.co = 8, 
    pi.p = 0.1, 
    alpha0 = 2, 
    beta0 = 5, 
    pi0 = 0.8, 
    alpha.n.ca = 2, 
    beta.n.ca = 8, 
    alpha.n.co = 2, 
    beta.n.co = 3, 
    pi.n = 0.1, 
    low = 0.02, 
    upp = 0.5, 
    verbose = FALSE)

Arguments

nCases

integer. Number of cases.

nControls

integer. Number of controls.

nSNPs

integer. Number of SNPs.

alpha.p.ca

numeric. The first shape parameter of Beta prior in cluster ++ for cases.

beta.p.ca

numeric. The second shape parameter of Beta prior in cluster ++ for cases.

alpha.p.co

numeric. The first shape parameter of Beta prior in cluster ++ for controls.

beta.p.co

numeric. The second shape parameter of Beta prior in cluster ++ for controls.

pi.p

numeric. Mixture proportion for cluster ++.

alpha0

numeric. The first shape parameter of Beta prior in cluster 00.

beta0

numeric. The second shape parameter of Beta prior in cluster 00.

pi0

numeric. Mixture proportion for cluster 00.

alpha.n.ca

numeric. The first shape parameter of Beta prior in cluster - for cases.

beta.n.ca

numeric. The second shape parameter of Beta prior in cluster - for cases.

alpha.n.co

numeric. The first shape parameter of Beta prior in cluster - for controls.

beta.n.co

numeric. The second shape parameter of Beta prior in cluster - for controls.

pi.n

numeric. Mixture proportion for cluster -.

low

numeric. A small positive value. If a MAF generated from half-flat shape bivariate prior is smaller than low, we will delete the SNP to be generated.

upp

numeric. A positive value. If a MAF generated from half-flat shape bivariate prior is greater than upp, we will delete the SNP to be generated.

verbose

logical. Indicating if intermediate results or final results should be output to output screen.

Details

In this simulation, we generate additive-coded genotypes for 3 clusters of SNPs based on a mixture of 3 Bayesian hierarchical models.

In cluster ++, the minor allele frequency (MAF) θx+\theta_{x+} of cases is greater than the MAF θy+\theta_{y+} of controls.

In cluster 00, the MAF θ0\theta_{0} of cases is equal to the MAF of controls.

In cluster -, the MAF θx\theta_{x-} of cases is smaller than the MAF θy\theta_{y-} of controls.

The proportions of the 3 clusters of SNPs are π+\pi_{+}, π0\pi_{0}, and π\pi_{-}, respectively.

We assume a “half-flat shape” bivariate prior for the MAF in cluster ++

2h+(θx+)h+(θy+)I(θx+>θy+),2h_{+}\left(\theta_{x+}\right)h_{+}\left(\theta_{y+}\right) I\left(\theta_{x+}>\theta_{y+}\right),

where I(a)I(a) is hte indicator function taking value 11 if the event aa is true, and value 00 otherwise. The function h+h_{+} is the probability density function of the beta distribution Beta(α+,β+)Beta\left(\alpha_{+}, \beta_{+}\right).

We assume θ0\theta_{0} has the beta prior Beta(α0,β0)Beta(\alpha_0, \beta_0).

We also assume a “half-flat shape” bivariate prior for the MAF in cluster -

2h(θx)h(θy)I(θx>θy).2h_{-}\left(\theta_{x-}\right)h_{-}\left(\theta_{y-}\right) I\left(\theta_{x-}>\theta_{y-}\right).

The function hh_{-} is the probability density function of the beta distribution Beta(α,β)Beta\left(\alpha_{-}, \beta_{-}\right).

Given a SNP, we assume Hardy-Weinberg equilibrium holds for its genotypes. That is, given MAF θ\theta, the probabilities of genotypes are

Pr(geno=2)=θ2Pr(geno=2) = \theta^2

Pr(geno=1)=2θ(1θ)Pr(geno=1) = 2\theta\left(1-\theta\right)

Pr(geno=0)=(1θ)2Pr(geno=0) = \left(1-\theta\right)^2

We also assume the genotypes 00 (wild-type), 11 (heterozygote), and 22 (mutation) follows a multinomial distribution Multinomial{1,[θ2,2θ(1θ),(1θ)2]}Multinomial\left\{1, \left[ \theta^2, 2\theta\left(1-\theta\right), \left(1-\theta\right)^2 \right]\right\}

Note that when we generate MAFs from the half-flat shape bivariate priors, we might get very small MAFs or get MAFs >0.5>0.5. In these cased, we then delete this SNP.

So the final number of SNPs generated might be less than the initially-set number of SNPs.

Value

An ExpressionSet object stores genotype data.

Author(s)

Yan Xu <yanxu@uvic.ca>, Li Xing <sfulxing@gmail.com>, Jessica Su <rejas@channing.harvard.edu>, Xuekui Zhang <xuekui@uvic.ca>, Weiliang Qiu <Weiliang.Qiu@gmail.com>

References

Yan X, Xing L, Su J, Zhang X, Qiu W. Model-based clustering for identifying disease-associated SNPs in case-control genome-wide association studies. Scientific Reports 9, Article number: 13686 (2019) https://www.nature.com/articles/s41598-019-50229-6.

Examples

set.seed(2)

esSimDiffPriors = simGenoFuncDiffPriors(
  nCases = 100,
  nControls = 100,
  nSNPs = 500,
  alpha.p.ca = 2, beta.p.ca = 3,
  alpha.p.co = 2, beta.p.co = 8, pi.p = 0.1,
  alpha0 = 2, beta0 = 5, pi0 = 0.8,
  alpha.n.ca = 2, beta.n.ca = 8,
  alpha.n.co = 2, beta.n.co = 3, pi.n = 0.1,
  low = 0.02, upp = 0.5, verbose = FALSE
)

print(esSimDiffPriors)

pDat = pData(esSimDiffPriors)
print(pDat[1:2,])
print(table(pDat$memSubjs))

fDat = fData(esSimDiffPriors)
print(fDat[1:2,])
print(table(fDat$memGenes))
print(table(fDat$memGenes2))

[Package GWASbyCluster version 0.1.7 Index]