R: Simulate Genotype Data from a Mixture of 3 Bayesian...

simGenoFuncDiffPriors {GWASbyCluster}

R Documentation

Simulate Genotype Data from a Mixture of 3 Bayesian Hierarchical Models

Description

Simulate Genotype Data from a Mixture of 3 Bayesian Hierarchical Models. The minor allele frequency (MAF) of cases has different priors than that of controls.

Usage

simGenoFuncDiffPriors(
    nCases = 100, 
    nControls = 100, 
    nSNPs = 1000, 
    alpha.p.ca = 2, 
    beta.p.ca = 3, 
    alpha.p.co = 2, 
    beta.p.co = 8, 
    pi.p = 0.1, 
    alpha0 = 2, 
    beta0 = 5, 
    pi0 = 0.8, 
    alpha.n.ca = 2, 
    beta.n.ca = 8, 
    alpha.n.co = 2, 
    beta.n.co = 3, 
    pi.n = 0.1, 
    low = 0.02, 
    upp = 0.5, 
    verbose = FALSE)

Arguments

`nCases`	integer. Number of cases.
`nControls`	integer. Number of controls.
`nSNPs`	integer. Number of SNPs.
`alpha.p.ca`	numeric. The first shape parameter of Beta prior in cluster `+` for cases.
`beta.p.ca`	numeric. The second shape parameter of Beta prior in cluster `+` for cases.
`alpha.p.co`	numeric. The first shape parameter of Beta prior in cluster `+` for controls.
`beta.p.co`	numeric. The second shape parameter of Beta prior in cluster `+` for controls.
`pi.p`	numeric. Mixture proportion for cluster `+`.
`alpha0`	numeric. The first shape parameter of Beta prior in cluster `0`.
`beta0`	numeric. The second shape parameter of Beta prior in cluster `0`.
`pi0`	numeric. Mixture proportion for cluster `0`.
`alpha.n.ca`	numeric. The first shape parameter of Beta prior in cluster `-` for cases.
`beta.n.ca`	numeric. The second shape parameter of Beta prior in cluster `-` for cases.
`alpha.n.co`	numeric. The first shape parameter of Beta prior in cluster `-` for controls.
`beta.n.co`	numeric. The second shape parameter of Beta prior in cluster `-` for controls.
`pi.n`	numeric. Mixture proportion for cluster `-`.
`low`	numeric. A small positive value. If a MAF generated from half-flat shape bivariate prior is smaller than `low`, we will delete the SNP to be generated.
`upp`	numeric. A positive value. If a MAF generated from half-flat shape bivariate prior is greater than `upp`, we will delete the SNP to be generated.
`verbose`	logical. Indicating if intermediate results or final results should be output to output screen.

Details

In this simulation, we generate additive-coded genotypes for 3 clusters of SNPs based on a mixture of 3 Bayesian hierarchical models.

In cluster +, the minor allele frequency (MAF) \theta_{x+} of cases is greater than the MAF \theta_{y+} of controls.

In cluster 0, the MAF \theta_{0} of cases is equal to the MAF of controls.

In cluster -, the MAF \theta_{x-} of cases is smaller than the MAF \theta_{y-} of controls.

The proportions of the 3 clusters of SNPs are \pi_{+}, \pi_{0}, and \pi_{-}, respectively.

We assume a “half-flat shape” bivariate prior for the MAF in cluster +

2h_{+}\left(\theta_{x+}\right)h_{+}\left(\theta_{y+}\right) I\left(\theta_{x+}>\theta_{y+}\right),

where I(a) is hte indicator function taking value 1 if the event a is true, and value 0 otherwise. The function h_{+} is the probability density function of the beta distribution Beta\left(\alpha_{+}, \beta_{+}\right).

We assume \theta_{0} has the beta prior Beta(\alpha_0, \beta_0).

We also assume a “half-flat shape” bivariate prior for the MAF in cluster -

2h_{-}\left(\theta_{x-}\right)h_{-}\left(\theta_{y-}\right) I\left(\theta_{x-}>\theta_{y-}\right).

The function h_{-} is the probability density function of the beta distribution Beta\left(\alpha_{-}, \beta_{-}\right).

Given a SNP, we assume Hardy-Weinberg equilibrium holds for its genotypes. That is, given MAF \theta, the probabilities of genotypes are

Pr(geno=2) = \theta^2

Pr(geno=1) = 2\theta\left(1-\theta\right)

Pr(geno=0) = \left(1-\theta\right)^2

We also assume the genotypes 0 (wild-type), 1 (heterozygote), and 2 (mutation) follows a multinomial distribution Multinomial\left\{1, \left[ \theta^2, 2\theta\left(1-\theta\right), \left(1-\theta\right)^2 \right]\right\}

Note that when we generate MAFs from the half-flat shape bivariate priors, we might get very small MAFs or get MAFs >0.5. In these cased, we then delete this SNP.

So the final number of SNPs generated might be less than the initially-set number of SNPs.

Value

An ExpressionSet object stores genotype data.

Author(s)

Yan Xu <yanxu@uvic.ca>, Li Xing <sfulxing@gmail.com>, Jessica Su <rejas@channing.harvard.edu>, Xuekui Zhang <xuekui@uvic.ca>, Weiliang Qiu <Weiliang.Qiu@gmail.com>

References

Yan X, Xing L, Su J, Zhang X, Qiu W. Model-based clustering for identifying disease-associated SNPs in case-control genome-wide association studies. Scientific Reports 9, Article number: 13686 (2019) https://www.nature.com/articles/s41598-019-50229-6.

Examples

set.seed(2)

esSimDiffPriors = simGenoFuncDiffPriors(
  nCases = 100,
  nControls = 100,
  nSNPs = 500,
  alpha.p.ca = 2, beta.p.ca = 3,
  alpha.p.co = 2, beta.p.co = 8, pi.p = 0.1,
  alpha0 = 2, beta0 = 5, pi0 = 0.8,
  alpha.n.ca = 2, beta.n.ca = 8,
  alpha.n.co = 2, beta.n.co = 3, pi.n = 0.1,
  low = 0.02, upp = 0.5, verbose = FALSE
)

print(esSimDiffPriors)

pDat = pData(esSimDiffPriors)
print(pDat[1:2,])
print(table(pDat$memSubjs))

fDat = fData(esSimDiffPriors)
print(fDat[1:2,])
print(table(fDat$memGenes))
print(table(fDat$memGenes2))

[Package GWASbyCluster version 0.1.7 Index]