simGenoFunc {GWASbyCluster} | R Documentation |
Simulate Genotype Data from a Mixture of 3 Bayesian Hierarchical Models
Description
Simulate Genotype Data from a Mixture of 3 Bayesian Hierarchical Models. The minor allele frequency (MAF) of cases has the same prior as that of controls.
Usage
simGenoFunc(nCases = 100,
nControls = 100,
nSNPs = 1000,
alpha.p = 2,
beta.p = 5,
pi.p = 0.1,
alpha0 = 2,
beta0 = 5,
pi0 = 0.8,
alpha.n = 2,
beta.n = 5,
pi.n = 0.1,
low = 0.02,
upp = 0.5,
verbose = FALSE)
Arguments
nCases |
integer. Number of cases. |
nControls |
integer. Number of controls. |
nSNPs |
integer. Number of SNPs. |
alpha.p |
numeric. The first shape parameter of Beta prior in cluster |
beta.p |
numeric. The second shape parameter of Beta prior in cluster |
pi.p |
numeric. Mixture proportion for cluster |
alpha0 |
numeric. The first shape parameter of Beta prior in cluster |
beta0 |
numeric. The second shape parameter of Beta prior in cluster |
pi0 |
numeric. Mixture proportion for cluster |
alpha.n |
numeric. The first shape parameter of Beta prior in cluster |
beta.n |
numeric. The second shape parameter of Beta prior in cluster |
pi.n |
numeric. Mixture proportion for cluster |
low |
numeric. A small positive value. If a MAF generated from half-flat shape
bivariate prior is smaller than |
upp |
numeric. A positive value. If a MAF generated from half-flat shape
bivariate prior is greater than |
verbose |
logical. Indicating if intermediate results or final results should be output to output screen. |
Details
In this simulation, we generate additive-coded genotypes for 3 clusters of SNPs based on a mixture of 3 Bayesian hierarchical models.
In cluster , the minor allele frequency
(MAF)
of cases is greater than the MAF
of
controls.
In cluster , the MAF
of cases is equal to
the MAF of controls.
In cluster , the MAF
of cases is smaller than
the MAF
of controls.
The proportions of the 3 clusters of SNPs are ,
, and
, respectively.
We assume a “half-flat shape” bivariate prior for the MAF in
cluster
where is hte indicator function taking value
if the event
is true, and value
otherwise.
The function
is the probability density function of the
beta distribution
.
We assume has the beta prior
.
We also assume a “half-flat shape” bivariate prior for the MAF in
cluster
The function is the probability density function of the
beta distribution
.
Given a SNP, we assume Hardy-Weinberg equilibrium holds for its genotypes.
That is, given MAF , the probabilities of genotypes are
We also assume the genotypes (wild-type),
(heterozygote), and
(mutation) follows a multinomial distribution
Note that when we generate MAFs from the half-flat shape bivariate priors,
we might get very small MAFs or get MAFs . In these cased,
we then delete this SNP.
So the final number of SNPs generated might be less than the initially-set number of SNPs.
Value
An ExpressionSet object stores genotype data.
Author(s)
Yan Xu <yanxu@uvic.ca>, Li Xing <sfulxing@gmail.com>, Jessica Su <rejas@channing.harvard.edu>, Xuekui Zhang <xuekui@uvic.ca>, Weiliang Qiu <Weiliang.Qiu@gmail.com>
References
Yan X, Xing L, Su J, Zhang X, Qiu W. Model-based clustering for identifying disease-associated SNPs in case-control genome-wide association studies. Scientific Reports 9, Article number: 13686 (2019) https://www.nature.com/articles/s41598-019-50229-6.
Examples
set.seed(2)
esSim = simGenoFunc(
nCases = 100,
nControls = 100,
nSNPs = 500,
alpha.p = 2, beta.p = 5, pi.p = 0.1,
alpha0 = 2, beta0 = 5, pi0 = 0.8,
alpha.n = 2, beta.n = 5, pi.n = 0.1,
low = 0.02, upp = 0.5, verbose = FALSE
)
print(esSim)
pDat = pData(esSim)
print(pDat[1:2,])
print(table(pDat$memSubjs))
fDat = fData(esSim)
print(fDat[1:2,])
print(table(fDat$memGenes))
print(table(fDat$memGenes2))