rbm.haplos.thresholds {Ravages} | R Documentation |
Simulation of genetic data based on haplotypes and a libaility model
Description
Simulates genetic data with respect to allele frequency spectrum and linkage disequilibrium pattern observed on given haplotype data under a libaility model
Usage
rbm.haplos.thresholds(haplos, weights = c("SKAT", "constant"),
max.maf.causal = 0.01, p.causal = 0.5, p.protect = 0,
h2, prev, normal.approx = TRUE, size,
replicates, rep.by.causal, verbose = TRUE)
Arguments
haplos |
A matrix of haplotypes with one row per haplotype and one column per variant |
weights |
How to weight rare variants (if "constant", all variants have the same weight, if "SKAT", the rarest variants have the highest weights as in the SKAT paper: weights = -0.4*log10(MAF) ) |
max.maf.causal |
The maf threshold to consider a rare variant (set at 0.01 by default), variants with a MAF upper this threshold will have a weight of 0 |
p.causal |
The proportion of causal variants |
p.protect |
The proportion of protective variants among causal variants |
h2 |
The variance explained by the gene |
prev |
A vector with the prevalence in each group of individuals |
normal.approx |
TRUE/FALSE: whether to use the normal approximation to compute thresholds. Set at TRUE by default |
size |
The sizes of each group of individuals |
replicates |
The number of simulations to perform |
rep.by.causal |
The number of time causal variants will be sampled |
verbose |
Whether to display information about the function actions |
Details
nb.causal
, p.protect
, h2
and prev
should be vectors of length corresponding to the number of groups to simulate. If they are of size 1, values will be duplicated.
All monomorphic variants and variants with a MAF higher than max.maf.causal
will have a weight of 0. Causal variants are sampled among variants having weights greater than 0. Causal variants in each group of individuals are indicated in x@ped$Causal
.
A liability model is built on haplotypes' burden computed on sampled causal variants using each variant's weights
, and adjusted on the desired h2
. Thresholds from this liability are then chosen to respect the given prev
(from a standard normal distribution if normal.approx=TRUE
, or using a distribution from 1e6 sampled burdens if normal.approx=FALSE
). Please be carreful when using the normal approximation with high h2
values or low prev
values.
Haplotypes' probabilities in each group of individuals are then computed and two haplotypes are then sampled for each individual based on these probabilities.
To simulate a group of controls, prev
needs to be set at 1, regardless of the other arguments.
N replicates
will be performed, and to gain in computation time, the same causal variants can be used for multiple replicates as different haplotypes will be sampled for each individual. rep.by.causal
indicates the number of replicates to perform for each set of causal variants.
To ensure a variability in the simulations, we yet recommend to resample causal variants a few times when many replicates are to be performed.
For example, if 1000 replicates are to be performed, we recommend to resample causal variants 20 times.
The phenotype will be stored in @ped$pheno
, and the simulation number is @snps$genomic.region
.
Value
x |
A bed matrix with simulated genotypes |
Examples
#Load LCT dataset for haplotype matrix
data(LCT.haplotypes)
#LCT gene in the EUR population
LCT.gene.hap <- LCT.hap[which(LCT.sample$super.population=="EUR"),
which(LCT.snps$pos>=136545410 & LCT.snps$pos<=136594750)]
#Simulation of 100 controls, and two groups of 50 cases with 30% causal variants
#and with the second group having half h2 and twice the prevalence
#compared to the first one
#5 replicates are performed and causal variants are sampled once
x <- rbm.haplos.thresholds(haplos=LCT.gene.hap, max.maf.causal = 0.01, p.causal=0.3,
p.protect=0, h2=c(0.01, 0.01, 0.02), prev=c(1, 0.01, 0.005),
size=c(100, 50, 50), replicates = 5, rep.by.causal = 5)