rbm.haplos.thresholds {Ravages}R Documentation

Simulation of genetic data based on haplotypes and a libaility model

Description

Simulates genetic data with respect to allele frequency spectrum and linkage disequilibrium pattern observed on given haplotype data under a libaility model

Usage

  rbm.haplos.thresholds(haplos, weights = c("SKAT", "constant"), 
                        max.maf.causal = 0.01, p.causal = 0.5, p.protect = 0, 
                        h2, prev, normal.approx = TRUE, size, 
                        replicates, rep.by.causal, verbose = TRUE)

Arguments

haplos

A matrix of haplotypes with one row per haplotype and one column per variant

weights

How to weight rare variants (if "constant", all variants have the same weight, if "SKAT", the rarest variants have the highest weights as in the SKAT paper: weights = -0.4*log10(MAF) )

max.maf.causal

The maf threshold to consider a rare variant (set at 0.01 by default), variants with a MAF upper this threshold will have a weight of 0

p.causal

The proportion of causal variants

p.protect

The proportion of protective variants among causal variants

h2

The variance explained by the gene

prev

A vector with the prevalence in each group of individuals

normal.approx

TRUE/FALSE: whether to use the normal approximation to compute thresholds. Set at TRUE by default

size

The sizes of each group of individuals

replicates

The number of simulations to perform

rep.by.causal

The number of time causal variants will be sampled

verbose

Whether to display information about the function actions

Details

nb.causal, p.protect, h2 and prev should be vectors of length corresponding to the number of groups to simulate. If they are of size 1, values will be duplicated.

All monomorphic variants and variants with a MAF higher than max.maf.causal will have a weight of 0. Causal variants are sampled among variants having weights greater than 0. Causal variants in each group of individuals are indicated in x@ped$Causal.

A liability model is built on haplotypes' burden computed on sampled causal variants using each variant's weights, and adjusted on the desired h2. Thresholds from this liability are then chosen to respect the given prev (from a standard normal distribution if normal.approx=TRUE, or using a distribution from 1e6 sampled burdens if normal.approx=FALSE). Please be carreful when using the normal approximation with high h2 values or low prev values. Haplotypes' probabilities in each group of individuals are then computed and two haplotypes are then sampled for each individual based on these probabilities.

To simulate a group of controls, prev needs to be set at 1, regardless of the other arguments.

N replicates will be performed, and to gain in computation time, the same causal variants can be used for multiple replicates as different haplotypes will be sampled for each individual. rep.by.causal indicates the number of replicates to perform for each set of causal variants. To ensure a variability in the simulations, we yet recommend to resample causal variants a few times when many replicates are to be performed. For example, if 1000 replicates are to be performed, we recommend to resample causal variants 20 times.

The phenotype will be stored in @ped$pheno, and the simulation number is @snps$genomic.region.

Value

x

A bed matrix with simulated genotypes

Examples


  #Load LCT dataset for haplotype matrix
  data(LCT.haplotypes)
  #LCT gene in the EUR population
  LCT.gene.hap <- LCT.hap[which(LCT.sample$super.population=="EUR"), 
                          which(LCT.snps$pos>=136545410 & LCT.snps$pos<=136594750)]

  #Simulation of 100 controls, and two groups of 50 cases with 30% causal variants
  #and with the second group having half h2 and twice the prevalence
  #compared to the first one
  #5 replicates are performed and causal variants are sampled once
  x <- rbm.haplos.thresholds(haplos=LCT.gene.hap, max.maf.causal = 0.01, p.causal=0.3,
                             p.protect=0, h2=c(0.01, 0.01, 0.02), prev=c(1, 0.01, 0.005),
                             size=c(100, 50, 50), replicates = 5, rep.by.causal = 5)


[Package Ravages version 1.1.3 Index]