R: Simulate metabolic labeling - nucleotide conversion RNA-seq...

SimulateReadsForSample {grandR}

R Documentation

Simulate metabolic labeling - nucleotide conversion RNA-seq data.

Description

This function takes a vector of true relative abundances and NTRs, and then simulates (i) read counts per gene and (ii) 4sU incorporation and conversion events. Subsequently, it uses the same approach as implemented in the GRAND-SLAM 2.0 software (Juerges et al., Bioinformatics 2018) to estimate the NTR from these simulated data.

Usage

SimulateReadsForSample(
  num.reads = 2e+07,
  rel.abundance = setNames(rlnorm(10000, meanlog = 4.5, sdlog = 1), paste0("Gene",
    1:10000)),
  ntr = setNames(rbeta(10000, 1.5, 3), paste0("Gene", 1:10000)),
  dispersion = 0.05,
  beta.approx = FALSE,
  conversion.reads = FALSE,
  u.content = 0.25,
  u.content.sd = 0.05,
  read.length = 75,
  p.old = 1e-04,
  p.new = 0.04,
  p.new.fit = p.new,
  seed = NULL
)

Arguments

`num.reads`	the total amount of reads for simulation
`rel.abundance`	named (according to genes) vector of the true relative abundances. Is divided by its sum.
`ntr`	vector of true NTRs
`dispersion`	vector of dispersion parameters (should best be estimated by DESeq2)
`beta.approx`	should the beta approximation of the NTR posterior be computed?
`conversion.reads`	also output the number of reads with conversion
`u.content`	the relative frequency of uridines in the reads
`u.content.sd`	the standard deviation of the u content
`read.length`	the read length for simulation
`p.old`	the probability for a conversion in reads originating from old RNA
`p.new`	the probability for a conversion in reads originating from new RNA
`p.new.fit`	the probability for a conversion in reads originating from new RNA that is used for fitting (to simulate bias in the estimation of p.new)
`seed`	seed value for the random number generator (set to make it deterministic!)

Details

The simulation proceeds as follows:

Draw for each gene the number of reads from a negative binomial distribution parametrized with the relative abundances x read number and the dispersion parameter
For each gene: Draw for each read the number of uridines according to a beta binomial distribution for the given read length (the beta prior is parametrized to match the u.content and u.content.sd parameters)
For each read: Draw the number of conversions according to the binomial mixture model of GRAND-SLAM (parametrized with p_old, p_new, the gene specific NTR and the read specific number of uridines)
Estimate the NTR by using the GRAND-SLAM approach

Value

a matrix containing, per column, the simulated counts, the simulated NTRs, (potentially the shape parameters of the beta distribution approximation,) and the true relative frequencies and ntrs

Examples

SimulateReadsForSample(num.reads = 10000,rel.abundance = rep(1,5),ntr=0.9)
SimulateReadsForSample(num.reads = 10000,rel.abundance = rep(1,5),ntr=0.9,seed=1337)
SimulateReadsForSample(num.reads = 10000,rel.abundance = rep(1,5),ntr=0.9,seed=1337)
# the second and third matrix should be equal, the first should be distinct

[Package grandR version 0.2.5 Index]