simulateigaseq {IgAScores}    R Documentation

Simulate an IgA-Seq dataset from a pre-defined set of IgA-binding probabilities

Description

Simulates IgA-Seq data with a defined IgA-binding distribution, for use in testing the performance of scoring methods.

Usage

simulateigaseq(
  igavalmeans = NULL,
  igavalsds = NULL,
  nosamples = 10,
  samplingdepth = 1e+05,
  posthresh = 4,
  negthresh = 2,
  seed = 66,
  betweengroups = FALSE,
  betweenper = 10,
  betweensp = NULL
)

Arguments

igavalmeans

A vector of mean IgA values, one for each species you wish to simulate. Defaults to an exponentially distributed vector of values for 10 species.

igavalsds

A vector of standard deviations, used together with the means to generate the IgA value distribution of each species. Defaults to 1 for all species.

nosamples

The number of samples to simulate. Defaults to 10.

samplingdepth

The number of bacteria to simulate in each sample. Defaults to 100000.

posthresh

The IgA value threshold above which a bacterium is considered IgA positive. Defaults to 4 (a reasonable value with the other defaults). It is recommended to run the simulation once to determine reasonable thresholds, then rerun it with those thresholds.

negthresh

The IgA value threshold below which a bacterium is considered IgA negative. Defaults to 2 (a reasonable value with the other defaults). It is recommended to run the simulation once to determine reasonable thresholds, then rerun it with those thresholds.

seed

Seed for random number generation. Because it has a default (66), repeated calls produce identical data; change the seed to generate a different simulated dataset.

betweengroups

If TRUE, the starting abundances of half of the samples are modified in the same way (by adding betweenper% of the total counts to a single species), simulating an abundance shift without any change in IgA-binding affinity. Defaults to FALSE.

betweenper

Percentage of the total counts to add to a single species in the second group of samples when betweengroups = TRUE. Defaults to 10.

betweensp

Index of the species to increase in the between-groups simulation. Chosen at random if NULL (the default).
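As a rough illustration of the between-groups mode, the base R sketch below adds betweenper% of each sample's total counts to one species in the second half of the samples. It assumes a species-by-sample counts matrix, that the second group is the last half of the columns, and that the percentage is taken from each sample's own total. The helper shift_abundance is hypothetical and not part of IgAScores, whose internal implementation may differ (for example, this sketch does not rescale the shifted samples back to an even depth).

```r
# Hypothetical helper (not part of IgAScores): mimic the betweengroups mode
# on a species-by-sample counts matrix.
shift_abundance <- function(counts, betweenper = 10, betweensp = NULL) {
  nsamples <- ncol(counts)
  # Assumption: the second group is the second half of the samples
  group2 <- seq(floor(nsamples / 2) + 1, nsamples)
  # Pick a species at random when none is given, as betweensp = NULL does
  if (is.null(betweensp)) betweensp <- sample(nrow(counts), 1)
  # Assumption: the percentage is computed from each sample's own total
  added <- round(colSums(counts)[group2] * betweenper / 100)
  counts[betweensp, group2] <- counts[betweensp, group2] + added
  counts
}

counts <- matrix(100L, nrow = 4, ncol = 6)  # 4 species, 6 samples, 400 counts each
shifted <- shift_abundance(counts, betweenper = 10, betweensp = 2)
shifted[2, ]  # 100 100 100 140 140 140
```

Only the chosen species in the second group changes; every other species keeps its original counts, which is what makes this a pure abundance shift with no change in IgA binding.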

Details

This function generates a simulated immunoglobulin A sequencing (IgA-Seq) dataset from the mean (and standard deviation) of the IgA-binding values expected for each species, together with cut-offs defining the IgA-positive and IgA-negative gates. The input is a vector giving the average IgA value of each species. Any arbitrary scale can be used to represent the relative level of IgA binding between species, provided the standard deviations and cut-offs are of the same magnitude. These values are treated as the means of a normal distribution of IgA-binding values for each species. Species counts are generated from a log distribution for the given number of samples, at an even sampling depth. Each bacterium in each sample is then assigned an IgA-binding value by sampling from its species' IgA value distribution. The thresholds defining the positive and negative gates are then used to build positive and negative counts tables from the bacteria whose values fall into each gate.

A second mode (enabled with betweengroups) introduces a consistent abundance change in half of the samples by increasing one species in them. This can be used to simulate case-control experiments in which, for example, one taxon has bloomed. Further details can be found in Jackson et al. (2020, doi: 10.1101/2020.08.19.257501).
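The core per-bacterium step can be sketched in base R for a single sample. This is a minimal sketch of the idea, not the IgAScores implementation: the species counts are fixed by hand (an assumption standing in for the package's log distribution), and the means, standard deviations, and thresholds mirror the package example.

```r
# Minimal sketch of the simulation idea (not the IgAScores implementation):
# draw one IgA value per bacterium from its species' normal distribution,
# then gate the values into positive and negative fractions.
set.seed(66)  # fixed seed, so repeated runs give identical results

igavalmeans <- c(0.1, 1, 10, 15)
igavalsds   <- rep(1, 4)
posthresh   <- 8
negthresh   <- 4

# Assumption: counts for one sample are fixed here rather than drawn
# from the package's log distribution
species_counts <- c(40, 30, 20, 10)

# One species label per bacterium
species <- rep(seq_along(species_counts), times = species_counts)
# One IgA value per bacterium, drawn from that species' distribution
igavals <- rnorm(length(species), mean = igavalmeans[species], sd = igavalsds[species])

# Count each species among the bacteria falling into each gate
pos_counts <- tabulate(species[igavals > posthresh], nbins = length(species_counts))
neg_counts <- tabulate(species[igavals < negthresh], nbins = length(species_counts))
```

Bacteria with values between negthresh and posthresh fall into neither gate. Repeating this for every sample and binding the results column-wise would give a positive and a negative counts table.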

Note: because an IgA value is simulated for every bacterium in every sample, setting too high a combination of samplingdepth, number of species, and number of samples will slow data generation.
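As a back-of-the-envelope check, assuming one IgA value is drawn per simulated bacterium as the note describes, the default settings already imply a million draws. The timing line is only indicative and depends on the machine.

```r
# Rough size check before simulating: one IgA value per bacterium per sample
samplingdepth <- 1e5  # package default
nosamples     <- 10   # package default
total_draws   <- samplingdepth * nosamples
total_draws  # 1e+06

# Indicative cost of drawing that many normal values in base R
system.time(rnorm(total_draws))
```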

Value

A list containing the simulated data set and relevant input parameters.

Examples

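# Four species with increasing mean IgA values and a standard deviation of 1 each
dat <- simulateigaseq(igavalmeans = c(0.1, 1, 10, 15), igavalsds = rep(1, 4),
                      posthresh = 8, negthresh = 4, samplingdepth = 100)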

[Package IgAScores version 0.1.2 Index]