Dirichlet.multinomial {HMP} | R Documentation |
Generation of Dirichlet-Multinomial Random Samples
Description
Random generation of Dirichlet-Multinomial samples.
Usage
Dirichlet.multinomial(Nrs, shape)
Arguments
Nrs |
A vector specifying the number of reads or sequence depth for each sample. |
shape |
A vector of Dirichlet parameters for each taxa. |
Details
The Dirichlet-Multinomial distribution is given by (Mosimann, J. E. (1962); Tvedebrink, T. (2010)),
\textbf{P}\left ({\textbf{X}_i}=x_{i};\left \{ \pi_j \right \},\theta\right )=\frac{N_{i}!}{x_{i1} !,\ldots,x_{iK} !}\frac{\prod_{j=1}^K \prod_{r=1}^{x_{ij}} \left \{ \pi_j \left ( 1-\theta \right )+\left ( r-1 \right )\theta\right \}}{\prod_{r=1}^{N_i}\left ( 1-\theta\right )+\left ( r-1 \right) \theta}
where \textbf{x}_{i}= \left [ x_{i1}, \ldots, x_{iK} \right ]
is the random vector formed by K taxa (features) counts (RAD vector), N_{i}= \sum_{j=1}^K x_{ij}
is the total number of reads (sequence depth), \left\{ \pi_j \right\}
are the mean of taxa-proportions (RAD-probability mean), and \theta
is the overdispersion parameter.
Note: Though the test statistic supports an unequal number of reads across samples, the performance has not yet been fully tested.
Value
A data matrix of taxa counts where the rows are samples and columns are the taxa.
References
Mosimann, J. E. (1962). On the compound multinomial distribution, the multivariate \beta
-distribution, and correlations among proportions. Biometrika 49, 65-82.
Tvedebrink, T. (2010). Overdispersion in allelic counts and theta-correction in forensic genetics. Theor Popul Biol 78, 200-210.
Examples
data(saliva)
### Generate a the number of reads per sample
### The first number is the number of reads and the second is the number of subjects
nrs <- rep(15000, 20)
### Get gamma from the dirichlet-multinomial parameters
shape <- dirmult(saliva)$gamma
dmData <- Dirichlet.multinomial(nrs, shape)
dmData[1:5, 1:5]