PoisMixSim {HTSCluster} | R Documentation |
Simulate data from a Poisson mixture model
Description
This function simulates data from a Poisson mixture model, as described by Rau et al. (2011). Data are simulated with varying expression level (w_i) for 4 clusters. Clusters may be simulated with “high” or “low” separation, and three options are available for the library size setting: “equal”, “A”, and “B”, as described by Rau et al. (2011).
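To make the generative model concrete, the following is a minimal base R sketch of drawing counts from a Poisson mixture of this general form, where each count has mean w_i * s_l * lambda_jk. It is not the code used by PoisMixSim: the parameter values, the two-condition design, and all object names (`s`, `pi_true`, `lambda`, `mu`) are hypothetical choices made for illustration only.

```r
## Sketch only: NOT the PoisMixSim implementation; all values are hypothetical.
set.seed(1)
n <- 200                          # number of observations
K <- 4                            # number of clusters
conds <- c(1, 1, 2, 2, 2)         # condition label for each variable (column)
q <- length(conds)                # number of variables
s <- rep(1 / q, q)                # equal library sizes, summing to 1
pi_true <- c(0.1, 0.2, 0.3, 0.4)  # mixing proportions

## d x K matrix of condition-by-cluster effects (arbitrary positive values)
lambda <- matrix(c(1, 3, 2, 1, 4, 1, 1, 1), nrow = 2, ncol = K)

## Rescale each column k so that sum_j sum_l lambda_jk * s_jl = 1
s_cond <- as.vector(tapply(s, conds, sum))   # total library size per condition
lambda <- sweep(lambda, 2, colSums(lambda * s_cond), "/")

## Draw cluster labels and per-observation expression levels w_i
labels <- sample(K, n, replace = TRUE, prob = pi_true)
w <- rexp(n, rate = 1 / 1000)

## Poisson means: mu[i, l] = w[i] * s[l] * lambda[conds[l], labels[i]]
mu <- w * t(lambda[conds, labels])
mu <- sweep(mu, 2, s, "*")

## Simulated (n x q) count matrix
y <- matrix(rpois(n * q, mu), nrow = n, ncol = q)
```

Under the rescaling used in this sketch, each row of `mu` sums to the corresponding `w`, so the row sums of `y` have expectation w_i.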
Usage
PoisMixSim(n = 2000, libsize, separation)
Arguments
n |
Number of observations |
libsize |
The type of library size difference to be simulated (“equal”, “A”, or “B”, as described by Rau et al. (2011)) |
separation |
Cluster separation (“high” or “low”, as described by Rau et al. (2011)) |
Value
y |
(n x q) matrix of simulated counts for n observations and q variables |
labels |
Vector of length n defining the true cluster labels of the simulated data |
pi |
Vector of length 4 (the number of clusters) containing the true value of pi |
lambda |
(d x 4) matrix of true lambda values for d conditions (treatment groups) and 4 clusters |
w |
Row sums of y |
conditions |
Vector of length q defining the condition (treatment group) for each variable (column) in y |
Note
If one or more observations are simulated such that all variables have a value of 0, those rows are removed from the data matrix; as a result, the simulated data y may have fewer than n rows in some cases.
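The effect of dropping all-zero rows can be sketched in base R as follows. The small matrix and the object names (`y_raw`, `labels_raw`, `keep`) are hypothetical and only illustrate the filtering step; this is not the package's internal code. When working with actual PoisMixSim output, it may be worth checking that nrow(y) and the length of labels agree before comparing a clustering to the true labels.

```r
## Hypothetical count matrix in which the second observation is all zeros
y_raw <- matrix(c(3, 0, 5,
                  0, 0, 0,
                  1, 2, 0), nrow = 3, byrow = TRUE)
labels_raw <- c(1, 2, 3)

## Keep only observations with at least one non-zero count
keep <- rowSums(y_raw) > 0
y_kept <- y_raw[keep, , drop = FALSE]
labels_kept <- labels_raw[keep]   # apply the same filter to any per-row vector

nrow(y_kept)   # fewer rows than the 3 originally simulated
```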
The PMM-I model includes the parameter constraint \sum_j \lambda_{jk} r_j = 1, where r_j is the number of replicates in condition (treatment group) j. Similarly, the parameter constraint in the PMM-II model is \sum_j \sum_l \lambda_{jk} s_{jl} = 1, where s_{jl} is the library size for replicate l of condition j. The value of lambda corresponds to that used to generate the simulated data, where the library sizes were set as described in Table 2 of Rau et al. (2011). However, due to variability in the simulation process, the actual library sizes of the data y are not exactly equal to these values; this means that the value of lambda may not be directly comparable to an estimated value of \hat{\boldsymbol{\lambda}} as obtained from the PoisMixClus function.
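The gap between nominal and realized library sizes can be illustrated with a small base R sketch. The nominal sizes, the expression levels, and the single-cluster vector `lam` below are hypothetical, and the realized library size is taken here to be each column's share of the total count, which is an assumption made for this illustration rather than a statement about how PoisMixClus estimates it.

```r
## Sketch only: hypothetical nominal library sizes for 4 variables
set.seed(2)
s_nominal <- c(0.1, 0.2, 0.3, 0.4)
w <- rexp(500, rate = 1 / 500)     # hypothetical expression levels

## One Poisson column per variable, with mean w_i * s_l
y <- sapply(s_nominal, function(s) rpois(length(w), w * s))

## Realized library sizes: share of the total count in each column
s_realized <- colSums(y) / sum(y)
rbind(s_nominal, s_realized)

## A lambda vector (one cluster, one replicate per condition) scaled so that
## the PMM-II style constraint holds under the nominal library sizes
lam <- c(2, 1, 1, 3)
lam <- lam / sum(lam * s_nominal)
sum(lam * s_nominal)    # constraint holds with the nominal sizes
sum(lam * s_realized)   # close to, but generally not exactly, 1
```

Because an estimate is normalized against the realized sizes while the generating lambda was normalized against the nominal ones, the two live on slightly different scales.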
Author(s)
Andrea Rau
References
Rau, A., Celeux, G., Martin-Magniette, M.-L., Maugis-Rabusseau, C. (2011). Clustering high-throughput sequencing data with Poisson mixture models. Inria Research Report 7786. Available at https://inria.hal.science/inria-00638082.
Examples
set.seed(12345)
## Simulate data as shown in Rau et al. (2011)
## Library size setting "A", high cluster separation
## n = 200 observations
simulate <- PoisMixSim(n = 200, libsize = "A", separation = "high")
y <- simulate$y
conds <- simulate$conditions
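As a possible follow-up to the example above, the returned list can be inspected using the components documented in the Value section. This sketch assumes the HTSCluster package is installed and is skipped otherwise; the guard and the printed summaries are additions for illustration and are not part of the original example.

```r
## Runs only if HTSCluster is available
if (requireNamespace("HTSCluster", quietly = TRUE)) {
  library(HTSCluster)
  set.seed(12345)
  simulate <- PoisMixSim(n = 200, libsize = "A", separation = "high")

  ## Overview of all returned components
  str(simulate)

  ## Number of rows may be below 200 if all-zero rows were removed
  print(dim(simulate$y))

  ## Cluster sizes and number of variables per condition
  print(table(simulate$labels))
  print(table(simulate$conditions))
}
```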