thin_lib {seqgendiff}    R Documentation
Binomial thinning for altering library size.
Description
Given a matrix of real RNA-seq counts, this function will apply a
separate, user-provided thinning factor to each sample. This uniformly
lowers the counts for all genes in a sample. The thinning factor
should be provided on the log2-scale. This is a specific application
of the binomial thinning approach in thin_diff. The method is
described in detail in Gerard (2020).
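Binomial thinning replaces each count with a draw from a binomial distribution whose size is that count and whose success probability is the proportion of reads kept. For sample i that proportion is 2^(-thinlog2[i]), so the expected library size of sample i shrinks by a factor of 2^thinlog2[i]. The sketch below illustrates this sample-wise step in base R. It is a simplified illustration, not the seqgendiff implementation: the helper name binom_thin_lib is hypothetical, and the sketch assumes non-negative integer counts and non-negative thinlog2 values while ignoring the relative and type options.

```r
## Hypothetical helper (not part of seqgendiff): sample-wise binomial thinning.
## Assumes `mat` holds non-negative integer counts (genes x samples) and
## `thinlog2` holds one non-negative log2-scale thinning factor per sample.
binom_thin_lib <- function(mat, thinlog2) {
  stopifnot(is.matrix(mat), length(thinlog2) == ncol(mat), all(thinlog2 >= 0))
  ## Proportion of reads kept in each sample (column).
  prop <- 2^(-thinlog2)
  ## Repeat each sample's proportion down its column.
  probmat <- matrix(prop, nrow = nrow(mat), ncol = ncol(mat), byrow = TRUE)
  ## Each count becomes Binomial(size = count, prob = proportion for its sample).
  draws <- stats::rbinom(n = length(mat), size = as.vector(mat),
                         prob = as.vector(probmat))
  matrix(draws, nrow = nrow(mat), ncol = ncol(mat), dimnames = dimnames(mat))
}

## Usage: log2 ratios of original to thinned library sizes should be close to thinlog2.
set.seed(1)
counts <- matrix(stats::rpois(1000 * 4, lambda = 50), nrow = 1000, ncol = 4)
thinlog2 <- c(0, 0.5, 1, 2)
thinned <- binom_thin_lib(counts, thinlog2)
log2(colSums(counts) / colSums(thinned))
```

A thinlog2 value of 0 gives a success probability of 1, so that sample's counts are returned unchanged.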
Usage
thin_lib(mat, thinlog2, relative = FALSE, type = c("thin", "mult"))
Arguments
mat
A numeric matrix of RNA-seq counts. The rows index the genes and the columns index the samples.
thinlog2
A numeric vector. Element i is the amount to thin (on the log2-scale) for sample i. For example, a value of 0 means that we do not thin, a value of 1 means that we thin by a factor of 2, a value of 2 means that we thin by a factor of 4, etc.
relative
A logical. Should we apply relative thinning (TRUE) or absolute thinning (FALSE)?
type
Should we apply binomial thinning (type = "thin") or naive multiplication of the counts (type = "mult")?
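The thinlog2 and type arguments can be illustrated with two small hypothetical helpers, neither of which is a seqgendiff function. thin_props converts log2-scale factors into the proportion of counts kept, matching the 2^(-thinlog2) comparison used in the Examples. apply_factor contrasts the two type options under the assumption that "mult" scales counts deterministically while "thin" draws binomial counts; seqgendiff's internals may differ in detail.

```r
## Hypothetical helpers for illustration only; not part of seqgendiff.
thin_props <- function(thinlog2) 2^(-thinlog2)  # log2-scale factor -> proportion kept

apply_factor <- function(counts, thinlog2, type = c("thin", "mult")) {
  type <- match.arg(type)
  p <- thin_props(thinlog2)
  if (type == "mult") {
    counts * p  # deterministic scaling; results need not be integers
  } else {
    stats::rbinom(n = length(counts), size = counts, prob = p)  # random integer counts
  }
}

thin_props(c(0, 1, 2, 3))                     # 1, 1/2, 1/4, 1/8
apply_factor(c(7, 10, 3), 1, type = "mult")   # 3.5 5.0 1.5
set.seed(2)
apply_factor(c(7, 10, 3), 1, type = "thin")   # integers no larger than c(7, 10, 3)
```

Because "thin" is the first choice listed in Usage, it is the default, and the thinned counts stay integer-valued, which count-based downstream tools generally expect.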
Value
A list-like S3 object of class ThinData. Components include some or all of the following:
mat
The modified matrix of counts.
designmat
The design matrix of variables used to simulate signal. This is made by column-binding design_fixed and the permuted version of design_perm (arguments of thin_diff).
coefmat
A matrix of coefficients corresponding to designmat.
design_obs
Additional variables that should be included in your design matrix in downstream fittings. This is made by column-binding the vector of 1's with design_obs (an argument of thin_diff).
sv
A matrix of estimated surrogate variables. In simulation studies you would probably leave this out and estimate your own surrogate variables.
cormat
A matrix of target correlations between the surrogate variables and the permuted variables in the design matrix. This might be different from the target_cor argument of thin_diff because we pass it through fix_cor to ensure positive semi-definiteness of the resulting covariance matrix.
matching_var
A matrix of simulated variables used to permute design_perm if target_cor is not NULL.
Author(s)
David Gerard
References
Gerard, D (2020). "Data-based RNA-seq simulations by binomial thinning." BMC Bioinformatics. 21(1), 206. doi:10.1186/s12859-020-3450-9.
See Also
select_counts
For subsampling the rows and columns of your real RNA-seq count matrix prior to applying binomial thinning.
thin_diff
For the more general thinning approach.
thin_gene
For thinning gene-wise instead of sample-wise.
thin_all
For thinning all counts uniformly.
ThinDataToSummarizedExperiment
For converting a ThinData object to a SummarizedExperiment object.
ThinDataToDESeqDataSet
For converting a ThinData object to a DESeqDataSet object.
Examples
library(seqgendiff)
## Generate count data and thinning factors
## In practice, you would obtain mat from a real dataset, not simulate it.
set.seed(1)
n <- 10
p <- 1000
lambda <- 1000
mat <- matrix(lambda, ncol = n, nrow = p)
thinlog2 <- rexp(n = n, rate = 1)
## Thin library sizes
thout <- thin_lib(mat = mat, thinlog2 = thinlog2)
## Compare empirical thinning proportions to specified thinning proportions
empirical_propvec <- colMeans(thout$mat) / lambda
specified_propvec <- 2 ^ (-thinlog2)
empirical_propvec
specified_propvec