poisthin {seqgendiff}    R Documentation
Apply Poisson thinning to a matrix of count data.
Description
This function is now defunct. Please try out select_counts() and
thin_2group() instead.
Usage
poisthin(
  mat,
  nsamp = nrow(mat),
  ngene = ncol(mat),
  gselect = c("max", "random", "rand_max", "custom", "mean_max"),
  gvec = NULL,
  skip_gene = 0L,
  signal_fun = stats::rnorm,
  signal_params = list(mean = 0, sd = 1),
  prop_null = 1,
  alpha = 0,
  group_assign = c("frac", "random", "cor"),
  group_prop = 0.5,
  corvec = NULL
)
Arguments
| mat | A matrix of count data. The rows index the individuals and the columns index the genes. | 
| nsamp | The number of samples to select from  | 
| ngene | The number of genes to select from  | 
| gselect | How should we select the subset of genes? Should we choose the  | 
| gvec | A logical of length  | 
| skip_gene | The number of maximally expressed genes to skip. Not used if  | 
| signal_fun | A function that returns the signal. This should take as input  | 
| signal_params | A list of additional arguments to pass to  | 
| prop_null | The proportion of genes that are null. | 
| alpha | If  | 
| group_assign | How should we assign groups? Exactly specifying the proportion of individuals in each group ( | 
| group_prop | The proportion of individuals that are in group 1. This proportion is deterministic if  | 
| corvec | A vector of correlations.  | 
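The gene-selection option gselect = "max" (together with skip_gene) and the group-assignment option group_assign = "frac" can be pictured with the base-R sketch below. It is not seqgendiff code: the helpers select_max_genes() and assign_frac() are hypothetical, ranking genes by their median count is an assumption (this page does not state the ranking statistic), and "frac" is assumed to place exactly round(group_prop * n) randomly chosen individuals in group 1.

```r
## Hypothetical helpers illustrating two poisthin() options; NOT seqgendiff code.

## Keep the `ngene` most expressed genes after skipping the `skip_gene` highest.
## Assumption: genes (columns of `mat`) are ranked by their median count.
select_max_genes <- function(mat, ngene, skip_gene = 0L) {
  stopifnot(ngene + skip_gene <= ncol(mat))
  med <- apply(mat, 2, stats::median)       # one median per gene (column)
  ord <- order(med, decreasing = TRUE)      # most expressed genes first
  keep <- ord[seq_len(ngene) + skip_gene]   # drop the top `skip_gene` genes
  mat[, sort(keep), drop = FALSE]           # keep original column order
}

## Assumption: "frac" puts exactly round(group_prop * n) individuals in group 1.
assign_frac <- function(n, group_prop = 0.5) {
  x <- rep(0, n)
  x[sample(n, size = round(group_prop * n))] <- 1
  x
}

set.seed(1)
mat <- matrix(stats::rpois(10 * 20, lambda = 50), nrow = 10)  # 10 samples x 20 genes
sub <- select_max_genes(mat, ngene = 5, skip_gene = 2)
x <- assign_frac(nrow(sub), group_prop = 0.5)
dim(sub)
sum(x)
```

With group_prop = 0.5 and 10 individuals, the sketch always yields exactly 5 individuals in group 1; a Bernoulli draw (presumably what "random" does) would only match that proportion in expectation.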
Details
Given a matrix of RNA-seq counts, this function randomly splits the samples into two groups and adds signal to a known proportion of the genes (a proportion of 1 - prop_null). This signal is the log (base 2) effect size of the group indicator in a linear model. The user may specify the distribution of the effects through signal_fun and signal_params.
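As a hedged reading of the model described above (the exact parameterization is an assumption here, not quoted from the package), the expected count for individual i and gene j follows a log2-linear model in the group indicator x_i, with a gene-specific intercept mu_j and effect beta_j:

```latex
\log_2 \mathbb{E}[Y_{ij}] \approx \mu_j + \beta_j x_i,
\qquad x_i \in \{0, 1\},
\qquad \beta_j = 0 \text{ for null genes.}
```

Here x_i corresponds to the second column of the returned X. For example, a non-null gene with beta_j = 1 has roughly twice the mean count in group 1 as in group 0.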
The Poisson thinning approach first randomly assigns samples to one of two groups. Then, given this assignment, it binomially samples new counts, using each original gene expression count as the number of trials and a success probability that is a function of the effect size. For details, see Gerard and Stephens (2021).
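Binomial thinning works because of a standard fact: if Y ~ Poisson(lambda) and Z | Y ~ Binomial(Y, p), then Z ~ Poisson(p * lambda). The base-R sketch below applies this to a single gene. It is a simplified illustration, not poisthin's implementation: the helper thin_gene() is hypothetical, and it assumes that a positive log2 effect b is created by thinning group-0 counts with probability 2^(-b) and a negative effect by thinning group-1 counts with probability 2^b, so every success probability stays in (0, 1].

```r
## Hypothetical sketch of binomial thinning for one gene; NOT seqgendiff code.
## y: counts for one gene across individuals
## x: 0/1 group indicator
## b: desired log2 effect of group 1 relative to group 0
thin_gene <- function(y, x, b) {
  p <- rep(1, length(y))            # probability of keeping each count
  if (b > 0) p[x == 0] <- 2^(-b)    # shrink group 0 so group 1 looks larger
  if (b < 0) p[x == 1] <- 2^b       # shrink group 1 so group 1 looks smaller
  stats::rbinom(length(y), size = y, prob = p)
}

set.seed(1)
n <- 20000
x <- rep(c(0, 1), each = n / 2)
y <- stats::rpois(n, lambda = 100)
z <- thin_gene(y, x, b = 1)

## Empirical log2 fold change between groups; should be close to b = 1.
lfc <- log2(mean(z[x == 1]) / mean(z[x == 0]))
lfc
```

Because thinning only removes counts, the new counts never exceed the originals, and a null gene (b = 0) is returned unchanged. Applying such a step column by column, with b drawn from signal_fun for non-null genes, gives a rough analogue of the Y and beta outputs.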
Value
A list with the following elements:
- Y: A matrix of altered counts with nsamp rows and ngene columns.
- X: A design matrix. The first column is a vector of ones (for an intercept term) and the second column is an indicator for group membership.
- beta: The approximately true effect sizes in the model log(Y) ~ Xβ.
- corassign: The output from the call to corassign. Only returned if group_assign = "cor".
Author(s)
David Gerard
References
- Gerard, D., and Stephens, M. (2021). "Unifying and Generalizing Methods for Removing Unwanted Variation Based on Negative Controls." Statistica Sinica, 31(3), 1145-1166. doi:10.5705/ss.202018.0345.
Examples
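## poisthin() is defunct (see Description); for new code, use
## select_counts() and thin_2group().
## Simulate data from a given matrix of counts.
## In practice, you would obtain Y from a real dataset, not simulate it.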
set.seed(1)
nsamp <- 10
ngene <- 1000
## Y is genes x samples here; it is transposed below because mat has samples in rows.
Y <- matrix(stats::rpois(nsamp * ngene, lambda = 50), nrow = ngene)
## Apply thinning
poisout <- poisthin(mat           = t(Y),
                    nsamp         = 9,
                    ngene         = 999,
                    signal_fun    = stats::rnorm,
                    signal_params = list(mean = 0, sd = 1),
                    prop_null     = 0.9)
## Dimension of count matrix is smaller.
dim(poisout$Y)
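## Verify that signal was added by estimating it with lm(): each column of
## log2(Y + 1) is regressed on the group indicator; row 2 of coef() holds the slopes.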
betahat <- coef(lm(log2(poisout$Y + 1) ~ poisout$X[, 2]))[2, ]
plot(poisout$beta, betahat, xlab = "Coefficients", ylab = "Estimates")
abline(0, 1, col = 2, lty = 2)