exact.nb.test {NBPSeq} | R Documentation |
Exact Negative Binomial Test for Differential Gene Expression
Description
exact.nb.test
performs the Robinson and Smyth exact
negative binomial (NB) test for differential gene
expression on each gene and summarizes the results using
p-values and q-values (FDR).
Usage
exact.nb.test(obj, grp1, grp2, print.level = 1)
Arguments
obj |
output from |
grp1 |
Identifier of group 1. A number, character or string (should match at least one of the obj$grp.ids). |
grp2 |
Identifier of group 2. A number, character or string (should match at least one of the obj$grp.ids). |
print.level |
a number. Controls the amount of messages printed: 0 for suppressing all messages, 1 for basic progress messages, larger values for more detailed messages. |
Details
The negative binomial (NB) distribution offers a more realistic model for RNA-Seq count variability and still permits an exact (non-asymptotic) test for comparing expression levels in two groups.
For each gene, let S_1
, S_2
be the sums of gene
counts from all biological replicates in each group. The
exact NB test is based on the conditional distribution of
S_1|S_1+S_2
: a value of S_1
that is too big or
too small, relative to the sum S_1+S_2
, indicates
evidence for differential gene expression. When the
effective library sizes are the same in all replicates and
the dispersion parameters are known, we can determine the
probability functions of S_1
, S_2
explicitly.
The exact p-value is computed as the total conditional
probability of all possible values of (S_1, S_2)
that
have the same sum as but are more extreme than the observed
values of (S_1, S_2)
.
Note that we assume that the NB dispersion parameters for the two groups are the same and library sizes (column totals of the count matrix) are the same.
Value
the list obj
from the input with the following added
components:
grp1 |
same as input. |
grp2 |
same as input. |
pooled.pie |
estimated pooled mean of relative count frequencies in the two groups being compared. |
expression.levels |
a matrix of estimated gene
expression levels as indicated by mean relative read
frequencies. It has three columns |
log.fc |
base 2 log fold change in mean relative frequency between two groups. |
p.values |
p-values of the exact NB test applied to each gene (row). |
q.values |
q-values (estimated FDR) corresponding to the p-values. |
Note
Before calling exact.nb.test
, the user should
call estimate.norm.factors
to estimate
normalization factors, call prepare.nbp
to
adjust library sizes, and call estimate.disp
to fit a dispersion model. The exact NB test will be
performed using pseudo.counts
in the list
obj
, which are normalized and adjusted to have the
same effective library sizes (column sums of the count
matrix, multiplied by normalization factors).
Users not interested in fine tuning the underlying
statistical model should use nbp.test
instead. The all-in-one function nbp.test
uses sensible approaches to normalize the counts, estimate
the NBP model parameters and test for differential gene
expression.
A test will be performed on a row (a gene) only when the total row count is nonzero, otherwise an NA value will be assigned to the corresponding p-value and q-value.
See Also
Examples
## Load Arabidopsis data
data(arab);
## Specify treatment groups
## grp.ids = c(1, 1, 1, 2, 2, 2); # Numbers or strings are both OK
grp.ids = rep(c("mock", "hrcc"), each=3);
## Estimate normalization factors
norm.factors = estimate.norm.factors(arab);
print(norm.factors);
## Prepare an NBP object, adjust the library sizes by thinning the
## counts. For demonstration purpose, only use the first 100 rows of
## the arab data.
set.seed(999);
obj = prepare.nbp(arab[1:100,], grp.ids, lib.size=colSums(arab), norm.factors=norm.factors);
print(obj);
## Fit a dispersion model (NBQ by default)
obj = estimate.disp(obj);
plot(obj);
## Perform exact NB test
## grp1 = 1;
## grp2 = 2;
grp1 = "mock";
grp2 = "hrcc";
obj = exact.nb.test(obj, grp1, grp2);
## Print and plot results
print(obj);
par(mfrow=c(3,2));
plot(obj);