estimate.dispersion {NBPSeq} | R Documentation |
Estimate Negative Binomial Dispersion
Description
Estimate NB dispersion by modeling it as a parametric function of preliminarily estimated log mean relative frequencies.
Usage
estimate.dispersion(nb.data, x, model = "NBQ", method = "MAPL", ...)
Arguments
nb.data |
output from
|
x |
a design matrix specifying the mean structure of each row. |
model |
the name of the dispersion model, one of "NB2", "NBP", "NBQ" (default), "NBS" or "step". |
method |
a character string specifying the method for estimating the dispersion model, one of "ML" or "MAPL" (default). |
... |
(for future use). |
Details
We use a negative binomial (NB) distribution to model the
read frequency of gene i
in sample j
. A
negative binomial (NB) distribution uses a dispersion
parameter \phi_{ij}
to model the extra-Poisson
variation between biological replicates. Under the NB
model, the mean-variance relationship of a single read
count satisfies \sigma_{ij}^2 = \mu_{ij} + \phi_{ij}
\mu_{ij}^2
. Due to the typically small sample sizes of
RNA-Seq experiments, estimating the NB dispersion
\phi_{ij}
for each gene i
separately is not
reliable. One can pool information across genes and
biological samples by modeling \phi_{ij}
as a
function of the mean frequencies and library sizes.
Under the NB2 model, the dispersion is a constant across all genes and samples.
Under the NBP model, the log dispersion is modeled as a
linear function of the preliminary estimates of the log
mean relative frequencies (pi.pre
):
log(phi) = par[1] + par[2] * log(pi.pre/pi.offset),
where pi.offset
is 1e-4.
Under the NBQ model, the dispersion is modeled as a quadratic function of the preliminary estimates of the log mean relative frequencies (pi.pre):
log(phi) = par[1] + par[2] * z + par[3] * z^2,
where z = log(pi.pre/pi.offset). By default, pi.offset is the median of pi.pre[subset,].
Under this NBS model, the dispersion is modeled as a smooth function (a natural cubic spline function) of the preliminary estimates of the log mean relative frequencies (pi.pre).
Under the "step" model, the dispersion is modeled as a step (piecewise constant) function.
Value
a list with following components:
estimates |
dispersion estimates for each read count,
a matrix of the same dimensions as the |
likelihood |
the likelihood of the fitted model. |
model |
details of the estimate dispersion model, NOT intended for use by end users. The name and contents of this component are subject to change in future versions. |
Note
Currently, it is unclear whether a dispersion-modeling approach will outperform a more basic approach where regression model is fitted to each gene separately without considering the dispersion-mean dependence. Clarifying the power-robustness of the dispersion-modeling approach is an ongoing research topic.
Examples
## See the example for test.coefficient.