test.coefficient {NBPSeq}                                        R Documentation

Large-sample Test for a Regression Coefficient in a Negative Binomial Regression Model

Description

test.coefficient performs large-sample tests (higher-order asymptotic test, likelihood ratio test, and/or Wald test) of regression coefficients in an NB regression model.

Usage

test.coefficient(nb, dispersion, x, beta0, tests = c("HOA", "LR", "Wald"),
  alternative = "two.sided", subset = 1:m, print.level = 1)

Arguments

nb

an NB data object, output from prepare.nb.data.

dispersion

dispersion estimates, output from estimate.dispersion.

x

an n by p design matrix describing the treatment structure.

beta0

a p-vector specifying the null hypothesis. Non-NA components specify the parameters to test and their null values. (Currently, only a one-dimensional test is implemented, so only one non-NA component is allowed.)

tests

a character vector specifying the tests to be performed; can be any subset of "HOA" (higher-order asymptotic test), "LR" (likelihood ratio test), and "Wald" (Wald test).

alternative

a character string specifying the alternative hypothesis, must be one of "two.sided" (default), "greater" or "less".

subset

an index vector specifying the rows on which the tests should be performed.

print.level

a number controlling the amount of messages printed: 0 for suppressing all messages, 1 (default) for basic progress messages, and 2 to 5 for increasingly detailed messages.
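
As an illustration of how these arguments fit together, the sketch below (not part of the package's own documentation) would test the second regression coefficient, for the first 10 rows only, using just the HOA test and with progress messages suppressed. It assumes objects nb.data, dispersion, and x created as in the Examples section below; res.sub is a placeholder name.

beta0 = c(NA, 0);   # test the 2nd coefficient against a null value of 0
res.sub = test.coefficient(nb.data, dispersion, x, beta0,
                           tests = "HOA", subset = 1:10, print.level = 0);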

Details

test.coefficient performs large-sample tests for a one-dimensional (q = 1) component ψ of the p-dimensional regression coefficient β. The hypothesized value ψ_0 of ψ is specified by the non-NA component of the vector beta0 in the input.

The likelihood ratio statistic,

\lambda = 2 (l(\hat\beta) - l(\tilde\beta)),

converges in distribution to a chi-square distribution with 1 degree of freedom. The signed square root of the likelihood ratio statistic λ, also called the directed deviance,

r = sign(\hat\psi - \psi_0) \sqrt{\lambda}

converges to a standard normal distribution.

For testing a one-dimensional parameter of interest, Barndorff-Nielsen (1986, 1991) showed that a modified directed deviance

r^* = r - \frac{1}{r} \log(z)

is, in wide generality, asymptotically standard normally distributed to a higher order of accuracy than the directed deviance r itself, where z is an adjustment term. Tests based on higher-order asymptotic adjustments to the likelihood ratio statistic, such as r* or its approximation, are referred to as higher-order asymptotic (HOA) tests. They generally have better accuracy than the corresponding unadjusted likelihood ratio tests, especially in situations where the sample size is small and/or the number of nuisance parameters (p - q) is large. The implementation here is based on Skovgaard (2001). See Di et al. (2013) for more details.
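
As an informal illustration of these formulas (a sketch with made-up numbers, not code taken from the package), the directed deviance r and its modified version r* can each be converted to a two-sided p-value by referring them to the standard normal distribution:

r = 2.1;                          # directed deviance for one gene (hypothetical value)
z = 1.3;                          # adjustment term (hypothetical value)
r.star = r - log(z) / r;          # modified directed deviance r* = r - (1/r) log(z)
p.lr  = 2 * pnorm(-abs(r));       # two-sided p-value based on r (likelihood ratio test)
p.hoa = 2 * pnorm(-abs(r.star));  # two-sided p-value based on r* (HOA test)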

Value

a list containing the following components:

beta.hat

an m by p matrix of regression coefficients under the full model

mu.hat

an m by n matrix of fitted mean frequencies under the full model

beta.tilde

an m by p matrix of regression coefficients under the null model

mu.tilde

an m by n matrix of fitted mean frequencies under the null model.

HOA, LR, Wald

each is a list of two m-vectors, p.values and q.values, giving the p-values and q-values of the corresponding test when that test is included in tests.
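
For example, assuming res holds the output of test.coefficient and the HOA test was requested (it is included by default), the results could be summarized with a sketch along these lines:

sum(res$HOA$q.values < 0.05, na.rm = TRUE);  # number of genes with HOA q-value below 0.05
head(sort(res$HOA$p.values));                # smallest HOA p-values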

References

Barndorff-Nielsen, O. (1986): "Inference on full or partial parameters based on the standardized signed log likelihood ratio," Biometrika, 73, 307-322.

Barndorff-Nielsen, O. (1991): "Modified signed log likelihood ratio," Biometrika, 78, 557-563.

Skovgaard, I. (2001): "Likelihood asymptotics," Scandinavian Journal of Statistics, 28, 3-32.

Di, Y., Schafer, D. W., Emerson, S. C., and Chang, J. H. (2013): "Higher order asymptotics for negative binomial regression inferences from RNA-sequencing data," Statistical Applications in Genetics and Molecular Biology, 12(1), 49-70.

Examples

## Load Arabidopsis data
data(arab);

## Estimate normalization factors (based on the entire data set)
norm.factors = estimate.norm.factors(arab);

## Prepare the data
## For demonstration purposes, only the first 50 rows are used
nb.data = prepare.nb.data(arab[1:50,], lib.sizes = colSums(arab), norm.factors = norm.factors);

## For a real analysis, we would use the entire data set and could omit the lib.sizes argument:
## nb.data = prepare.nb.data(arab, norm.factors = norm.factors);

print(nb.data);
plot(nb.data);

## Specify the model matrix (experimental design)
grp.ids = as.factor(c(1, 1, 1, 2, 2, 2));
x = model.matrix(~grp.ids);

## Estimate dispersion model
dispersion = estimate.dispersion(nb.data, x);

print(dispersion);
plot(dispersion);

## Specify the null hypothesis
## The null hypothesis is beta[2]=0 (beta[2] is the log fold change).
beta0 = c(NA, 0);

## Test regression coefficient
res = test.coefficient(nb.data, dispersion, x, beta0);

## The result contains the data, the dispersion estimates and the test results
str(res);

## Show HOA test results for top ten most differentially expressed genes
top = order(res$HOA$p.values)[1:10];
print(cbind(nb.data$counts[top,],
            p.value = res$HOA$p.values[top],
            q.value = res$HOA$q.values[top]));

## Plot log fold change versus the fitted mean of sample 1 (analogous to an MA-plot).
plot(res$mu.tilde[,1], res$beta.hat[,2]/log(2), log="x",
     xlab="Fitted mean of sample 1 under the null",
     ylab="Log (base 2) fold change");

## Highlight top DE genes
points(res$mu.tilde[top,1], res$beta.hat[top,2]/log(2), col="magenta");
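
## (Additional sketch, not from the package authors: compare HOA and LR p-values
## for the same data; both tests are included by default in the call above.)
plot(res$LR$p.values, res$HOA$p.values,
     xlab="LR test p-value", ylab="HOA test p-value");
abline(0, 1, col="gray");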



