test.coefficient {NBPSeq}                                        R Documentation

Large-sample Test for a Regression Coefficient in a Negative Binomial Regression Model

Description

test.coefficient performs large-sample tests (higher-order asymptotic test, likelihood ratio test, and/or Wald test) of regression coefficients in an NB regression model.

Usage

test.coefficient(nb, dispersion, x, beta0, tests = c("HOA", "LR", "Wald"),
  alternative = "two.sided", subset = 1:m, print.level = 1)

Arguments

nb

an NB data object, output from prepare.nb.data.

dispersion

dispersion estimates, output from estimate.dispersion.

x

an n by p design matrix describing the treatment structure.

beta0

a p-vector specifying the null hypothesis. Non-NA components specify the parameters to test and their null values. (Currently, only a one-dimensional test is implemented, so only one non-NA component is allowed.)

tests

a character vector specifying the tests to be performed; can be any subset of "HOA" (higher-order asymptotic test), "LR" (likelihood ratio test), and "Wald" (Wald test).

alternative

a character string specifying the alternative hypothesis, must be one of "two.sided" (default), "greater" or "less".

subset

an index vector specifying the rows on which the tests should be performed.

print.level

a number controlling the amount of messages printed: 0 for suppressing all messages, 1 (default) for basic progress messages, and 2 to 5 for increasingly detailed messages.
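
As an illustration of how these arguments fit together, the sketch below (not part of the package's own documentation) would test the second regression coefficient, for the first 10 rows only, using just the HOA test and with progress messages suppressed. It assumes objects nb.data, dispersion, and x created as in the Examples section below; res.sub is a placeholder name.

beta0 = c(NA, 0);   # test the 2nd coefficient against a null value of 0
res.sub = test.coefficient(nb.data, dispersion, x, beta0,
                           tests = "HOA", subset = 1:10, print.level = 0);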

Details

test.coefficient performs large-sample tests for a one-dimensional (q = 1) component ψ of the p-dimensional regression coefficient β. The hypothesized value ψ_0 of ψ is specified by the non-NA component of the vector beta0 in the input.

The likelihood ratio statistic,

\lambda = 2 (l(\hat\beta) - l(\tilde\beta)),

converges in distribution to a chi-square distribution with 1 degree of freedom. The signed square root of the likelihood ratio statistic λ, also called the directed deviance,

r = sign(\hat\psi - \psi_0) \sqrt{\lambda}

converges to a standard normal distribution.

For testing a one-dimensional parameter of interest, Barndorff-Nielsen (1986, 1991) showed that a modified directed deviance

r^* = r - \frac{1}{r} \log(z)

is, in wide generality, asymptotically standard normally distributed to a higher order of accuracy than the directed deviance r itself, where z is an adjustment term. Tests based on higher-order asymptotic adjustments to the likelihood ratio statistic, such as r* or its approximation, are referred to as higher-order asymptotic (HOA) tests. They generally have better accuracy than the corresponding unadjusted likelihood ratio tests, especially in situations where the sample size is small and/or the number of nuisance parameters (p - q) is large. The implementation here is based on Skovgaard (2001). See Di et al. (2013) for more details.
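
As an informal illustration of these formulas (a sketch with made-up numbers, not code taken from the package), the directed deviance r and its modified version r* can each be converted to a two-sided p-value by referring them to the standard normal distribution:

r = 2.1;                          # directed deviance for one gene (hypothetical value)
z = 1.3;                          # adjustment term (hypothetical value)
r.star = r - log(z) / r;          # modified directed deviance r* = r - (1/r) log(z)
p.lr  = 2 * pnorm(-abs(r));       # two-sided p-value based on r (likelihood ratio test)
p.hoa = 2 * pnorm(-abs(r.star));  # two-sided p-value based on r* (HOA test)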

Value

a list containing the following components:

beta.hat

an m by p matrix of regression coefficients under the full model

mu.hat

an m by n matrix of fitted mean frequencies under the full model

beta.tilde

an m by p matrix of regression coefficients under the null model

mu.tilde

an m by n matrix of fitted mean frequencies under the null model.

HOA, LR, Wald

each is a list of two m-vectors, p.values and q.values, giving the p-values and q-values of the corresponding test when that test is included in tests.
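
For example, assuming res holds the output of test.coefficient and the HOA test was requested (it is included by default), the results could be summarized with a sketch along these lines:

sum(res$HOA$q.values < 0.05, na.rm = TRUE);  # number of genes with HOA q-value below 0.05
head(sort(res$HOA$p.values));                # smallest HOA p-values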

References

Barndorff-Nielsen, O. (1986): "Inference on full or partial parameters based on the standardized signed log likelihood ratio," Biometrika, 73, 307-322.

Barndorff-Nielsen, O. (1991): "Modified signed log likelihood ratio," Biometrika, 78, 557-563.

Skovgaard, I. (2001): "Likelihood asymptotics," Scandinavian Journal of Statistics, 28, 3-32.

Di, Y., Schafer, D. W., Emerson, S. C., and Chang, J. H. (2013): "Higher order asymptotics for negative binomial regression inferences from RNA-sequencing data," Statistical Applications in Genetics and Molecular Biology, 12(1), 49-70.

Examples

## Load Arabidopsis data
data(arab);

## Estimate normalization factors (based on the entire data set)
norm.factors = estimate.norm.factors(arab);

## Prepare the data
## For demonstration purposes, only the first 50 rows are used
nb.data = prepare.nb.data(arab[1:50,], lib.sizes = colSums(arab), norm.factors = norm.factors);

## For a real analysis, we would use the entire data set and could omit the lib.sizes argument:
## nb.data = prepare.nb.data(arab, norm.factors = norm.factors);

print(nb.data);
plot(nb.data);

## Specify the model matrix (experimental design)
grp.ids = as.factor(c(1, 1, 1, 2, 2, 2));
x = model.matrix(~grp.ids);

## Estimate dispersion model
dispersion = estimate.dispersion(nb.data, x);

print(dispersion);
plot(dispersion);

## Specify the null hypothesis
## The null hypothesis is beta[2]=0 (beta[2] is the log fold change).
beta0 = c(NA, 0);

## Test regression coefficient
res = test.coefficient(nb.data, dispersion, x, beta0);

## The result contains the data, the dispersion estimates and the test results
str(res);

## Show HOA test results for top ten most differentially expressed genes
top = order(res$HOA$p.values)[1:10];
print(cbind(nb.data$counts[top,],
            p.value = res$HOA$p.values[top],
            q.value = res$HOA$q.values[top]));

## Plot log fold change versus the fitted mean of sample 1 (analogous to an MA-plot).
plot(res$mu.tilde[,1], res$beta.hat[,2]/log(2), log="x",
     xlab="Fitted mean of sample 1 under the null",
     ylab="Log (base 2) fold change");

## Highlight top DE genes
points(res$mu.tilde[top,1], res$beta.hat[top,2]/log(2), col="magenta");
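
## (Additional sketch, not from the package authors: compare HOA and LR p-values
## for the same data; both tests are included by default in the call above.)
plot(res$LR$p.values, res$HOA$p.values,
     xlab="LR test p-value", ylab="HOA test p-value");
abline(0, 1, col="gray");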



