meantest.cq {PEtests}    R Documentation

Two-sample high-dimensional mean test (Chen and Qin, 2010)

Description

This function implements the two-sample $l_2$-norm-based high-dimensional mean test proposed by Chen and Qin (2010). Suppose $\{\mathbf{X}_1, \ldots, \mathbf{X}_{n_1}\}$ are i.i.d. copies of $\mathbf{X}$, which has mean vector $\boldsymbol{\mu}_1$ and covariance matrix $\mathbf{\Sigma}_1$, and $\{\mathbf{Y}_1, \ldots, \mathbf{Y}_{n_2}\}$ are i.i.d. copies of $\mathbf{Y}$, which has mean vector $\boldsymbol{\mu}_2$ and covariance matrix $\mathbf{\Sigma}_2$. The test statistic $M_{CQ}$ is defined as

$$M_{CQ} = \frac{1}{n_1(n_1-1)}\sum_{u\neq v}^{n_1} \mathbf{X}_{u}'\mathbf{X}_{v} + \frac{1}{n_2(n_2-1)}\sum_{u\neq v}^{n_2} \mathbf{Y}_{u}'\mathbf{Y}_{v} - \frac{2}{n_1n_2}\sum_{u=1}^{n_1}\sum_{v=1}^{n_2} \mathbf{X}_{u}'\mathbf{Y}_{v}.$$
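The definition above can be evaluated directly from the two data matrices. The following is a minimal base-R sketch, not the package's own implementation; the function name `cq_statistic` is hypothetical and is introduced only for illustration. It assumes both inputs are numeric matrices with observations in rows and the same number of columns.

```r
# Hypothetical helper (not part of PEtests): evaluates M_CQ from its definition.
cq_statistic <- function(X, Y) {
  n1 <- nrow(X)
  n2 <- nrow(Y)
  GX  <- tcrossprod(X)     # n1 x n1 matrix with entries X_u' X_v
  GY  <- tcrossprod(Y)     # n2 x n2 matrix with entries Y_u' Y_v
  GXY <- tcrossprod(X, Y)  # n1 x n2 matrix with entries X_u' Y_v
  # Subtracting the diagonal leaves only the u != v terms.
  (sum(GX) - sum(diag(GX))) / (n1 * (n1 - 1)) +
    (sum(GY) - sum(diag(GY))) / (n2 * (n2 - 1)) -
    2 * sum(GXY) / (n1 * n2)
}
```

Dropping the diagonal terms $\mathbf{X}_u'\mathbf{X}_u$ and $\mathbf{Y}_u'\mathbf{Y}_u$ is what separates $M_{CQ}$ from the squared distance between sample means: algebraically, $M_{CQ} = \|\bar{\mathbf{X}} - \bar{\mathbf{Y}}\|^2 - \text{tr}(\mathbf{S}_1)/n_1 - \text{tr}(\mathbf{S}_2)/n_2$, where $\mathbf{S}_1$ and $\mathbf{S}_2$ are the usual sample covariance matrices.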

Under the null hypothesis $H_{0m}: \boldsymbol{\mu}_1 = \boldsymbol{\mu}_2$, the leading variance of $M_{CQ}$ is

$$\sigma^2_{M_{CQ}} = \frac{2}{n_1(n_1-1)}\text{tr}(\mathbf{\Sigma}_1^2) + \frac{2}{n_2(n_2-1)}\text{tr}(\mathbf{\Sigma}_2^2) + \frac{4}{n_1n_2}\text{tr}(\mathbf{\Sigma}_1\mathbf{\Sigma}_2),$$

which can be consistently estimated by

$$\widehat\sigma^2_{M_{CQ}} = \frac{2}{n_1(n_1-1)}\widehat{\text{tr}(\mathbf{\Sigma}_1^2)} + \frac{2}{n_2(n_2-1)}\widehat{\text{tr}(\mathbf{\Sigma}_2^2)} + \frac{4}{n_1n_2}\widehat{\text{tr}(\mathbf{\Sigma}_1\mathbf{\Sigma}_2)}.$$

The explicit formulas for $\widehat{\text{tr}(\mathbf{\Sigma}_1^2)}$, $\widehat{\text{tr}(\mathbf{\Sigma}_2^2)}$, and $\widehat{\text{tr}(\mathbf{\Sigma}_1\mathbf{\Sigma}_2)}$ can be found in Section 3 of Chen and Qin (2010). Under some regularity conditions and the null hypothesis $H_{0m}: \boldsymbol{\mu}_1 = \boldsymbol{\mu}_2$, the standardized statistic $M_{CQ}/\widehat\sigma_{M_{CQ}}$ converges in distribution to a standard normal distribution as $n_1, n_2, p \rightarrow \infty$. The asymptotic $p$-value is obtained by

$$p_{CQ} = 1-\Phi(M_{CQ}/\widehat\sigma_{M_{CQ}}),$$

where $\Phi(\cdot)$ is the cdf of the standard normal distribution.
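To make the variance and p-value formulas concrete, the sketch below evaluates them in base R. It is illustrative only: `cq_variance` and `cq_pvalue` are hypothetical names, not functions exported by PEtests, and the three trace quantities are passed in as plain numbers rather than computed with the estimators from Chen and Qin (2010). The numeric illustration assumes identity covariance matrices, in which case every trace equals $p$.

```r
# Hypothetical helpers (not part of PEtests).

# Leading null variance of M_CQ, given the three trace quantities
# tr(Sigma_1^2), tr(Sigma_2^2) and tr(Sigma_1 Sigma_2) (or estimates of them).
cq_variance <- function(n1, n2, tr_s1sq, tr_s2sq, tr_s1s2) {
  2 / (n1 * (n1 - 1)) * tr_s1sq +
    2 / (n2 * (n2 - 1)) * tr_s2sq +
    4 / (n1 * n2) * tr_s1s2
}

# One-sided asymptotic p-value 1 - Phi(M_CQ / sigma):
# only large positive values of M_CQ count as evidence against H_0m.
cq_pvalue <- function(m_cq, sigma) {
  pnorm(m_cq / sigma, lower.tail = FALSE)
}

# Illustration assuming Sigma_1 = Sigma_2 = I_p, so each trace equals p.
n1 <- 100; n2 <- 100; p <- 500
sigma <- sqrt(cq_variance(n1, n2, p, p, p))
cq_pvalue(0, sigma)  # a statistic of exactly zero gives a p-value of 0.5
```

Using `lower.tail = FALSE` rather than `1 - pnorm(...)` is a small numerical choice: it avoids loss of precision when the standardized statistic is large and the p-value is very close to zero.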

Usage

meantest.cq(dataX,dataY)

Arguments

dataX

an $n_1$ by $p$ data matrix

dataY

an $n_2$ by $p$ data matrix

Value

stat the value of the test statistic

pval the p-value of the test

References

Chen, S. X. and Qin, Y. L. (2010). A two-sample test for high-dimensional data with applications to gene-set testing. Annals of Statistics, 38(2):808–835.

Examples

library(PEtests)
# Both samples are drawn from N(0, I_p), so the null hypothesis holds
n1 = 100; n2 = 100; pp = 500
set.seed(1)
X = matrix(rnorm(n1*pp), nrow=n1, ncol=pp)
Y = matrix(rnorm(n2*pp), nrow=n2, ncol=pp)
meantest.cq(X,Y)

[Package PEtests version 0.1.0 Index]