Hypothesis test for two high-dimensional mean vectors {mvhtests} R Documentation

Hypothesis test for two high-dimensional mean vectors

Description

Hypothesis test for two high-dimensional mean vectors.

Usage

sarabai(x1, x2)

Arguments

x1

A matrix containing the Euclidean data of the first group.

x2

A matrix containing the Euclidean data of the second group.

Details

High-dimensional data are multivariate data with many variables ($p$) and usually a small number of observations ($n$). Often $p > n$, and that is the case considered here: this function implements a simple test for $p > n$. In this case the sample covariance matrix is not invertible and can have many zero eigenvalues.
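The singularity mentioned above can be checked numerically. The following is a minimal sketch on simulated data; the sizes (n = 40, p = 100), the seed and the tolerance used to call an eigenvalue "non-zero" are illustrative assumptions, not part of the package.

```r
# Illustrative sketch: with p > n the sample covariance matrix is singular.
set.seed(1)                                  # arbitrary seed, for reproducibility only
x <- matrix(rnorm(40 * 100), ncol = 100)     # n = 40 observations, p = 100 variables
S <- cov(x)                                  # 100 x 100 sample covariance matrix

# Eigenvalues of the symmetric matrix S
ev <- eigen(S, symmetric = TRUE, only.values = TRUE)$values

# Count eigenvalues that are numerically non-zero (the tolerance is an assumption)
n_pos <- sum(ev > 1e-8 * max(ev))
n_pos                                        # cannot exceed n - 1 = 39
```

Since centring removes one degree of freedom, the rank of the sample covariance matrix is at most n - 1, so at least p - n + 1 eigenvalues are zero up to rounding error.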

The test was proposed by Bai and Saranadasa (1996). Other tests have been suggested since, but I chose this one for its simplicity. There are two datasets, $\mathbf{X}_1$ and $\mathbf{X}_2$, of sample sizes $n_1$ and $n_2$, respectively. Their sample mean vectors are $\bar{\mathbf{X}}_1$ and $\bar{\mathbf{X}}_2$, and their sample covariance matrices are $\mathbf{S}_1$ and $\mathbf{S}_2$. The assumption is the same as for Hotelling's two-sample test: the two groups share a common covariance matrix.

First define the pooled covariance matrix, calculated under the assumption of equal covariance matrices,

$$\mathbf{S}_n=\frac{\left(n_1-1\right)\mathbf{S}_1+\left(n_2-1\right)\mathbf{S}_2}{n},$$

where $n=n_1+n_2$. Then define

$$B_n=\sqrt{\frac{n^2}{\left(n+2\right)\left(n-1\right)}\left\lbrace\text{tr}\left(\mathbf{S}_n^2\right)-\frac{1}{n}\left[\text{tr}\left(\mathbf{S}_n\right)\right]^2\right\rbrace}.$$

The test statistic is

$$Z=\frac{\frac{n_1n_2}{n_1+n_2}\left(\bar{\mathbf{X}}_1-\bar{\mathbf{X}}_2\right)^T\left(\bar{\mathbf{X}}_1-\bar{\mathbf{X}}_2\right)-\text{tr}\left(\mathbf{S}_n\right)}{\sqrt{\frac{2\left(n+1\right)}{n}}\,B_n}.$$

Under the null hypothesis (equality of the two mean vectors) the test statistic asymptotically follows the standard normal distribution. Bai and Saranadasa (1996) established the asymptotic normality of the test statistic and showed that it has attractive power properties when $p/n \rightarrow c < \infty$ and under some restriction on the maximum eigenvalue of the common population covariance matrix. However, the requirement that $p$ and $n$ be of the same order is too restrictive for the "large $p$, small $n$" situation.
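The formulas above can be traced step by step in base R. This is a minimal sketch, not the packaged implementation: the name bs_sketch is hypothetical, $n$ is taken as $n_1+n_2$ exactly as written above, and the upper-tail p-value is an assumption (the text does not say which tail is used), so results need not match sarabai().

```r
# Hypothetical helper that follows the formulas in Details as written;
# it is NOT the sarabai() source and may differ from it numerically.
bs_sketch <- function(x1, x2) {
  n1 <- nrow(x1)
  n2 <- nrow(x2)
  n <- n1 + n2                              # n as defined in the text
  m1 <- colMeans(x1)
  m2 <- colMeans(x2)

  # Centred data: crossprod(z1) equals (n1 - 1) * S1, and likewise for z2
  z1 <- sweep(x1, 2, m1)
  z2 <- sweep(x2, 2, m2)
  Sn <- (crossprod(z1) + crossprod(z2)) / n # pooled covariance matrix

  trSn <- sum(diag(Sn))                     # tr(Sn)
  trSn2 <- sum(Sn^2)                        # tr(Sn^2), since Sn is symmetric

  Bn <- sqrt(n^2 / ((n + 2) * (n - 1)) * (trSn2 - trSn^2 / n))
  Z <- (n1 * n2 / (n1 + n2) * sum((m1 - m2)^2) - trSn) /
    (sqrt(2 * (n + 1) / n) * Bn)

  # Assumption: reject for large Z, so the p-value is the upper tail of N(0, 1)
  c(Z = Z, p.value = pnorm(Z, lower.tail = FALSE))
}

set.seed(1)                                 # arbitrary seed, for reproducibility only
x1 <- matrix(rnorm(40 * 100), ncol = 100)
x2 <- matrix(rnorm(50 * 100), ncol = 100)
res_null <- bs_sketch(x1, x2)               # both groups have mean zero
res_alt <- bs_sketch(x1, x2 + 1)            # second group shifted by 1 in every variable
```

Shifting the second group leaves the pooled covariance matrix unchanged and only increases the squared distance between the mean vectors, so the statistic under the shift is larger than under equal means.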

Value

A vector with the test statistic and the p-value.

Author(s)

Michail Tsagris.

R implementation and documentation: Michail Tsagris mtsagris@uoc.gr.

References

Bai Z. D. and Saranadasa H. (1996). Effect of high dimension: by an example of a two sample problem. Statistica Sinica, 6(2): 311–329.

See Also

hotel2T2, maov, el.test2, eel.test2

Examples

## Two groups with 40 and 50 observations on p = 100 variables (p > n)
x1 <- matrix( rnorm(40 * 100), ncol = 100 )
x2 <- matrix( rnorm(50 * 100), ncol = 100 )
sarabai(x1, x2)

[Package mvhtests version 1.0 Index]