R: Hypothesis test for two high-dimensional mean vectors

Hypothesis test for two high-dimensional mean vectors {mvhtests}

R Documentation

Hypothesis test for two high-dimensional mean vectors

Description

Hypothesis test for two high-dimensional mean vectors.

Usage

sarabai(x1, x2)

Arguments

`x1`	A matrix containing the Euclidean data of the first group.
`x2`	A matrix containing the Euclidean data of the second group.

Details

High-dimensional data are multivariate data with many variables (p) and usually a small number of observations (n). Often p>n, which is the case treated here: a simple test for the equality of two mean vectors when p>n. In this case, the sample covariance matrix is not invertible, because at least p-n+1 of its eigenvalues are zero.
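The rank deficiency is easy to check numerically. The sketch below uses simulated standard normal data with n = 20 and p = 50 (both values are illustrative assumptions, as is the tolerance used to decide which eigenvalues count as zero) and counts the nonzero eigenvalues of the sample covariance matrix. There can be at most n - 1 of them.

```r
set.seed(1)
n <- 20   # number of observations (illustrative)
p <- 50   # number of variables, p > n (illustrative)
x <- matrix(rnorm(n * p), ncol = p)

S <- cov(x)   # p x p sample covariance matrix
ev <- eigen(S, symmetric = TRUE, only.values = TRUE)$values

# Eigenvalues below a small relative tolerance are treated as zero
tol <- max(ev) * 1e-8
n_nonzero <- sum(ev > tol)

n_nonzero       # number of nonzero eigenvalues, at most n - 1
p - n_nonzero   # number of (numerically) zero eigenvalues, at least p - n + 1
```

Because S has zero eigenvalues, solve(S) fails or is numerically meaningless, so a statistic that needs the inverse covariance matrix (such as Hotelling's) cannot be computed.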

The test was proposed by Bai and Saranadasa (1996). Other tests have been suggested since, but this one was chosen for its simplicity. There are two datasets, {\bf X}_1 and {\bf X}_2, of sample sizes n_1 and n_2, respectively. Their corresponding sample mean vectors and covariance matrices are \bar{{\bf X}}_1, \bar{{\bf X}}_2 and {\bf S}_1, {\bf S}_2, respectively. As in the two-sample Hotelling's T^2 test, the two populations are assumed to have equal covariance matrices.

First define the pooled covariance matrix, calculated under the assumption of equal covariance matrices,

{\bf S}_n=\frac{\left(n_1-1\right){\bf S}_1+\left(n_2-1\right){\bf S}_2}{n},

where n=n_1+n_2-2. Then define

B_n=\sqrt{ \frac{n^2}{\left(n+2\right)\left(n-1\right)}\left\lbrace\text{tr}\left({\bf S}_n^2\right)- \frac{1}{n}\left[\text{tr}\left({\bf S}_n\right)\right]^2 \right\rbrace }.

The test statistic is

Z=\frac{\frac{n_1n_2}{n_1+n_2}\left(\bar{{\bf X}}_1-\bar{{\bf X}}_2\right)^T\left(\bar{{\bf X}}_1-\bar{{\bf X}}_2\right) -\text{tr}\left({\bf S}_n\right)}{\sqrt{\frac{2\left(n+1\right)}{n}}B_n}.

Under the null hypothesis (equality of the two mean vectors), the test statistic asymptotically follows the standard normal distribution. Bai and Saranadasa (1996) established the asymptotic normality of the test statistic and showed that it has attractive power properties when p/n \rightarrow c < \infty, under some restrictions on the maximum eigenvalue of the common population covariance matrix. However, requiring p and n to be of the same order is too restrictive for the "large p, small n" situation.
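The formulas above can be followed step by step in base R. The sketch below is not the mvhtests source. The function name bs_test and the names of the returned elements are hypothetical, and the one-sided (upper tail) p-value is an assumption. It also takes n = n_1 + n_2 - 2, the degrees of freedom of the pooled estimator, because this makes tr(S_n) unbiased for the trace of the common covariance matrix and so centres the numerator of Z at zero under the null hypothesis.

```r
# Hypothetical helper that follows the formulas in Details step by step
bs_test <- function(x1, x2) {
  n1 <- nrow(x1)
  n2 <- nrow(x2)
  n <- n1 + n2 - 2   # assumed: pooled degrees of freedom

  # Pooled covariance matrix S_n
  Sn <- ((n1 - 1) * cov(x1) + (n2 - 1) * cov(x2)) / n
  tr_Sn <- sum(diag(Sn))   # tr(S_n)
  tr_Sn2 <- sum(Sn^2)      # tr(S_n^2), valid because S_n is symmetric

  Bn <- sqrt(n^2 / ((n + 2) * (n - 1)) * (tr_Sn2 - tr_Sn^2 / n))

  d <- colMeans(x1) - colMeans(x2)   # difference of the sample mean vectors
  z <- (n1 * n2 / (n1 + n2) * sum(d^2) - tr_Sn) /
    (sqrt(2 * (n + 1) / n) * Bn)

  # Assumption: large positive Z is evidence against equal means
  pvalue <- pnorm(z, lower.tail = FALSE)
  c(Z = z, p.value = pvalue)
}

set.seed(1)
x1 <- matrix(rnorm(40 * 100), ncol = 100)
x2 <- matrix(rnorm(50 * 100), ncol = 100)
res <- bs_test(x1, x2)
res
```

Note that no matrix inverse appears anywhere: the statistic only needs traces of S_n and S_n^2, which is why it remains computable when p>n.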

Value

A vector with the test statistic and the p-value.

Author(s)

Michail Tsagris.

R implementation and documentation: Michail Tsagris mtsagris@uoc.gr.

References

Bai Z. D. and Saranadasa H. (1996). Effect of high dimension: by an example of a two sample problem. Statistica Sinica, 6(2): 311–329.

Examples

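library(mvhtests)
set.seed(1)  ## for reproducibility
## p = 100 variables with n1 = 40 and n2 = 50 observations, so p > n1 + n2
x1 <- matrix( rnorm(40 * 100), ncol = 100 )
x2 <- matrix( rnorm(50 * 100), ncol = 100 )
sarabai(x1, x2)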

[Package mvhtests version 1.0 Index]