Hypothesis test for two high-dimensional mean vectors {mvhtests} | R Documentation |
Hypothesis test for two high-dimensional mean vectors
Description
Test of the equality of two mean vectors in the high-dimensional setting (p>n), as proposed by Bai and Saranadasa (1996).
Usage
sarabai(x1, x2)
Arguments
x1    A matrix containing the Euclidean data of the first group (rows are observations, columns are variables).
x2    A matrix containing the Euclidean data of the second group, with the same number of columns as x1.
Details
High-dimensional data are multivariate data with many variables (p) and usually a small number of observations (n). It often happens that p>n, and this is the case considered here: a simple test for two mean vectors when p>n. In this case the sample covariance matrix is not invertible, because its rank cannot exceed the number of observations, which leaves many zero eigenvalues.
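A minimal sketch of the singularity claim, using simulated standard normal data. The sizes n = 20 and p = 50 are arbitrary illustrative choices, and the tolerance used to call an eigenvalue "zero" is an assumption of this sketch.

```r
set.seed(1)
n <- 20   # number of observations (illustrative)
p <- 50   # number of variables, chosen so that p > n
x <- matrix(rnorm(n * p), ncol = p)

S <- cov(x)   # p x p sample covariance matrix
ev <- eigen(S, symmetric = TRUE, only.values = TRUE)$values

tol <- max(ev) * 1e-8      # assumed numerical threshold for a "zero" eigenvalue
rank_S <- sum(ev > tol)    # centring costs one degree of freedom, so at most n - 1
n_zero <- sum(ev <= tol)   # the remaining eigenvalues are numerically zero
c(rank = rank_S, zero_eigenvalues = n_zero)
```

Since the rank of S is below p, S has no inverse, and a statistic that needs the inverse covariance matrix, such as Hotelling's T^2, is not defined.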
The test was proposed by Bai and Saranadasa (1996). More proposals have appeared since, but this one was chosen for its simplicity. There are two datasets, {\bf X}_1 and {\bf X}_2, of sample sizes n_1 and n_2, respectively. Their sample mean vectors are \bar{{\bf X}}_1 and \bar{{\bf X}}_2, and their sample covariance matrices are {\bf S}_1 and {\bf S}_2. The assumption is the same as in Hotelling's test (see hotel2T2): the two populations share a common covariance matrix.
First define the pooled covariance matrix, calculated under the assumption of equal covariance matrices,
{\bf S}_n=\frac{\left(n_1-1\right){\bf S}_1+\left(n_2-1\right){\bf S}_2}{n},
where n=n_1+n_2-2, so that {\bf S}_n is unbiased for the common covariance matrix. Then define
B_n=\sqrt{ \frac{n^2}{\left(n+2\right)\left(n-1\right)}\left\lbrace\text{tr}\left({\bf S}_n^2\right)-\frac{1}{n}\left[\text{tr}\left({\bf S}_n\right)\right]^2 \right\rbrace }.
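The two quantities above can be sketched in base R as follows. This is an illustration and not the package's internal code: the variable names are hypothetical, the data are simulated, and the sketch assumes n = n_1 + n_2 - 2 (the pooled degrees of freedom).

```r
set.seed(1)
x1 <- matrix(rnorm(40 * 100), ncol = 100)   # first group: 40 observations, p = 100
x2 <- matrix(rnorm(50 * 100), ncol = 100)   # second group: 50 observations, p = 100

n1 <- nrow(x1)
n2 <- nrow(x2)
n <- n1 + n2 - 2   # assumption: pooled degrees of freedom

# Pooled covariance matrix S_n
Sn <- ((n1 - 1) * cov(x1) + (n2 - 1) * cov(x2)) / n

trSn <- sum(diag(Sn))   # tr(S_n)
trSn2 <- sum(Sn^2)      # tr(S_n^2); for a symmetric matrix this is the sum of squared entries

Bn <- sqrt(n^2 / ((n + 2) * (n - 1)) * (trSn2 - trSn^2 / n))
Bn
```

Using sum(Sn^2) avoids an explicit p x p matrix product, which matters when p is large.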
The test statistic is
Z=\frac{\frac{n_1n_2}{n_1+n_2}\left(\bar{{\bf X}}_1-\bar{{\bf X}}_2\right)^T\left(\bar{{\bf X}}_1-\bar{{\bf X}}_2\right)-\text{tr}\left({\bf S}_n\right)}{\sqrt{\frac{2\left(n+1\right)}{n}}B_n}.
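A hedged sketch of the statistic as a small base R function. The name bs_z is hypothetical, the function is not claimed to reproduce sarabai() exactly, and it assumes n = n_1 + n_2 - 2 in S_n and B_n.

```r
# Hypothetical helper: the statistic Z, written directly from the formulas
bs_z <- function(x1, x2) {
  n1 <- nrow(x1)
  n2 <- nrow(x2)
  n <- n1 + n2 - 2                      # assumption: pooled degrees of freedom
  m1 <- colMeans(x1)
  m2 <- colMeans(x2)
  z1 <- sweep(x1, 2, m1)                # centre each group at its own mean
  z2 <- sweep(x2, 2, m2)
  Sn <- (crossprod(z1) + crossprod(z2)) / n   # pooled covariance matrix
  trSn <- sum(diag(Sn))
  Bn <- sqrt(n^2 / ((n + 2) * (n - 1)) * (sum(Sn^2) - trSn^2 / n))
  num <- n1 * n2 / (n1 + n2) * sum((m1 - m2)^2) - trSn
  num / (sqrt(2 * (n + 1) / n) * Bn)
}

set.seed(2)
x1 <- matrix(rnorm(40 * 100), ncol = 100)
x2 <- matrix(rnorm(50 * 100), ncol = 100)
z_null <- bs_z(x1, x2)          # both groups have mean zero
z_shift <- bs_z(x1, x2 + 0.5)   # every mean of the second group shifted by 0.5
c(null = z_null, shifted = z_shift)
```

The numerator compares the scaled squared distance between the sample means with tr(S_n), its expected value when the means are equal, so a shift in means pushes Z upwards.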
Under the null hypothesis (equality of the two mean vectors) the test statistic asymptotically follows the standard normal distribution. Bai and Saranadasa (1996) established the asymptotic normality of the test statistic and showed that it has attractive power properties when p/n \rightarrow c < \infty and under some restriction on the maximum eigenvalue of the common population covariance matrix. However, the requirement that p and n be of the same order is too restrictive for the "large p, small n" situation.
Value
A vector with the test statistic and the p-value.
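Because the statistic is referred to the standard normal distribution, a p-value can be obtained with pnorm(). The sketch below assumes an upper-tail rejection region (a difference in means inflates the numerator of Z); whether sarabai() uses exactly this tail is an assumption, and the Z values are made-up inputs, not results from any dataset.

```r
z <- c(-0.5, 1.2, 1.7, 3.0)            # hypothetical values of the statistic
pval <- pnorm(z, lower.tail = FALSE)   # P(N(0, 1) > z), assuming an upper-tail test
reject <- pval < 0.05                  # decision at the 5% level
data.frame(Z = z, p.value = pval, reject = reject)
```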
Author(s)
Michail Tsagris.
R implementation and documentation: Michail Tsagris mtsagris@uoc.gr.
References
Bai Z. D. and Saranadasa H. (1996). Effect of high dimension: by an example of a two sample problem. Statistica Sinica, 6(2): 311–329.
See Also
hotel2T2, maov, el.test2, eel.test2
Examples
## two groups with 40 and 50 observations on p = 100 variables, so p > n
x1 <- matrix( rnorm(40 * 100), ncol = 100 )
x2 <- matrix( rnorm(50 * 100), ncol = 100 )
sarabai(x1, x2)