R: A function to perform K-sample test using GLP algorithm

GLP {LPKsample}

R Documentation

A function to perform K-sample test using GLP algorithm

Description

This function performs the GLP multivariate K-sample learning.

Usage

GLP(X,y,m.max=4,components=NULL,alpha=0.05,c.poly=0.5,clust.alg='kmeans',perm=0,
	combine.criterion='pvalue',multiple.comparison=TRUE,
	compress.algorithm=FALSE,nbasis=8, return.LPT=FALSE,return.clust=FALSE)

Arguments

`X`	An `n`-by-`d` matrix of observations; the rows should be grouped by their respective classes.
`y`	A length `n` vector indicating the sample class.
`m.max`	An integer, maximum order of LP component to investigate, default: 4.
`components`	A vector specifying which components to test. If provided with any value other than NULL, the test will only examine the components mentioned in this argument, ignoring the m.max settings.
`alpha`	Numeric, significance level `\alpha`, default: 0.05.
`c.poly`	Numeric, parameter for polynomial kernel, default: 0.5.
`perm`	Number of permutations used to approximate the p-value; set to 0 to use the asymptotic p-value.
`combine.criterion`	How to obtain the overall test result from the component-wise results: 'pvalue' uses Fisher's method to combine the p-values from each component; 'kernel' computes an overall kernel `W` based on the significant components and runs the LP graph test on `W`.
`multiple.comparison`	Set to TRUE to use adjustment for multiple comparisons when determining which components are significant.
`compress.algorithm`	Use the smooth compression of Laplacian spectra for testing the null hypothesis. Recommended for large `n`.
`nbasis`	Number of bases used for approximation when `compress.algorithm=TRUE`.
`clust.alg`	`"mclust"` or `"kmeans"`; algorithm used for clustering in graph community detection, default: `"kmeans"`.
`return.LPT`	Logical, whether or not to return the data-driven covariate matrix, default: FALSE.
`return.clust`	Logical, whether or not to return the class labels assigned by graph community detection, default: FALSE.
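
The interplay of `combine.criterion='pvalue'` and `multiple.comparison` can be illustrated in base R. The following is a minimal sketch, not the package's internal code: `fisher_combine` is a hypothetical helper, the Holm adjustment is an assumption (this page does not say which adjustment GLP applies), and the p-values are copied from the leukemia example on this page.

```r
# Hypothetical helper, not part of LPKsample: Fisher's method for
# combining independent p-values into a single overall p-value.
fisher_combine <- function(p) {
  stat <- -2 * sum(log(p))   # chi-squared statistic
  df <- 2 * length(p)        # two degrees of freedom per p-value
  pchisq(stat, df = df, lower.tail = FALSE)
}

# Component-wise p-values copied from the leukemia example on this page.
p_comp <- c(0.0001038647, 0.2066876581, 0.7025436476, 0.1211769396)

# ASSUMPTION: Holm adjustment, chosen only for illustration.
p_adj <- p.adjust(p_comp, method = "holm")
significant <- which(p_adj <= 0.05)

# Combine only the components that survive the adjustment.
overall <- fisher_combine(p_comp[significant])
print(significant)
print(overall)
```

With a single significant component, Fisher's method returns that component's p-value unchanged, which agrees with the overall p-value reported in the leukemia example.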

Value

A list containing the following items:

`GLP`	Overall GLP statistics.
`pval`	Overall P-value.
`table`	The GLP component table indicating the significance of each component.
`components`	Significant eLP components for the data set.
`LPT`	(optional) matrix of data driven covariates.
`clust`	(optional) class labels assigned by graph community detection.
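
As a rough illustration of working with the `table` item, the sketch below ranks components by p-value. The matrix `glp_table` is a hand-built stand-in that copies the leukemia output shown in the Examples on this page; it is assumed here that `table` is a numeric matrix with the columns `component`, `comp.GLP` and `pvalue`, as that printed output suggests.

```r
# Stand-in for the `table` item of a GLP() result, typed in by hand from
# the leukemia example on this page (not produced by calling GLP()).
glp_table <- cbind(
  component = 1:4,
  comp.GLP  = c(0.209237826, 0.022145514, 0.002025545, 0.033361702),
  pvalue    = c(0.0001038647, 0.2066876581, 0.7025436476, 0.1211769396)
)

# Rank the components from smallest to largest p-value.
ranked <- glp_table[order(glp_table[, "pvalue"]), , drop = FALSE]
print(ranked)

# Components whose unadjusted p-value falls below the default alpha.
below_alpha <- glp_table[glp_table[, "pvalue"] < 0.05, "component"]
print(below_alpha)
```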

Author(s)

Mukhopadhyay, S. and Wang, K.

References

Mukhopadhyay, S. and Wang, K. (2020), "A Nonparametric Approach to High-dimensional K-sample Comparison Problem", arXiv:1810.01724.

Mukhopadhyay, S. and Wang, K. (2020), "Towards a unified statistical theory of spectral graph analysis", arXiv:1901.07090.

Examples



  ##1.multivariate normal distribution with only mean difference:
  ##generate data, n1=n2=10, dimension 25
   X1<-matrix(rnorm(250,mean=0,sd=1),10,25)
   X2<-matrix(rnorm(250,mean=0.5,sd=1),10,25)
   y<-c(rep(1,10),rep(2,10))
   X<-rbind(X1,X2)
  ##GLP test:
   locdiff.test<-GLP(X,y,m.max=4)

  ## Not run: 
  ##2.Leukemia data example
   data(leukemia)
   ##pass the list elements explicitly: with attach(), the X from example 1 would mask leukemia$X
   leukemia.test<-GLP(leukemia$X,leukemia$class,components=1:4)
  ##confirmatory results:
   leukemia.test$GLP  # overall statistic
   #[1] 0.2092378
   leukemia.test$pval # overall p-value
   #[1] 0.0001038647
  ##exploratory outputs:
   leukemia.test$table  # rows as shown in Table 3 of reference
   #     component    comp.GLP       pvalue
   #[1,]         1 0.209237826 0.0001038647
   #[2,]         2 0.022145514 0.2066876581
   #[3,]         3 0.002025545 0.7025436476
   #[4,]         4 0.033361702 0.1211769396
  
## End(Not run)
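
A seeded variant of example 1 that requests a permutation p-value through the `perm` argument might look like the sketch below. The seed and the choice of 100 permutations are arbitrary assumptions made for illustration, and the call is skipped when LPKsample is not installed.

```r
# Fix the random seed so the simulated data are reproducible
# (the seed value is arbitrary).
set.seed(1)

# Two groups of 10 observations in 25 dimensions, differing only in mean.
X1 <- matrix(rnorm(250, mean = 0, sd = 1), 10, 25)
X2 <- matrix(rnorm(250, mean = 0.5, sd = 1), 10, 25)
X <- rbind(X1, X2)                  # rows grouped by class
y <- c(rep(1, 10), rep(2, 10))      # class label for each row of X

# Run the test only if the package is available.
if (requireNamespace("LPKsample", quietly = TRUE)) {
  # perm = 100 is an illustrative choice, not a recommendation.
  perm.test <- LPKsample::GLP(X, y, m.max = 4, perm = 100)
  print(perm.test$pval)
}
```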

[Package LPKsample version 2.1 Index]