GLP {LPKsample}    R Documentation

A function to perform a K-sample test using the GLP algorithm

Description

This function performs the GLP multivariate K-sample learning.

Usage

GLP(X,y,m.max=4,components=NULL,alpha=0.05,c.poly=0.5,clust.alg='kmeans',perm=0,
	combine.criterion='pvalue',multiple.comparison=TRUE,
	compress.algorithm=FALSE,nbasis=8, return.LPT=FALSE,return.clust=FALSE)

Arguments

X

An n-by-d matrix of observations. The rows should be grouped by their respective classes.

y

A length n vector indicating the sample class.

m.max

An integer, maximum order of LP component to investigate, default: 4.

components

A vector specifying which components to test. If provided with any value other than NULL, the test will only examine the components mentioned in this argument, ignoring the m.max settings.

alpha

Numeric, significance level α, default: 0.05.

c.poly

Numeric, parameter for polynomial kernel, default: 0.5.

perm

Number of permutations used to approximate the p-value; set to 0 (the default) to use the asymptotic p-value.

combine.criterion

How to obtain the overall testing result from the component-wise results: 'pvalue' uses Fisher's method to combine the p-values from each component; 'kernel' computes an overall kernel W based on the significant components and runs the LP graph test on W.

multiple.comparison

Set to TRUE to use adjustment for multiple comparisons when determining which components are significant.

compress.algorithm

Logical; set to TRUE to use the smooth compression of Laplacian spectra for testing the null hypothesis. Recommended for large n. Default: FALSE.

nbasis

Number of bases used for approximation when compress.algorithm=TRUE.

clust.alg

"mclust" or "kmeans"; algorithm used for clustering in graph community detection.

return.LPT

Logical, whether or not to return the data-driven covariate matrix, default: FALSE.

return.clust

Logical, whether or not to return the class labels assigned by graph community detection, default: FALSE.
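
How multiple.comparison and combine.criterion='pvalue' interact can be sketched in base R. This is only an illustration, not the package's internal code: the Benjamini-Hochberg adjustment and the helper name fisher_combine are assumptions made here, since this page does not say which adjustment GLP applies. The p-values are the component-wise values from the leukemia example on this page.

```r
# Component-wise p-values copied from the leukemia example table on this page.
p <- c(0.0001038647, 0.2066876581, 0.7025436476, 0.1211769396)
alpha <- 0.05

# ASSUMPTION: Benjamini-Hochberg is used purely for illustration; the
# adjustment applied by GLP() when multiple.comparison=TRUE is not stated here.
p.adj <- p.adjust(p, method = "BH")
significant <- which(p.adj < alpha)

# Fisher's method: -2 * sum(log(p)) is referred to a chi-square distribution
# with 2k degrees of freedom, where k is the number of combined p-values.
# fisher_combine is a hypothetical helper, not a function of LPKsample.
fisher_combine <- function(pvals) {
  stat <- -2 * sum(log(pvals))
  pchisq(stat, df = 2 * length(pvals), lower.tail = FALSE)
}

overall <- fisher_combine(p[significant])
print(significant)
print(overall)
```

When only one component is selected, Fisher's method returns that component's p-value unchanged, which is consistent with the overall p-value reported in the leukemia example.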

Value

A list containing the following items:

GLP

Overall GLP statistic.

pval

Overall P-value.

table

The GLP component table indicating the significance of each component.

components

Significant eLP components for the data set.

LPT

(optional) matrix of data-driven covariates.

clust

(optional) class labels assigned by graph community detection.
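
A minimal sketch of requesting and inspecting the items listed above, including the optional ones. It uses only the arguments documented on this page and the simulated design of the first example; the seed is an arbitrary choice, and the call has not been checked against a particular package version. The shapes of LPT and clust are not assumed, so they are inspected with str().

```r
set.seed(123)
# Same simulated design as the first example on this page: n1 = n2 = 10, d = 25.
X <- rbind(matrix(rnorm(250, mean = 0,   sd = 1), 10, 25),
           matrix(rnorm(250, mean = 0.5, sd = 1), 10, 25))
y <- c(rep(1, 10), rep(2, 10))

# Guarded so the snippet also runs where LPKsample is not installed.
if (requireNamespace("LPKsample", quietly = TRUE)) {
  res <- LPKsample::GLP(X, y, m.max = 4,
                        return.LPT = TRUE, return.clust = TRUE)
  print(res$GLP)    # overall statistic
  print(res$pval)   # overall p-value
  print(res$table)  # component-wise results
  str(res$LPT)      # data-driven covariates (structure not assumed here)
  str(res$clust)    # labels from graph community detection
}
```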

Author(s)

Mukhopadhyay, S. and Wang, K.

References

Mukhopadhyay, S. and Wang, K. (2020), "A Nonparametric Approach to High-dimensional K-sample Comparison Problem", arXiv:1810.01724.

Mukhopadhyay, S. and Wang, K. (2020), "Towards a unified statistical theory of spectral graph analysis", arXiv:1901.07090.

Examples



  ##1. Multivariate normal distribution with only a mean difference:
  ##generate data, n1=n2=10, dimension 25
   X1<-matrix(rnorm(250,mean=0,sd=1),10,25)
   X2<-matrix(rnorm(250,mean=0.5,sd=1),10,25)
   y<-c(rep(1,10),rep(2,10))
   X<-rbind(X1,X2)
  ##GLP test:
   locdiff.test<-GLP(X,y,m.max=4)
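
The X argument asks for rows grouped by class. If the data arrive in arbitrary order, one way to arrange them before calling GLP is sketched below in base R; X_raw and y_raw are hypothetical names for the unordered input, and whether GLP strictly requires this ordering is not stated on this page.

```r
set.seed(1)
# Hypothetical input whose rows are NOT grouped by class.
y_raw <- c(2, 1, 2, 1, 1, 2)
X_raw <- matrix(rnorm(6 * 3), nrow = 6, ncol = 3)

# order() leaves ties in their original order, so rows keep their
# relative position within each class.
ord <- order(y_raw)
X <- X_raw[ord, , drop = FALSE]
y <- y_raw[ord]
```

The same permutation is applied to X and y, so each row stays paired with its class label.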

  ## Not run: 
  ##2.Leukemia data example
   data(leukemia)
   attach(leukemia)
   leukemia.test<-GLP(X,class,components=1:4)
  ##confirmatory results:
   leukemia.test$GLP  # overall statistic
   #[1] 0.2092378
   leukemia.test$pval # overall p-value
   #[1] 0.0001038647
  ##exploratory outputs:
   leukemia.test$table  # rows as shown in Table 3 of reference
   #     component    comp.GLP       pvalue
   #[1,]         1 0.209237826 0.0001038647
   #[2,]         2 0.022145514 0.2066876581
   #[3,]         3 0.002025545 0.7025436476
   #[4,]         4 0.033361702 0.1211769396
  
## End(Not run)

[Package LPKsample version 2.1 Index]