| test_selective_inference {VALIDICLUST} | R Documentation | 
Selective inference for post-clustering variable involvement
Description
Selective inference for post-clustering variable involvement
Usage
test_selective_inference(
  X,
  k1,
  k2,
  g,
  ndraws = 2000,
  cl_fun,
  cl = NULL,
  sig = NULL
)
Arguments
| X | The data matrix on which the clustering is applied (observations in rows, variables in columns) | 
| k1 | The first cluster of interest | 
| k2 | The second cluster of interest | 
| g | The variables for which the test is applied | 
| ndraws | The number of Monte-Carlo samples | 
| cl_fun | The clustering function used to build clusters | 
| cl | The labels of the data obtained with the clustering function | 
| sig | The estimated standard deviation. Defaults to NULL, in which case the standard deviation is estimated using only the observations in the two clusters of interest | 
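The cl_fun argument can be illustrated with a small sketch. Judging from the Examples section, cl_fun takes a data matrix and returns one label per row as a factor; that contract is an assumption drawn from the example, not a documented requirement. The helper make_hcl_fun below is hypothetical (it is not part of VALIDICLUST) and only uses base R to build such a function for a chosen number of clusters and linkage.

```r
# Hypothetical helper (not part of VALIDICLUST): builds a clustering function
# with the same shape as hcl_fun in the Examples section.
# Assumption: cl_fun receives a data matrix and returns a factor of labels,
# one per row of the matrix.
make_hcl_fun <- function(k, method = "ward.D2") {
  function(x) {
    tree <- hclust(dist(x), method = method)  # hierarchical clustering on rows
    as.factor(cutree(tree, k = k))            # cut the tree into k clusters
  }
}

# Build a 3-cluster, complete-linkage clustering function and apply it.
hcl3_fun <- make_hcl_fun(k = 3, method = "complete")

set.seed(42)
X <- matrix(rnorm(200), ncol = 2)
cl <- hcl3_fun(X)
table(cl)  # cluster sizes
```

A function built this way could then be passed as cl_fun together with cl, with k1 and k2 chosen among the levels of cl. Hierarchical clustering is deterministic for a given input, which keeps the labels reproducible when the function is called repeatedly.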
Value
A list with the following elements:
-  stat_g: The test statistic used for the test.
-  pval: The resulting p-values of the test.
-  stder: The standard deviation of the p-values, computed from the Monte-Carlo samples.
-  clusters: The labels of the data.
Note
This function is adapted from clusterpval::test_clusters_approx() by Gao et al. (2022), available on GitHub: https://github.com/lucylgao/clusterpval
References
Gao, L. L., Bien, J., & Witten, D. (2022). Selective inference for hierarchical clustering. Journal of the American Statistical Association, (just-accepted), 1-27.
Examples
# Simulate 100 observations of 2 variables
X <- matrix(rnorm(200), ncol = 2)
# Clustering function: Ward hierarchical clustering cut into 2 clusters
hcl_fun <- function(x) {
  return(as.factor(cutree(hclust(dist(x), method = "ward.D2"), k = 2)))
}
cl <- hcl_fun(X)
plot(X, col = cl)
# Note: in practice, ndraws (the number of Monte-Carlo samples) should be higher
test_var1 <- test_selective_inference(X, k1 = 1, k2 = 2, g = 1, ndraws = 100, cl_fun = hcl_fun, cl = cl)
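As a rough illustration of why a larger ndraws is advisable, the sketch below computes the standard error of a plain Monte Carlo estimate of a probability for two sample sizes. This is a generic textbook formula applied to a made-up probability (true_p is an assumption chosen for illustration); it is not the estimator used inside test_selective_inference, whose stder value should be read from the function output.

```r
# Generic illustration only: standard error of a plain Monte Carlo estimate
# of a probability p from n independent draws is sqrt(p * (1 - p) / n).
# true_p is a made-up value, not a result from test_selective_inference.
true_p <- 0.05
ndraws_values <- c(100, 2000)

mc_se <- sqrt(true_p * (1 - true_p) / ndraws_values)
names(mc_se) <- paste0("ndraws=", ndraws_values)
print(mc_se)

# The error shrinks with the square root of the number of draws,
# so going from 100 to 2000 draws divides it by sqrt(20).
ratio <- mc_se[[1]] / mc_se[[2]]
```

Under this simple model, the reduced ndraws used in the example above trades precision for speed.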