test_selective_inference {VALIDICLUST} | R Documentation |
Selective inference for post-clustering variable involvement
Description
Selective inference for post-clustering variable involvement
Usage
test_selective_inference(
X,
k1,
k2,
g,
ndraws = 2000,
cl_fun,
cl = NULL,
sig = NULL
)
Arguments
X |
The data matrix on which the clustering is applied |
k1 |
The first cluster of interest |
k2 |
The second cluster of interest |
g |
The variables for which the test is applied |
ndraws |
The number of Monte-Carlo samples |
cl_fun |
The clustering function used to build clusters |
cl |
The cluster labels of the data, obtained with cl_fun |
sig |
The estimated standard deviation. Default is NULL, in which case the standard deviation is estimated using only the observations in the two clusters of interest |
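As an illustration of what cl_fun can look like, here is a minimal sketch of an alternative clustering function in base R. It assumes, as in the Examples section, that cl_fun receives the data matrix and returns one label per row. The name hcl3_fun, the complete linkage and the choice of three clusters are hypothetical and only serve the illustration.

```r
# Hypothetical clustering function: complete-linkage hierarchical
# clustering cut into three clusters.
# Assumption: cl_fun takes the data matrix and returns one label per row.
hcl3_fun <- function(x) {
  tree <- hclust(dist(x), method = "complete")
  as.factor(cutree(tree, k = 3))
}

set.seed(1)
X_demo <- matrix(rnorm(60), ncol = 2)  # 30 observations, 2 variables
labels_demo <- hcl3_fun(X_demo)
table(labels_demo)                      # cluster sizes
```

With three clusters, k1 and k2 would then select which pair of clusters to compare.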
Value
A list with the following elements:
- stat_g: The test statistic used for the test.
- pval: The resulting p-values of the test.
- stder: The standard deviation of the p-values, computed from the Monte-Carlo samples.
- clusters: The labels of the data.
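To give some intuition for why a Monte-Carlo standard deviation such as stder depends on ndraws, the sketch below estimates a simple tail probability by plain Monte Carlo with 100 and with 2000 draws. It is a toy problem under stated assumptions: the function mc_tail is hypothetical, and this is not the sampling scheme used inside test_selective_inference.

```r
set.seed(42)

# Hypothetical toy example: plain Monte Carlo estimate of P(Z > threshold)
# for Z ~ N(0, 1), with the standard error of the estimated proportion.
mc_tail <- function(ndraws, threshold = 1) {
  hits <- rnorm(ndraws) > threshold
  p_hat <- mean(hits)
  se <- sqrt(p_hat * (1 - p_hat) / ndraws)  # binomial standard error
  c(estimate = p_hat, stder = se)
}

small <- mc_tail(100)    # few draws: noisy estimate
large <- mc_tail(2000)   # more draws: smaller standard error
rbind(small, large)
```

The standard error shrinks roughly like 1/sqrt(ndraws), which is why a small ndraws is acceptable for a quick check but not for a reported result.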
Note
This function is adapted from clusterpval::test_clusters_approx() by Gao et al. (2022), available on GitHub: https://github.com/lucylgao/clusterpval
References
Gao, L. L., Bien, J., & Witten, D. (2022). Selective inference for hierarchical clustering. Journal of the American Statistical Association, (just-accepted), 1-27.
Examples
X <- matrix(rnorm(200), ncol = 2)
# Clustering function: Ward hierarchical clustering cut into two clusters
hcl_fun <- function(x) {
  return(as.factor(cutree(hclust(dist(x), method = "ward.D2"), k = 2)))
}
cl <- hcl_fun(X)
plot(X, col = cl)
# Note that in practice the value of ndraws (the number of Monte-Carlo simulations) must be higher
test_var1 <- test_selective_inference(X, k1 = 1, k2 = 2, g = 1, ndraws = 100, cl_fun = hcl_fun, cl = cl)
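The returned list can then be inspected through the elements documented under Value. The following is a minimal sketch: report_test is a hypothetical helper, not part of VALIDICLUST, and the 0.05 level is an arbitrary choice for illustration. It assumes a single variable is tested, as with g=1 above.

```r
# Hypothetical helper (not part of VALIDICLUST): summarise a result list
# that carries the documented elements pval and stder.
report_test <- function(res, alpha = 0.05) {
  sprintf("p-value = %.3f (Monte-Carlo sd = %.3f); reject at level %.2f: %s",
          res$pval, res$stder, alpha, res$pval < alpha)
}

# Runs only if the package is installed and the example above was executed.
if (requireNamespace("VALIDICLUST", quietly = TRUE) && exists("test_var1")) {
  cat(report_test(test_var1), "\n")
}
```

Reporting stder alongside pval makes it visible when ndraws is too small for the p-value to be trusted.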