
anchored_lasso_testing {HMC}

R Documentation

Anchored test for two-sample mean comparison.

Description

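Anchored test of whether two samples share the same mean vector. Principal components (see pca_method) and the mean vector (see mean_method) are estimated with cross-fitting, and the lasso coefficient is anchored at the principal component given by num_latent_factor.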

Usage

anchored_lasso_testing(
  sample_1,
  sample_2,
  pca_method = "sparse_pca",
  mean_method = "lasso",
  num_latent_factor = 1,
  n_folds = 5,
  verbose = TRUE
)

Arguments

`sample_1`	Group 1 sample. Each row is a subject and each column corresponds to a feature.
`sample_2`	Group 2 sample. Each row is a subject and each column corresponds to a feature.
`pca_method`	Method used to estimate the principal component. The default is "sparse_pca", which uses sparse PCA from the package PMA. Other choices are "dense_pca" (regular PCA) and "hard" (hard-thresholding PCA, which also induces sparsity).
`mean_method`	Method used to estimate the mean vector. The default in Usage is "lasso". Other choices are the sample mean, "naive", and a hard-thresholding sparse estimator, "hard".
`num_latent_factor`	The principal component at which the lasso coefficient is anchored. The default, 1, corresponds to the first principal component (PC1).
`n_folds`	Number of folds used for cross-fitting. The default is 5; if computation time allows, you can try 10.
`verbose`	Print information to the console. Default is TRUE.
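The n_folds argument controls cross-fitting. As a rough illustration of what splitting into folds means, the base R sketch below assumes that the rows of a sample are assigned at random to folds of nearly equal size; HMC's actual splitting scheme may differ. For each fold, the remaining rows would be used for estimation and the held-out fold for evaluation. The helper assign_folds is hypothetical and not part of HMC.

```r
# Hypothetical helper (not part of HMC): randomly assign each of n rows
# to one of n_folds folds of (nearly) equal size.
assign_folds <- function(n, n_folds = 5) {
  # Recycle the labels 1..n_folds to length n, then shuffle them
  sample(rep_len(seq_len(n_folds), n))
}

set.seed(2024)
n <- 300
n_folds <- 5
folds <- assign_folds(n, n_folds)
print(table(folds))  # 300 rows split into 5 folds of 60

for (k in seq_len(n_folds)) {
  fit_rows  <- which(folds != k)  # rows used to estimate quantities such as the PC and mean
  eval_rows <- which(folds == k)  # held-out rows used to evaluate the statistic
  # Estimation and evaluation rows never overlap within a fold
  stopifnot(length(intersect(fit_rows, eval_rows)) == 0)
}
```

Raising n_folds to 10 makes each estimation set larger (270 of 300 rows instead of 240) at the cost of ten rounds of fitting instead of five, which is why the larger value is suggested only when computation time allows.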

Value

A list with the following components:

`test_statistics`	Test statistics. Each entry corresponds to the test result for one principal component.
`standard_error`	Estimated standard error of test_statistics_before_studentization.
`test_statistics_before_studentization`	Same as test_statistics, but before studentization (division by the standard error), so it does not have unit variance.
`split_data`	Intermediate quantities needed for further assessment and interpretation of the test results.
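Because test_statistics is approximately standard normal when the two groups have equal means, a two-sided p-value can be read off the normal distribution. The base R sketch below assumes that test_statistics equals test_statistics_before_studentization divided by standard_error; the numbers are made up for illustration and two_sided_p is a hypothetical helper, not an HMC function.

```r
# Hypothetical helper (not part of HMC): two-sided p-value for a statistic
# that is approximately N(0, 1) under the null hypothesis of equal means.
two_sided_p <- function(z) 2 * pnorm(-abs(z))

# Made-up values standing in for the fields of a result object
before <- c(2.1, -0.4)  # plays the role of test_statistics_before_studentization
se     <- c(0.5, 0.4)   # plays the role of standard_error

# Assumed studentization: divide by the estimated standard error
z <- before / se        # 4.2 and -1
p_values <- two_sided_p(z)
print(signif(p_values, 3))
```

With an actual result, the same helper would be applied directly to result$test_statistics.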

Examples

library(HMC)

sample_size_1 <- sample_size_2 <- 300
true_mean_1 <- matrix(c(rep(1, 10), rep(0, 90)), ncol = 1)
true_mean_2 <- matrix(c(rep(1.5, 10), rep(0, 90)), ncol = 1)

sample_1 <- data.frame(MASS::mvrnorm(sample_size_1,
                                     mu = true_mean_1,
                                     Sigma = diag(1, 100)))
sample_2 <- data.frame(MASS::mvrnorm(sample_size_2,
                                     mu = true_mean_2,
                                     Sigma = diag(1, 100)))
result <- anchored_lasso_testing(sample_1, sample_2)

# The test statistic. It is approximately N(0, 1) when there is no difference
# between the groups; here the means differ, so a large value is expected.
result$test_statistics

# Summarize which features contribute to the discriminant vectors (i.e., logistic lasso).
summarize_feature_name(result)

# Extract the estimated discriminant coefficients.
extract_pc(result)
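In the example, the two groups differ only in the first 10 of 100 features (mean 1 versus 1.5). As a sanity check that does not use HMC, the base R sketch below simulates data with the same design and computes per-feature differences in sample means. It swaps MASS::mvrnorm for independent rnorm draws, which gives the same distribution here because the covariance is the identity matrix.

```r
# Simulate data with the same design as the example, using base R only
set.seed(1)
n <- 300
n_features <- 100
true_mean_1 <- c(rep(1, 10), rep(0, 90))
true_mean_2 <- c(rep(1.5, 10), rep(0, 90))

# Identity covariance, so every entry can be drawn independently;
# each row of the added matrix is the true mean vector
sample_1 <- matrix(rnorm(n * n_features), n, n_features) +
  matrix(true_mean_1, n, n_features, byrow = TRUE)
sample_2 <- matrix(rnorm(n * n_features), n, n_features) +
  matrix(true_mean_2, n, n_features, byrow = TRUE)

# Per-feature difference in sample means (group 2 minus group 1)
mean_diff <- colMeans(sample_2) - colMeans(sample_1)
print(round(mean_diff[1:12], 2))  # near 0.5 for features 1-10, near 0 afterwards
```

Features 1 to 10 are the ones whose names summarize_feature_name(result) would be expected to highlight for this simulated design.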

[Package HMC version 1.0 Index]