trex {TRexSelector} R Documentation

Run the T-Rex selector (doi:10.48550/arXiv.2110.06048)

Description

The T-Rex selector (doi:10.48550/arXiv.2110.06048) performs fast variable selection in high-dimensional settings while controlling the false discovery rate (FDR) at a user-defined target level.

Usage

trex(
  X,
  y,
  tFDR = 0.2,
  K = 20,
  max_num_dummies = 10,
  max_T_stop = TRUE,
  method = "trex",
  GVS_type = "IEN",
  cor_coef = NA,
  type = "lar",
  corr_max = 0.5,
  lambda_2_lars = NULL,
  rho_thr_DA = 0.02,
  hc_dist = "single",
  hc_grid_length = min(20, ncol(X)),
  parallel_process = FALSE,
  parallel_max_cores = min(K, max(1, parallel::detectCores(logical = FALSE))),
  seed = NULL,
  eps = .Machine$double.eps,
  verbose = TRUE
)

Arguments

X

Real valued predictor matrix.

y

Response vector.

tFDR

Target FDR level (between 0 and 1, i.e., 0% and 100%).
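
To make the target concrete: the FDR is the expected fraction of false discoveries among all selected variables. The following minimal sketch in base R computes the realized false discovery proportion (FDP) of a single run. The true support and the selection are hypothetical values chosen for illustration and do not come from the package.

```r
# Hypothetical ground truth and selection for p = 10 variables (illustration only)
true_support <- c(1, 1, 1, 0, 0, 0, 0, 0, 0, 0)
selected_var <- c(1, 1, 0, 1, 0, 0, 0, 0, 0, 0)

num_selected <- sum(selected_var)                           # R: all selected variables
num_false    <- sum(selected_var == 1 & true_support == 0)  # V: falsely selected variables

# FDP = V / max(1, R); the FDR is the expectation of this quantity
fdp <- num_false / max(1, num_selected)
fdp
```

A selector that controls the FDR at tFDR = 0.2 keeps the average of this proportion over repeated data sets at or below 0.2; an individual run may still exceed it.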

K

Number of random experiments.

max_num_dummies

Integer factor determining the maximum number of dummies as a multiple of the number of original variables p (i.e., num_dummies = max_num_dummies * p).

max_T_stop

If TRUE, the maximum number of dummies that can be included before stopping is set to ceiling(n / 2), where n is the number of observations.
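
As a worked example of the two limits described for max_num_dummies and max_T_stop, the sketch below evaluates both formulas for hypothetical dimensions n = 50 and p = 100. The variable names dummy_limit and T_stop_cap are illustrative and are not part of the package; what happens when max_T_stop = FALSE is not specified on this page, so the sketch leaves that case as NA.

```r
n <- 50   # hypothetical number of observations, nrow(X)
p <- 100  # hypothetical number of original variables, ncol(X)

max_num_dummies <- 10    # default value from the usage section
max_T_stop      <- TRUE  # default value from the usage section

# Maximum number of dummies: a multiple of the number of original variables
dummy_limit <- max_num_dummies * p

# Cap on included dummies before stopping, as documented for max_T_stop = TRUE
T_stop_cap <- if (max_T_stop) ceiling(n / 2) else NA

c(dummy_limit = dummy_limit, T_stop_cap = T_stop_cap)
```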

method

'trex' for the T-Rex selector (doi:10.48550/arXiv.2110.06048), 'trex+GVS' for the T-Rex+GVS selector (doi:10.23919/EUSIPCO55093.2022.9909883), 'trex+DA+AR1' for the T-Rex+DA+AR1 selector, 'trex+DA+equi' for the T-Rex+DA+equi selector, 'trex+DA+BT' for the T-Rex+DA+BT selector (doi:10.48550/arXiv.2401.15796), 'trex+DA+NN' for the T-Rex+DA+NN selector (doi:10.48550/arXiv.2401.15139).

GVS_type

'IEN' for the Informed Elastic Net (doi:10.1109/CAMSAP58249.2023.10403489), 'EN' for the ordinary Elastic Net (doi:10.1111/j.1467-9868.2005.00503.x).

cor_coef

AR(1) autocorrelation coefficient for the T-Rex+DA+AR1 selector or equicorrelation coefficient for the T-Rex+DA+equi selector.
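
The two correlation structures named here can be written out explicitly. The sketch below builds an AR(1) correlation matrix and an equicorrelation matrix in base R for a hypothetical p = 5 and coefficient 0.7. It only illustrates what the coefficient parameterizes; how trex uses cor_coef internally is not described on this page.

```r
p        <- 5    # hypothetical number of variables
cor_coef <- 0.7  # hypothetical coefficient

# AR(1) structure: entry (i, j) equals cor_coef^|i - j|
Sigma_ar1 <- toeplitz(cor_coef^(0:(p - 1)))

# Equicorrelation structure: ones on the diagonal, cor_coef everywhere else
Sigma_equi <- matrix(cor_coef, nrow = p, ncol = p)
diag(Sigma_equi) <- 1

round(Sigma_ar1, 3)
Sigma_equi
```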

type

'lar' for LARS and 'lasso' for Lasso.

corr_max

Maximum allowed correlation between any two predictors from different clusters (for method = 'trex+GVS').

lambda_2_lars

Value of lambda_2 for the LARS-based Elastic Net.

rho_thr_DA

Correlation threshold for the T-Rex+DA+AR1 selector and the T-Rex+DA+equi selector (i.e., method = 'trex+DA+AR1' or 'trex+DA+equi').

hc_dist

Distance measure of the hierarchical clustering/dendrogram (only for method = 'trex+DA+BT'): 'single' for single linkage, 'complete' for complete linkage, 'average' for average linkage (see hclust for more options).

hc_grid_length

Length of the height-cutoff-grid for the dendrogram (integer between 1 and the number of original variables p).
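
To illustrate how hc_dist and hc_grid_length relate to a dendrogram, the sketch below clusters the columns of a simulated predictor matrix with stats::hclust and cuts the tree on a grid of heights. Two choices are assumptions made for illustration and are not confirmed by this page: the dissimilarity 1 - |correlation| and the equally spaced grid between 0 and 1.

```r
set.seed(1)
n <- 50                                  # hypothetical number of observations
p <- 8                                   # hypothetical number of variables
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
X[, 2] <- X[, 1] + 0.1 * rnorm(n)        # make variables 1 and 2 highly correlated

# Assumed dissimilarity between variables: 1 - |correlation|
d <- as.dist(1 - abs(cor(X)))

hc_dist <- "single"                      # also "complete", "average", ...
dendro  <- hclust(d, method = hc_dist)

hc_grid_length <- min(20, ncol(X))       # default from the usage section
# Assumed grid: equally spaced cutoff heights between 0 and 1
heights <- seq(0, 1, length.out = hc_grid_length)

# Cluster label of every variable at every cutoff height (p rows, one column per height)
clusters <- cutree(dendro, h = heights)
dim(clusters)
```

At height 0 every variable forms its own cluster, and at height 1 all variables fall into a single cluster; the grid positions in between trade off the number of clusters against their size.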

parallel_process

Logical. If TRUE, the random experiments are executed in parallel.

parallel_max_cores

Maximum number of cores to be used for parallel processing.
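
The default for parallel_max_cores in the usage section can be evaluated by hand to see what it yields on a given machine. This sketch only reproduces that default expression; note that parallel::detectCores() is documented to return NA when the number of cores cannot be determined, in which case the expression is NA as well and an explicit value should be passed.

```r
K <- 20  # default number of random experiments

# Physical cores on this machine (may be NA on some platforms)
physical_cores <- parallel::detectCores(logical = FALSE)

# Default from the usage section: at least 1, and never more cores than experiments
parallel_max_cores <- min(K, max(1, physical_cores))
parallel_max_cores
```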

seed

Seed for random number generator (ignored if parallel_process = FALSE).

eps

Numerical zero, i.e., the tolerance below which values are treated as zero.

verbose

Logical. If TRUE, progress in computations is shown.

Value

A list containing the estimated support vector and additional information, including the number of used dummies and the number of included dummies before stopping.

Examples

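# Load the example data set shipped with the package
data("Gauss_data")
X <- Gauss_data$X     # predictor matrix
y <- c(Gauss_data$y)  # response; c() drops any matrix dimensions
# Fix the random number generator state for reproducibility
set.seed(1234)
# Run the T-Rex selector with the default settings (tFDR = 0.2, K = 20)
res <- trex(X = X, y = y)
# Estimated support vector
selected_var <- res$selected_var
selected_var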
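
A possible variation of the example above, shown as a sketch: it lowers the target FDR to 10% and suppresses progress output. It uses only arguments listed in the usage section and assumes the TRexSelector package is installed; the guard skips the call otherwise. The name res_strict is illustrative.

```r
if (requireNamespace("TRexSelector", quietly = TRUE)) {
  library(TRexSelector)

  data("Gauss_data")
  X <- Gauss_data$X
  y <- c(Gauss_data$y)

  set.seed(1234)
  # Stricter target FDR (10% instead of the default 20%), no progress output
  res_strict <- trex(X = X, y = y, tFDR = 0.1, verbose = FALSE)
  res_strict$selected_var
}
```

A lower tFDR generally makes the selector more conservative, so fewer variables may be selected than with the default setting.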

[Package TRexSelector version 1.0.0 Index]