
intrinsic_selection {flevr}    R Documentation

Perform intrinsic, ensemble-based variable selection

Description

Based on estimated SPVIM (Shapley population variable importance measure) values, perform variable selection using the specified error-controlling method.
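The selection step can be sketched in base R. This is a minimal illustration under stated assumptions, not flevr's internal code: the estimates and standard errors are made-up numbers, the p-values come from an assumed one-sided Wald test of zero importance, and gFWER control is illustrated with Holm's procedure followed by an augmentation step that additionally selects the `k` most significant features Holm did not select. The helper `augment_gfwer()` is hypothetical.

```r
# Hypothetical SPVIM estimates and standard errors (made-up numbers,
# not output from vimp::sp_vim)
est <- c(age = 0.12, marker1 = 0.05, marker2 = 0.001, marker3 = 0.03)
se <- c(age = 0.03, marker1 = 0.02, marker2 = 0.010, marker3 = 0.02)

# Assumed test: one-sided Wald test of H0: importance = 0 vs H1: importance > 0
p_values <- pnorm(est / se, lower.tail = FALSE)
names(p_values) <- names(est)

# Holm's step-down procedure controls the family-wise error rate
alpha <- 0.05
holm_reject <- p.adjust(p_values, method = "holm") <= alpha

# Hypothetical helper: augmentation for gFWER(k), i.e., P(more than k false
# selections) <= alpha, by also selecting the k most significant features
# that Holm's procedure did not select
augment_gfwer <- function(p, reject, k) {
  not_rejected <- which(!reject)
  n_extra <- min(k, length(not_rejected))
  extra <- not_rejected[order(p[not_rejected])][seq_len(n_extra)]
  out <- reject
  out[extra] <- TRUE
  out
}

selected <- augment_gfwer(p_values, holm_reject, k = 1)
names(which(selected))
```

With these numbers, Holm selects `age` and `marker1`, and augmentation with `k = 1` adds `marker3`.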

Usage

intrinsic_selection(
  spvim_ests = NULL,
  sample_size = NULL,
  feature_names = "",
  alpha = 0.05,
  control = list(quantity = "gFWER", base_method = "Holm", fdr_method = NULL, q = NULL,
    k = NULL)
)

Arguments

`spvim_ests`	the estimated SPVIM values (an object of class `vim`, resulting from a call to `vimp::sp_vim`). Can also be a list of estimated SPVIMs if multiple imputation was used to handle missing data; in this case, the estimated SPVIMs are combined using Rubin's rules, and selection is based on the combined SPVIMs.
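Rubin's rules pool a quantity across M imputed data sets by averaging the point estimates and adding the average within-imputation variance to (1 + 1/M) times the between-imputation variance. The sketch below applies them to a single feature's SPVIM; the numbers and the helper `rubin_combine()` are hypothetical, and flevr's own pooling code may differ in detail.

```r
# Hypothetical helper applying Rubin's rules to one feature's SPVIM
# estimates and standard errors from M imputed data sets
rubin_combine <- function(ests, ses) {
  m <- length(ests)
  within_var <- mean(ses^2)   # average within-imputation variance
  between_var <- var(ests)    # between-imputation variance
  total_var <- within_var + (1 + 1 / m) * between_var
  c(est = mean(ests), se = sqrt(total_var))
}

# Made-up estimates and standard errors from M = 3 imputed data sets
combined <- rubin_combine(ests = c(0.10, 0.12, 0.11), ses = c(0.02, 0.03, 0.025))
combined
```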
`sample_size`	the number of independent observations used to estimate the SPVIM values.
`feature_names`	the names of the features, as a character vector of length `p`, the total number of features. Only used if the Super Learner ensemble was fit on a `matrix` rather than on a `data.frame`, `tibble`, etc.
`alpha`	the nominal level (e.g., 0.05) at which to control the generalized family-wise error rate, the proportion of false positives, or the false discovery rate, depending on the chosen `quantity`.
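The same `alpha` can select different sets depending on which error rate is controlled. As a hedged base-R illustration with made-up p-values (not flevr's own FDR or gFWER code), compare Holm's family-wise error rate control with Benjamini-Hochberg false discovery rate control at level 0.05:

```r
# Made-up one-sided p-values for five hypothetical features
p_values <- c(f1 = 0.001, f2 = 0.008, f3 = 0.02, f4 = 0.03, f5 = 0.27)
alpha <- 0.05

# Holm: controls the family-wise error rate
holm_selected <- p.adjust(p_values, method = "holm") <= alpha
# Benjamini-Hochberg: controls the false discovery rate
bh_selected <- p.adjust(p_values, method = "BH") <= alpha

names(which(holm_selected))
names(which(bh_selected))
```

Here Holm selects `f1` and `f2`, while Benjamini-Hochberg also selects `f3` and `f4`, because FDR control tolerates a proportion of false discoveries rather than bounding the chance of any.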
`control`	a list of parameters controlling the variable selection process. Parameters include `quantity`, `base_method`, `fdr_method`, `q`, and `k`. See `intrinsic_control` for details.

Value

a tibble with the estimated intrinsic variable importance, the corresponding variable importance ranks, and the selected variables.

Examples


data("biomarkers")
# subset to complete cases for illustration
cc <- complete.cases(biomarkers)
dat_cc <- biomarkers[cc, ]
# use only the mucinous outcome, not the high-malignancy outcome
y <- dat_cc$mucinous
x <- dat_cc[, !(names(dat_cc) %in% c("mucinous", "high_malignancy"))]
feature_nms <- names(x)
# estimate SPVIMs (using simple library and V = 2 for illustration only)
set.seed(20231129)
library("SuperLearner")
est <- vimp::sp_vim(Y = y, X = x, V = 2, type = "auc", SL.library = "SL.glm", 
                    cvControl = list(V = 2))
# do intrinsic selection
intrinsic_set <- intrinsic_selection(spvim_ests = est, sample_size = nrow(dat_cc), alpha = 0.2, 
                                     feature_names = feature_nms, 
                                     control = list(quantity = "gFWER", base_method = "Holm", 
                                                    k = 1))
intrinsic_set