extrinsic_selection {flevr} | R Documentation |
Perform extrinsic, ensemble-based variable selection
Description
Based on a fitted Super Learner ensemble, extract extrinsic variable importance estimates, rank them, and perform variable selection using the specified rank threshold.
Usage
extrinsic_selection(
fit = NULL,
feature_names = "",
threshold = 20,
import_type = "all",
...
)
Arguments
fit |
the fitted Super Learner ensemble. |
feature_names |
the names of the features (a character vector of
length |
threshold |
the rank threshold for selection based on ranked variable importance; rank 1 is the most important. Defaults to 20, an arbitrary value that should be set explicitly for the task at hand. |
import_type |
the type of extrinsic importance (either |
... |
other arguments to pass to algorithm-specific importance extractors. |
Value
a tibble with the estimated extrinsic variable importance, the corresponding variable importance ranks, and the selected variables.
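The rank-and-threshold step described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the importance scores are made up, `select_by_rank` is a hypothetical helper, and the strict comparison (`rank < threshold`) is an assumption about how the threshold is applied.

```r
# Made-up importance scores for five features (illustrative values only)
importance <- c(x1 = 0.42, x2 = 0.05, x3 = 0.31, x4 = 0.11, x5 = 0.02)

# Rank so that rank 1 is the most important (largest score)
ranks <- rank(-importance, ties.method = "average")

# Hypothetical helper: flag features whose rank falls below the threshold.
# The strict inequality is an assumption made for this sketch.
select_by_rank <- function(ranks, threshold) {
  ranks < threshold
}

result <- data.frame(
  feature = names(importance),
  importance = unname(importance),
  rank = unname(ranks),
  selected = unname(select_by_rank(ranks, threshold = 3)),
  stringsAsFactors = FALSE
)

# Show features from most to least important
result[order(result$rank), ]
```

With a threshold of 3, only the features ranked 1 and 2 are flagged in this sketch, which shows why the threshold should be chosen with the number of candidate features in mind.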
See Also
SuperLearner for specific usage of the SuperLearner function and package.
Examples
data("biomarkers")
# subset to complete cases for illustration
cc <- complete.cases(biomarkers)
dat_cc <- biomarkers[cc, ]
# use only the mucinous outcome, not the high-malignancy outcome
y <- dat_cc$mucinous
x <- dat_cc[, !(names(dat_cc) %in% c("mucinous", "high_malignancy"))]
feature_nms <- names(x)
# get the fit (using a simple library and 2 folds for illustration only)
library("SuperLearner")
set.seed(20231129)
fit <- SuperLearner::SuperLearner(Y = y, X = x, SL.library = c("SL.glm", "SL.mean"),
                                  cvControl = list(V = 2))
# extract importance
importance <- extrinsic_selection(fit = fit, feature_names = feature_nms, threshold = 1.5,
                                  import_type = "all")
importance