SL.ranger.imp {flevr} | R Documentation |
Super Learner wrapper for a ranger object with variable importance
Description
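A SuperLearner wrapper that fits a ranger random forest and records variable importance (the impurity measure by default), so that importance scores are available from the fitted object.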
Usage
SL.ranger.imp(
  Y,
  X,
  newX,
  family,
  obsWeights = rep(1, length(Y)),
  num.trees = 500,
  mtry = floor(sqrt(ncol(X))),
  write.forest = TRUE,
  probability = family$family == "binomial",
  min.node.size = ifelse(family$family == "gaussian", 5, 1),
  replace = TRUE,
  sample.fraction = ifelse(replace, 1, 0.632),
  num.threads = 1,
  verbose = FALSE,
  importance = "impurity",
  ...
)
Arguments
Y |
Outcome variable |
X |
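Training data frame |
newX |
Test data frame; predictions are returned for these observations |
family |
Family object describing the outcome type, e.g. gaussian() or binomial() |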
obsWeights |
Observation-level weights |
num.trees |
Number of trees. |
mtry |
Number of variables to possibly split at in each node. Default is the (rounded down) square root of the number of variables. |
write.forest |
Save ranger.forest object, required for prediction. Set to FALSE to reduce memory usage if no prediction intended. |
probability |
Grow a probability forest as in Malley et al. (2012). Default is TRUE when family is binomial and FALSE otherwise. |
min.node.size |
Minimal node size. In this wrapper the default is 5 for a gaussian family and 1 otherwise. (ranger's own defaults are 1 for classification, 5 for regression, 3 for survival, and 10 for probability.) |
replace |
Sample with replacement. |
sample.fraction |
Fraction of observations to sample. Default is 1 for sampling with replacement and 0.632 for sampling without replacement. |
num.threads |
Number of threads to use. |
verbose |
If TRUE, display additional output during execution. |
importance |
Variable importance mode, one of 'none', 'impurity', 'impurity_corrected', 'permutation'. The 'impurity' measure is the Gini index for classification, the variance of the responses for regression and the sum of test statistics (see |
... |
Any additional arguments, not currently used. |
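Several of the defaults above are expressions evaluated from X, family, and replace rather than constants. The sketch below recomputes them in base R to show what they resolve to. Note that resolve_defaults is a hypothetical helper written for illustration; it is not part of flevr or ranger.

```r
# Hypothetical helper (not part of flevr): mirrors the default expressions
# from the Usage block so that their values can be inspected.
resolve_defaults <- function(X, family, replace = TRUE) {
  list(
    mtry            = floor(sqrt(ncol(X))),
    probability     = family$family == "binomial",
    min.node.size   = ifelse(family$family == "gaussian", 5, 1),
    sample.fraction = ifelse(replace, 1, 0.632)
  )
}

# Toy design matrix with 20 predictors (values are irrelevant here)
X_demo <- as.data.frame(matrix(0, nrow = 10, ncol = 20))

d_bin <- resolve_defaults(X_demo, binomial())
d_gau <- resolve_defaults(X_demo, gaussian(), replace = FALSE)

str(d_bin)
str(d_gau)
```

With 20 predictors, mtry resolves to 4 in both cases; the two families differ only in probability and min.node.size, and sample.fraction depends only on replace.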
Value
A named list with elements pred (predictions on newX) and fit (the fitted ranger object).
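As an illustration of the return value, the following sketch fits the wrapper on simulated data and reads both elements. The data and variable names are invented for this example. It also assumes that the ranger object is either fit$fit itself or, following the convention used by SuperLearner::SL.ranger, stored in fit$fit$object; inspect str(fit$fit) if neither holds.

```r
library("flevr")

# Simulated data (invented for illustration): only x1 drives the outcome
set.seed(1)
n <- 200
x <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
y <- rbinom(n, 1, plogis(2 * x$x1))

fit <- SL.ranger.imp(Y = y, X = x, newX = x, family = binomial())

# Predictions on newX
head(fit$pred)

# Assumption: the ranger object is fit$fit, or sits in fit$fit$object
rf <- if (inherits(fit$fit, "ranger")) fit$fit else fit$fit$object

# ranger stores importance scores in the variable.importance element
imp <- sort(rf$variable.importance, decreasing = TRUE)
imp
```

Sorting in decreasing order puts the most important features first, which is convenient when the scores feed a ranking or selection step.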
References
Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5-32.
Wright, M. N. & Ziegler, A. (2017). ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R. Journal of Statistical Software, 77(1), 1-17. doi:10.18637/jss.v077.i01.
See Also
SL.ranger
ranger
predict.ranger
Examples
# load the package that provides SL.ranger.imp and the biomarkers data
library("flevr")
data("biomarkers")
# subset to complete cases for illustration
cc <- complete.cases(biomarkers)
dat_cc <- biomarkers[cc, ]
# use only the mucinous outcome, not the high-malignancy outcome
y <- dat_cc$mucinous
x <- dat_cc[, !(names(dat_cc) %in% c("mucinous", "high_malignancy"))]
feature_nms <- names(x)
# get the fit
set.seed(20231129)
fit <- SL.ranger.imp(Y = y, X = x, newX = x, family = binomial())
fit
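The wrapper can also be passed by name to SuperLearner as one candidate learner in an ensemble. This is a minimal sketch, assuming the SuperLearner package is installed; the simulated data are invented for this example, and SL.mean is a simple learner shipped with SuperLearner that serves as a baseline.

```r
library("flevr")
library("SuperLearner")

# Simulated data (invented for illustration)
set.seed(2)
n <- 200
x <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
y <- rbinom(n, 1, plogis(2 * x$x1))

# Ensemble with a mean-only baseline and the ranger wrapper
sl <- SuperLearner(
  Y = y, X = x, family = binomial(),
  SL.library = c("SL.mean", "SL.ranger.imp"),
  cvControl = list(V = 5)
)

# Ensemble weight assigned to each candidate learner
sl$coef
```

SuperLearner looks learners up by name, so flevr needs to be attached for "SL.ranger.imp" to be found.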