sense {sense} | R Documentation |
sense
Description
Stacked ensemble for regression tasks based on the 'mlr3' framework.
Usage
sense(
df,
target_feat,
benchmarking = "all",
super = "avg",
algos = c("glmnet", "ranger", "xgboost", "rpart", "kknn", "svm"),
sampling_rate = 1,
metric = "mae",
collapse_char_to = 10,
num_preproc = "scale",
fct_preproc = "one-hot",
impute_num = "sample",
missing_fusion = FALSE,
inner = "holdout",
outer = "holdout",
folds = 3,
repeats = 3,
ratio = 0.5,
selected_filter = "information_gain",
selected_n_feats = NULL,
tuning = "random_search",
budget = 30,
resolution = 5,
n_evals = 30,
minute_time = 10,
patience = 0.3,
min_improve = 0.01,
java_mem = 64,
decimals = 2,
seed = 42
)
Arguments
df |
A data frame with features and target. |
target_feat |
String. Name of the numeric feature for the regression task. |
benchmarking |
Positive integer or "all". Number of base learners to stack. Default: "all". |
super |
String. Super learner of choice among the available learners. Default: "avg". |
algos |
String vector. Available learners are: "glmnet", "ranger", "xgboost", "rpart", "kknn", "svm". |
sampling_rate |
Positive numeric. Sampling rate before applying the stacked ensemble. Default: 1. |
metric |
String. Evaluation metric for outer and inner cross-validation. Default: "mae". |
collapse_char_to |
Positive integer. Conversion of characters to factors with predefined maximum number of levels. Default: 10. |
num_preproc |
String. Options for scalar pre-processing: "scale" or "range". Default: "scale". |
fct_preproc |
String. Options for factor pre-processing: "encodeimpact", "encodelmer", "one-hot", "treatment", "poly", "sum", "helmert". Default: "one-hot". |
impute_num |
String. Options for imputing missing numeric values: "sample" or "hist". Default: "sample". For factors, the default mode is out-of-range imputation. |
missing_fusion |
Logical. Whether to add missing-indicator features. Default: FALSE. |
inner |
String. Cross-validation inner cycle: "holdout", "cv", "repeated_cv", "subsampling". Default: "holdout". |
outer |
String. Cross-validation outer cycle: "holdout", "cv", "repeated_cv", "subsampling". Default: "holdout". |
folds |
Positive integer. Number of folds used in "cv" and "repeated_cv". Default: 3. |
repeats |
Positive integer. Number of repetitions used in "subsampling" and "repeated_cv". Default: 3. |
ratio |
Positive numeric. Proportion (between 0 and 1) used for "holdout" and "subsampling". Default: 0.5. |
selected_filter |
String. Filters available for regression tasks: "carscore", "cmim", "correlation", "find_correlation", "information_gain", "relief", "variance". Default: "information_gain". |
selected_n_feats |
Positive integer. Number of features to select through the chosen filter. Default: NULL. |
tuning |
String. Available options are "random_search" and "grid_search". Default: "random_search". |
budget |
Positive integer. Maximum number of trials during random search. Default: 30. |
resolution |
Positive integer. Grid resolution for each hyper-parameter. Default: 5. |
n_evals |
Positive integer. Number of evaluations before termination. Default: 30. |
minute_time |
Positive integer. Maximum run time, in minutes, before termination. Default: 10. |
patience |
Positive numeric. Fraction of stagnating evaluations before termination. Default: 0.3. |
min_improve |
Positive numeric. Minimum error improvement required before termination. Default: 0.01. |
java_mem |
Positive integer. Memory allocated to Java. Default: 64. |
decimals |
Positive integer. Decimal format of prediction. Default: 2. |
seed |
Positive integer. Random seed for reproducibility. Default: 42. |
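The resampling options accepted by inner and outer can be illustrated with plain index splits. The following is a minimal base-R sketch, not the package's internal code. It assumes that ratio is the share of rows used for training (as in mlr3's holdout resampling); the toy size n and all object names are hypothetical.

```r
# Illustrative only: base-R index splits mirroring the resampling options.
set.seed(42)
n <- 20                      # hypothetical number of rows
idx <- seq_len(n)

# "holdout" with ratio = 0.5: a single random split.
# Assumption: ratio is the training share.
ratio <- 0.5
train_idx <- sample(idx, size = floor(ratio * n))
test_idx <- setdiff(idx, train_idx)

# "cv" with folds = 3: every row is assigned to exactly one fold
# and is used as test data exactly once.
folds <- 3
fold_id <- sample(rep_len(seq_len(folds), n))
cv_splits <- lapply(seq_len(folds), function(k) {
  list(train = idx[fold_id != k], test = idx[fold_id == k])
})

# "subsampling" with repeats = 3: the holdout split is redrawn
# independently several times ("repeated_cv" does the same for "cv").
repeats <- 3
subsampling_train <- replicate(
  repeats,
  sample(idx, size = floor(ratio * n)),
  simplify = FALSE
)
```

With folds = 3 and repeats = 3, "repeated_cv" would therefore fit nine models per learner, which is worth keeping in mind when setting minute_time.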
Value
This function returns a list including:
benchmark_error: comparison between the base learners.
resampled_model: mlr3 standard description of the analytic pipeline.
plot: mlr3 standard graph of the analytic pipeline.
selected_n_feats: selected features and score according to the filtering method used.
model_error: error measure for outer cycle of cross-validation.
testing_frame: data set used for calculating the test metrics.
test_metrics: metrics reported are mse, rmse, mae, mape, mdae, rae, rse, rrse, smape.
model_predict: prediction function to apply to new data on the same scheme.
time_log: computation time.
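As a reading aid for test_metrics, the sketch below computes a few of the listed measures (mse, rmse, mae, mdae, mape) in base R from their usual textbook definitions. It is an assumption that the package's values follow these same definitions; the helper regression_metrics and the toy vectors are hypothetical and not part of the package.

```r
# Hypothetical helper: textbook definitions of some regression metrics.
regression_metrics <- function(actual, predicted) {
  err <- actual - predicted
  c(
    mse  = mean(err^2),            # mean squared error
    rmse = sqrt(mean(err^2)),      # root mean squared error
    mae  = mean(abs(err)),         # mean absolute error
    mdae = median(abs(err)),       # median absolute error
    mape = mean(abs(err / actual)) # mean absolute percentage error (as a fraction)
  )
}

# Toy values for illustration only.
actual <- c(1, 2, 3, 4)
predicted <- c(1.5, 2, 2.5, 5)
m <- regression_metrics(actual, predicted)
print(round(m, 3))
```

Note that mape is undefined when any actual value is zero, so it should be read with care on targets that can be zero.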
Author(s)
Giancarlo Vercellino giancarlo.vercellino@gmail.com
Examples
## Not run:
sense(benchmark, "y", algos = c("glmnet", "rpart"))
## End(Not run)
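A slightly longer call might look like the sketch below. It uses only arguments and option values documented above, on a synthetic data frame (toy, hypothetical and not shipped with the package). The call is guarded so that it only runs in an interactive session with the package installed. How model_predict is invoked is an assumption based on its description ("prediction function to apply to new data on the same scheme").

```r
# Synthetic regression data (hypothetical; for illustration only).
set.seed(42)
toy <- data.frame(
  x1 = rnorm(200),
  x2 = runif(200),
  grp = sample(c("a", "b", "c"), 200, replace = TRUE)
)
toy$y <- 2 * toy$x1 - toy$x2 + rnorm(200, sd = 0.3)

# Guarded so the potentially long-running call is skipped in batch mode
# or when the package is not installed.
if (interactive() && requireNamespace("sense", quietly = TRUE)) {
  res <- sense::sense(
    toy, "y",
    algos = c("glmnet", "rpart"),  # two of the documented base learners
    outer = "cv", folds = 3,       # 3-fold outer cross-validation
    tuning = "random_search",
    budget = 10,
    minute_time = 1,
    seed = 42
  )

  print(res$benchmark_error)  # comparison between the base learners
  print(res$model_error)      # outer-cycle error
  print(res$test_metrics)

  # Assumption: model_predict accepts a data frame with the same
  # feature columns as the training data.
  new_rows <- toy[1:5, c("x1", "x2", "grp")]
  print(res$model_predict(new_rows))
}
```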