perform_grid_search {GeneSelectR}    R Documentation
Perform Grid Search or Random Search for Hyperparameter Tuning
Description
Perform Grid Search or Random Search for Hyperparameter Tuning
Usage
perform_grid_search(
X_train,
y_train,
pipeline,
scoring,
params,
search_type,
n_iter,
njobs,
modules,
random_state
)
Arguments
X_train
Training data for predictors.
y_train
Training data for outcomes.
pipeline
A pipeline specifying the steps for feature selection and model training.
scoring
A string naming the scoring metric used for hyperparameter tuning. Defaults to 'accuracy'.
params
A list of parameters or parameter distributions to search over (see the sketch below the argument list).
search_type
A character string specifying the type of search ('grid' or 'random').
n_iter
The number of parameter settings sampled in a random search.
njobs
The number of CPU cores to use.
modules
A list containing the definitions for the Python modules and submodules.
random_state
An integer setting the random seed for the feature selection algorithms and the randomized search CV procedure. Defaults to NULL, so a different seed is used on each run; fix it to an integer for reproducibility, or leave it as NULL for an unbiased estimate.
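A minimal sketch of what params might look like, assuming the pipeline contains steps named 'feature_selector' and 'classifier' (hypothetical names that must match the step names in your own pipeline); individual parameters are addressed with scikit-learn's 'step__parameter' convention:

# Hypothetical parameter grid; step names must match those in `pipeline`
params <- list(
  feature_selector__n_features_to_select = c(10L, 20L, 50L),
  classifier__n_estimators = c(100L, 200L),
  classifier__max_depth = c(3L, 5L, 10L)
)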
Value
Returns a scikit-learn GridSearchCV, RandomizedSearchCV, or BayesSearchCV object, depending on the search_type
specified.
This object includes several attributes useful for analyzing the hyperparameter tuning process:
- best_estimator_: The best estimator chosen by the search.
- best_score_: The score of best_estimator_ on the left-out data.
- best_params_: The parameter setting that gave the best results on the hold-out data.
- cv_results_: A dict with keys as column headers and values as columns, which can be imported into a pandas DataFrame.
- scorer_: The scoring method used on the held-out data.
- n_splits_: The number of cross-validation splits (folds/iterations).
These attributes provide insights into the model's performance and the effectiveness of the selected hyperparameters.
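As a rough illustration, these attributes can be read from R via reticulate's `$` accessor on the returned object; `optimal_model` below is an assumed name for that return value, and whether cv_results_ arrives as an R list or a Python dict depends on how the Python modules were imported:

# Illustrative only: 'optimal_model' stands in for the object returned
# by perform_grid_search()
optimal_model$best_params_                 # parameter setting with the best hold-out score
optimal_model$best_score_                  # cross-validated score of the best estimator
cv_results <- optimal_model$cv_results_    # per-candidate results (list or dict)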
Examples
required_modules <- c("sklearn", "boruta")
modules_available <- sapply(required_modules, reticulate::py_module_available)
if (all(modules_available)) {
  # Assuming X_train, y_train, pipeline, and params are predefined
  # Define sklearn modules (assuming 'define_sklearn_modules' is defined)
  sklearn_modules <- define_sklearn_modules()

  # Perform a grid search
  optimal_model <- perform_grid_search(X_train, y_train, pipeline, "accuracy",
                                       params, "grid", NULL, 1, sklearn_modules, NULL)

  # Perform a random search
  optimal_model_random <- perform_grid_search(X_train, y_train, pipeline, "accuracy",
                                              params, "random", 10, 1, sklearn_modules, 42)
} else {
  unavailable_modules <- names(modules_available[!modules_available])
  message(paste("Required Python modules not available:",
                paste(unavailable_modules, collapse = ", "),
                ". Skipping example."))
}