perform_grid_search {GeneSelectR}    R Documentation

Perform Grid Search or Random Search for Hyperparameter Tuning

Description

Performs a grid search or random search over the supplied hyperparameters for a given pipeline and returns the fitted scikit-learn search object.

Usage

perform_grid_search(
  X_train,
  y_train,
  pipeline,
  scoring,
  params,
  search_type,
  n_iter,
  njobs,
  modules,
  random_state
)

Arguments

X_train

Training data for predictors.

y_train

Training data for outcomes.

pipeline

A pipeline specifying the steps for feature selection and model training.

scoring

A string specifying the scoring metric to use for hyperparameter tuning. The default value is 'accuracy'.

params

A list of parameters or parameter distributions to search over (a minimal sketch of such a list appears after this argument list).

search_type

A character string specifying the type of search ('grid' or 'random').

n_iter

The number of parameter settings that are sampled in a random search.

njobs

The number of CPU cores to use.

modules

A list containing the definitions for the Python modules and submodules.

random_state

An integer setting the random seed for the feature selection algorithms and the randomized search CV procedure. By default it is set to NULL, so a different random seed is used each time an algorithm is run. Fix it to an integer for reproducibility; otherwise leave it as NULL for an unbiased estimate.
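
As an illustration of the params argument, the sketch below builds a small grid for a hypothetical two-step pipeline. The step prefixes ('feature_selector', 'classifier') and the parameter names are assumptions, not fixed GeneSelectR names; they must mirror the step names in your own pipeline, since scikit-learn expects keys of the form '<step_name>__<parameter>'.

# Illustrative sketch only: step prefixes and parameter names are assumptions
# and must match the steps defined in your pipeline.
params <- list(
  feature_selector__n_estimators = c(50L, 100L),   # candidate values for the selector
  classifier__max_depth = c(3L, 5L, 10L),          # candidate values for the classifier
  classifier__n_estimators = c(100L, 200L)
)

For a random search the same structure can be used; candidate values are then sampled rather than exhaustively combined.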

Value

Returns a scikit-learn GridSearchCV, RandomizedSearchCV, or BayesSearchCV object, depending on the search_type specified. This object includes several attributes useful for analyzing the hyperparameter tuning process:

best_estimator_: The best estimator chosen by the search.
best_score_: The score of best_estimator_ on the left-out data.
best_params_: The parameter setting that gave the best results on the hold-out data.
cv_results_: A dict with keys as column headers and values as columns, which can be imported into a pandas DataFrame.
scorer_: The scoring method used on the held-out data.
n_splits_: The number of cross-validation splits (folds/iterations).

These attributes provide insight into the model's performance and the effectiveness of the selected hyperparameters.
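
Because the returned object is a reticulate reference to the fitted Python search object, its attributes can be read directly from R. The snippet below is a minimal sketch assuming 'optimal_model' holds the object returned by perform_grid_search() (as in the Examples below); depending on how the Python modules were imported, some attributes may come back as Python references and need explicit conversion.

# Minimal sketch, assuming 'optimal_model' was returned by perform_grid_search()
best_params <- optimal_model$best_params_       # winning hyperparameter settings
best_score  <- optimal_model$best_score_        # cross-validated score of the best candidate
best_model  <- optimal_model$best_estimator_    # refitted pipeline with the best settings

# Convert to an R object if the attribute is still a Python reference
if (inherits(best_params, "python.builtin.object")) {
  best_params <- reticulate::py_to_r(best_params)
}
print(best_params)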

Examples


required_modules <- c("sklearn", "boruta")
modules_available <- sapply(required_modules, reticulate::py_module_available)

if (all(modules_available)) {
  # Assuming X_train, y_train, pipeline, and params are predefined
  # Define sklearn modules (assuming 'define_sklearn_modules' is defined)
  sklearn_modules <- define_sklearn_modules()

  # Perform a grid search
  optimal_model <- perform_grid_search(X_train, y_train, pipeline, "accuracy",
                                       params, "grid", NULL, 1, sklearn_modules, NULL)

  # Perform a random search
  optimal_model_random <- perform_grid_search(X_train, y_train, pipeline, "accuracy",
                                              params, "random", 10, 1, sklearn_modules, 42)
} else {
  unavailable_modules <- names(modules_available[!modules_available])
  message(paste("Required Python modules not available:",
                paste(unavailable_modules, collapse = ", "), ". Skipping example."))
}

