R: Grid Search

grid_search {abcrlda}

R Documentation

Grid Search

Description

Performs a grid search to estimate the optimal hyperparameters (gamma and cost) within a specified search space, based on either double asymptotic risk estimation or cross-validation. Double asymptotic risk estimation is cheaper to compute because it uses a closed-form expression for the risk. For further details, see the Reference section.

\Re = \varepsilon_0 * C_{10} + \varepsilon_1 * C_{01}

\varepsilon_i = \Phi(\frac{(-1)^{i+1} ( \hat{G}_i + \hat{\omega}_{opt}/\gamma )}{\sqrt{\hat{D}}})

Separate sampling cross-validation (see cross-validation function) was adapted to work with cost-based risk estimation.
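As an illustration of how the two formulas above combine, the following is a minimal sketch and not the package's internal code. It assumes that \Phi is the standard normal CDF (`pnorm` in R). The function name `risk_estimate` and all numeric inputs are hypothetical values chosen only to show the arithmetic.

```r
# Sketch only: Phi is taken to be the standard normal CDF (pnorm), and all
# numeric inputs below are made-up illustration values.
risk_estimate <- function(G0, G1, omega_opt, gamma, D, C10, C01) {
  shift <- omega_opt / gamma
  eps0 <- pnorm(-(G0 + shift) / sqrt(D))  # i = 0: (-1)^(0+1) = -1
  eps1 <- pnorm( (G1 + shift) / sqrt(D))  # i = 1: (-1)^(1+1) = +1
  eps0 * C10 + eps1 * C01                 # risk = eps0 * C10 + eps1 * C01
}

risk <- risk_estimate(G0 = 1.2, G1 = -1.0, omega_opt = 0.1, gamma = 1,
                      D = 1, C10 = 0.5, C01 = 0.5)
print(risk)
```

Because each error term is a probability, the resulting risk always lies between 0 and C10 + C01.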

Usage

grid_search(
  x,
  y,
  range_gamma,
  range_cost,
  method = "estimator",
  nfolds = 10,
  bias_correction = TRUE
)

Arguments

`x`	Input matrix or data.frame of dimension `nobs x nvars`; each row is a feature vector.
`y`	A numeric vector or factor of class labels. `y` should be either a factor with two levels or a vector with two distinct values. If `y` is given as a vector, it will be coerced into a factor. The length of `y` has to match the number of samples in `x`.
`range_gamma`	Vector of `gamma` values to check.
`range_cost`	nobs x 1 vector (values should be between 0 and 1) or nobs x 2 matrix (each row is a cost pair c(`C_{10}`, `C_{01}`)) of cost values to check.
`method`	Selects the method used to evaluate risk: either "estimator" (double asymptotic risk estimation) or "cross" (cross-validation). The default is "estimator".
`nfolds`	Number of folds to use with cross-validation. Default is 10. In case of imbalanced data, `nfolds` should not be greater than the number of observations in the smaller class.
`bias_correction`	Logical value. If `bias_correction` is TRUE, asymptotic bias correction is performed. If it is FALSE, no asymptotic bias correction is performed and ABCRLDA reduces to classical RLDA. The default is TRUE.
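The constraint on `nfolds` can be checked before calling the function. The following is a minimal sketch using only base R. The variable names `smallest_class` and `nfolds` are illustrative, and capping `nfolds` this way is a suggestion rather than something the package is documented to do itself.

```r
# Sketch only: cap nfolds at the size of the smaller class.
data(iris)
y <- factor(iris$Species[iris$Species %in% c("virginica", "versicolor")])

smallest_class <- min(table(y))    # observations in the smaller class
nfolds <- min(10, smallest_class)  # never exceed the smaller class size
print(nfolds)
```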

Value

List of estimated parameters.

`cost`	Cost value for which risk estimates are lowest during the search.
`gamma`	Gamma regularization parameter for which risk estimates are lowest during the search.
`risk`	Lowest risk value estimated during grid search.
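To make the shape of the returned list concrete, here is a generic grid search sketch in base R. It is not the package's implementation: `toy_risk` is a made-up stand-in for a risk estimator and `toy_grid_search` is a hypothetical helper. Both exist only to show how every (gamma, cost) combination is scored and the lowest-risk pair is reported.

```r
# Hypothetical stand-in for a risk estimator; NOT part of abcrlda.
toy_risk <- function(gamma, cost) (log10(gamma) - 1)^2 + (cost - 0.3)^2

# Hypothetical helper: score every (gamma, cost) pair and keep the best one.
toy_grid_search <- function(range_gamma, range_cost, risk_fn) {
  grid <- expand.grid(gamma = range_gamma, cost = range_cost)
  grid$risk <- mapply(risk_fn, grid$gamma, grid$cost)
  best <- grid[which.min(grid$risk), ]
  list(cost = best$cost, gamma = best$gamma, risk = best$risk)
}

res <- toy_grid_search(range_gamma = c(0.1, 1, 10, 100, 1000),
                       range_cost = seq(0.1, 0.9, by = 0.2),
                       risk_fn = toy_risk)
str(res)
```

The result has the same three element names as the list described above (`cost`, `gamma`, `risk`), so it can be read the same way.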

Reference

A. Zollanvari, M. Abdirash, A. Dadlani and B. Abibullaev, "Asymptotically Bias-Corrected Regularized Linear Discriminant Analysis for Cost-Sensitive Binary Classification," in IEEE Signal Processing Letters, vol. 26, no. 9, pp. 1300-1304, Sept. 2019. doi: 10.1109/LSP.2019.2918485 URL: https://ieeexplore.ieee.org/document/8720003

U. Braga-Neto, A. Zollanvari and E. Dougherty, "Cross-Validation Under Separate Sampling: Strong Bias and How to Correct It," in Bioinformatics (Oxford, England), vol. 30, 2014. doi: 10.1093/bioinformatics/btu527 URL: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4296143/pdf/btu527.pdf

Examples

data(iris)
# Keep only two of the three species, since y must have two classes
keep <- which(iris[, ncol(iris)] == "virginica" |
              iris[, ncol(iris)] == "versicolor")
train_data <- iris[keep, 1:4]
train_label <- factor(iris[keep, 5])  # factor() drops the unused "setosa" level
# Candidate costs given as a vector of values between 0 and 1
cost_range <- seq(0.1, 0.9, by = 0.2)
gamma_range <- c(0.1, 1, 10, 100, 1000)

gs <- grid_search(train_data, train_label,
                  range_gamma = gamma_range,
                  range_cost = cost_range,
                  method = "estimator")
model <- abcrlda(train_data, train_label,
                 gamma = gs$gamma, cost = gs$cost)
predict(model, train_data)

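# Candidate costs given as a matrix: each row is a cost pair c(C_10, C_01),
# here (1, 6), (2, 7), (3, 8), (4, 9), (5, 10)
cost_range <- matrix(1:10, ncol = 2)
gamma_range <- c(0.1, 1, 10, 100, 1000)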

gs <- grid_search(train_data, train_label,
                  range_gamma = gamma_range,
                  range_cost = cost_range,
                  method = "cross")
model <- abcrlda(train_data, train_label,
                 gamma = gs$gamma, cost = gs$cost)
predict(model, train_data)

[Package abcrlda version 1.0.3 Index]