grid_search {abcrlda}R Documentation

Grid Search

Description

Performs grid search to estimate the optimal hyperparameters (gamma and cost) within specified space based on double asymptotic risk estimation or cross validation. Double asymptotic risk estimation is more efficient to compute because it uses closed form for risk estimation. For further details, refer to the article in the reference section.

\Re = \varepsilon_0 * C_{10} + \varepsilon_1 * C_{01}

\varepsilon_i = \Phi(\frac{(-1)^{i+1} ( \hat{G}_i + \hat{\omega}_{opt}/\gamma )}{\sqrt{\hat{D}}})

Separate sampling cross-validation (see cross-validation function) was adapted to work with cost-based risk estimation.

Usage

grid_search(
  x,
  y,
  range_gamma,
  range_cost,
  method = "estimator",
  nfolds = 10,
  bias_correction = TRUE
)

Arguments

x

Input matrix or data.frame of dimension nobs x nvars; each row is an feature vector.

y

A numeric vector or factor of class labels. Factor should have either two levels or be a vector with two distinct values. If y is presented as a vector, it will be coerced into a factor. Length of y has to correspond to number of samples in x.

range_gamma

Vector of gamma values to check.

range_cost

nobs x 1 vector (values should be between 0 and 1) or nobs x 2 matrix (each row is cost pair value c(C_{10}, C_{01})) of cost values to check.

method

Selects method to evaluete risk. "estimator" and "cross".

nfolds

Number of folds to use with cross-validation. Default is 10. In case of imbalanced data, nfolds should not be greater than the number of observations in smaller class.

bias_correction

Takes in a boolean value. If bias_correction is TRUE, then asymptotic bias correction will be performed. Otherwise, (if bias_correction is FALSE) asymptotic bias correction will not be performed and the ABCRLDA is the classical RLDA. The default is TRUE.

Value

List of estimated parameters.

cost

Cost value for which risk estimates are lowest during the search.

gamma

Gamma regularization parameter for which risk estimates are lowest during the search.

risk

Lowest risk value estimated during grid search.

Reference

A. Zollanvari, M. Abdirash, A. Dadlani and B. Abibullaev, "Asymptotically Bias-Corrected Regularized Linear Discriminant Analysis for Cost-Sensitive Binary Classification," in IEEE Signal Processing Letters, vol. 26, no. 9, pp. 1300-1304, Sept. 2019. doi: 10.1109/LSP.2019.2918485 URL: https://ieeexplore.ieee.org/document/8720003

Braga-Neto, Ulisses & Zollanvari, Amin & Dougherty, Edward. (2014). Cross-Validation Under Separate Sampling: Strong Bias and How to Correct It. Bioinformatics (Oxford, England). 30. 10.1093/bioinformatics/btu527. URL: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4296143/pdf/btu527.pdf

See Also

Other functions in the package: abcrlda(), cross_validation(), da_risk_estimator(), predict.abcrlda(), risk_calculate()

Examples

data(iris)
train_data <- iris[which(iris[, ncol(iris)] == "virginica" |
                         iris[, ncol(iris)] == "versicolor"), 1:4]
train_label <- factor(iris[which(iris[, ncol(iris)] == "virginica" |
                                 iris[, ncol(iris)] == "versicolor"), 5])
cost_range <- seq(0.1, 0.9, by = 0.2)
gamma_range <- c(0.1, 1, 10, 100, 1000)

gs <- grid_search(train_data, train_label,
                  range_gamma = gamma_range,
                  range_cost = cost_range,
                  method = "estimator")
model <- abcrlda(train_data, train_label,
                 gamma = gs$gamma, cost = gs$cost)
predict(model, train_data)

cost_range <- matrix(1:10, ncol = 2)
gamma_range <- c(0.1, 1, 10, 100, 1000)

gs <- grid_search(train_data, train_label,
                  range_gamma = gamma_range,
                  range_cost = cost_range,
                  method = "cross")
model <- abcrlda(train_data, train_label,
                 gamma = gs$gamma, cost = gs$cost)
predict(model, train_data)

[Package abcrlda version 1.0.3 Index]