R: Cross Validation for separate sampling adjusted for cost.

cross_validation {abcrlda}

R Documentation

Cross Validation for separate sampling adjusted for cost.

Description

This function implements cross-validation for separate sampling, adjusted for misclassification cost.

Usage

cross_validation(
  x,
  y,
  gamma = 1,
  cost = c(0.5, 0.5),
  nfolds = 10,
  bias_correction = TRUE
)

Arguments

`x`	Input matrix or data.frame of dimension `nobs x nvars`; each row is a feature vector.
`y`	A numeric vector or factor of class labels. `y` should be either a factor with two levels or a vector with two distinct values. If `y` is given as a vector, it will be coerced into a factor. The length of `y` must equal the number of samples (rows) in `x`.
`gamma`	Regularization parameter `\gamma` in the ABC-RLDA discriminant function, given by `W_{ABC}^{RLDA} = \gamma (x-\frac{\bar{x}_0 + \bar{x}_1}{2})^T H (\bar{x}_0 - \bar{x}_1) - \log(\frac{C_{01}}{C_{10}}) + \hat{\omega}_{opt}`, where `H = (I_p + \gamma \hat{\Sigma})^{-1}`. Formulas and derivations for the parameters used in this equation can be found in the article listed under the Reference section.
`cost`	Parameter that controls the overall misclassification costs. This is a vector of length 1 or 2, where the first value is `C_{10}` (the cost of assigning label 1 when the true label is 0) and the second value, if provided, is `C_{01}` (the cost of assigning label 0 when the true label is 1). The default setting is `c(0.5, 0.5)`, so both classes have equal misclassification costs. If a single value is provided, it should be normalized to lie strictly between 0 and 1. This value will be assigned to `C_{10}`, while `C_{01}` will be equal to `(1 - C_{10})`.
`nfolds`	Number of folds to use with cross-validation. Default is 10. For imbalanced data, `nfolds` should not be greater than the number of observations in the smaller class.
`bias_correction`	A logical value. If `bias_correction` is TRUE, asymptotic bias correction is performed. If `bias_correction` is FALSE, asymptotic bias correction is not performed and ABC-RLDA reduces to classical RLDA. The default is TRUE.
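
The handling of `cost` described above can be sketched in base R. The helper `normalize_cost` is hypothetical, written only for illustration, and is not part of abcrlda; it mirrors the documented rule that a single value is taken as `C_{10}` and that `C_{01}` is then set to `1 - C_{10}`.

```r
# Hypothetical helper (not part of abcrlda): expands `cost` into
# a named pair c(C10, C01) following the documented rule.
normalize_cost <- function(cost) {
  if (length(cost) == 1) {
    # A single value must lie strictly between 0 and 1.
    stopifnot(cost > 0, cost < 1)
    cost <- c(cost, 1 - cost)
  }
  stopifnot(length(cost) == 2)
  names(cost) <- c("C10", "C01")
  cost
}

normalize_cost(0.75)         # C10 = 0.75, C01 = 0.25
normalize_cost(c(0.5, 0.5))  # the default: equal costs
```

With the default equal costs, the term `\log(\frac{C_{01}}{C_{10}})` in the discriminant function is `\log(1) = 0`.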

Value

Returns a list with the following components:

`risk_cross`	Risk estimate `\Re = \varepsilon_0 * C_{10} + \varepsilon_1 * C_{01}`, where `\varepsilon_0` and `\varepsilon_1` are the class error estimates `e_0` and `e_1`.
`e_0`	Error estimate for class 0.
`e_1`	Error estimate for class 1.
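
As a worked illustration of the risk formula, the following base R snippet combines two error estimates with a pair of costs. The numbers are made up for the example and are not output from `cross_validation`.

```r
# Illustrative numbers only; not output from abcrlda.
e_0 <- 0.10                        # error estimate for class 0
e_1 <- 0.20                        # error estimate for class 1
cost <- c(C10 = 0.75, C01 = 0.25)  # misclassification costs

# Risk as defined above: e_0 * C10 + e_1 * C01
risk <- unname(e_0 * cost["C10"] + e_1 * cost["C01"])
risk  # 0.125
```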

Reference

Braga-Neto, U., Zollanvari, A., and Dougherty, E. (2014). Cross-Validation Under Separate Sampling: Strong Bias and How to Correct It. Bioinformatics (Oxford, England), 30. doi:10.1093/bioinformatics/btu527. URL: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4296143/pdf/btu527.pdf

Examples

data(iris)
# Keep only the two classes used for binary classification
keep <- iris$Species %in% c("virginica", "versicolor")
train_data <- iris[keep, 1:4]
# factor() drops the unused "setosa" level
train_label <- factor(iris$Species[keep])
cross_validation(train_data, train_label, gamma = 10)
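
The note on `nfolds` under Arguments suggests that each fold needs observations from both classes. One common way to arrange this is to assign folds within each class separately, as sketched here in base R. This is an assumption about the general technique, not the package's actual implementation, and `assign_folds` is a hypothetical helper that is not part of abcrlda.

```r
# Hypothetical helper (not part of abcrlda): assigns fold ids
# separately within each class, so every fold contains
# observations from both classes.
assign_folds <- function(y, nfolds = 10) {
  y <- factor(y)
  # Each class must have at least `nfolds` observations.
  stopifnot(nfolds <= min(table(y)))
  folds <- integer(length(y))
  for (lev in levels(y)) {
    idx <- which(y == lev)
    # Shuffle fold ids 1..nfolds, recycled over the class.
    folds[idx] <- sample(rep(seq_len(nfolds), length.out = length(idx)))
  }
  folds
}

data(iris)
keep <- iris$Species %in% c("virginica", "versicolor")
labels <- factor(iris$Species[keep])
set.seed(1)
folds <- assign_folds(labels, nfolds = 10)
table(labels, folds)  # 5 observations per class in every fold
```

The `stopifnot` check reflects the documented constraint: if `nfolds` exceeded the size of the smaller class, some folds would contain no observations from that class.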

[Package abcrlda version 1.0.3 Index]