R: Parameter Tuning of SVM

svm_tune {LncFinder}

R Documentation

Parameter Tuning of SVM

Description

This function conducts parameter tuning for an SVM. The parameters gamma and cost can be tuned using grid search.

Usage

svm_tune(
  dataset,
  label.col = 1,
  positive.class = "NonCoding",
  folds.num = 10,
  seed = 1,
  gamma.range = (2^seq(-5, 0, 1)),
  cost.range = c(1, 4, 8, 16, 24, 32),
  return.model = TRUE,
  parallel.cores = 2,
  ...
)

Arguments

`dataset`	The dataset obtained from function `extract_features`, or any other dataset used to build the classifier.
`label.col`	Integer. Specify the column number of the label. (Default: `1`)
`positive.class`	Character. Indicate the positive class of the dataset. The value of this parameter must be identical to one of the classes in the response vector. (Default: `NonCoding`)
`folds.num`	Integer. Specify the number of folds for cross-validation. (Default: `10`)
`seed`	Integer. Used to set the seed for cross-validation. (Default: `1`)
`gamma.range`	The candidate values of gamma to search. (Default: `2 ^ seq(-5, 0, 1)`)
`cost.range`	The candidate values of cost to search. (Default: `c(1, 4, 8, 16, 24, 32)`)
`return.model`	Logical. If `TRUE`, the function will return the best model trained on the full dataset. If `FALSE`, this function will return the optimal parameters.
`parallel.cores`	Integer. The number of cores for parallel computation. (Default: `2`) Set it to `-1` to run this function with all available cores. If `parallel.cores` is greater than `folds.num` (the number of folds for cross-validation), `parallel.cores` is automatically set to `folds.num`.
`...`	Additional arguments for function `svm`, except `scale`, `probability`, `kernel`, `gamma` and `cost`.
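
The search space implied by `gamma.range` and `cost.range` can be sketched in base R. This is only an illustration of the grid and of the core-capping rule described above, not the package's internal code; the helper `cap_cores` is a hypothetical name introduced here.

```r
# Default ranges copied from the Usage section.
gamma.range <- 2^seq(-5, 0, 1)
cost.range  <- c(1, 4, 8, 16, 24, 32)

# Every (gamma, cost) pair that a grid search over these ranges would visit.
param_grid <- expand.grid(gamma = gamma.range, cost = cost.range)
nrow(param_grid)  # 6 gamma values x 6 cost values = 36 combinations

# Hypothetical helper (not part of LncFinder) mirroring the documented rule:
# -1 means "all cores", and the result never exceeds the number of folds.
cap_cores <- function(parallel.cores, folds.num) {
  if (parallel.cores == -1) {
    parallel.cores <- parallel::detectCores()
    # detectCores() may return NA on some platforms; fall back to one core.
    if (is.na(parallel.cores)) parallel.cores <- 1L
  }
  min(parallel.cores, folds.num)
}

cap_cores(parallel.cores = 16, folds.num = 10)  # 10
```

Each grid point is evaluated with `folds.num`-fold cross-validation, so narrowing either range is the most direct way to shorten the run time.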

Details

During model tuning, the performance of each combination of parameters is reported. Sensitivity, Specificity, Accuracy, F-Measure and Kappa Value are used to evaluate the performance. The best gamma and cost (or the best model) are selected based on Accuracy.

For the details of parameter gamma and cost, please refer to function svm of package "e1071".

For the details of metrics, please refer to function confusionMatrix of package "caret".
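
As a rough illustration of these metrics and of selection by Accuracy, the base-R sketch below computes them from binary confusion counts using the standard textbook definitions. The helper `binary_metrics` and all counts are hypothetical, made up for illustration only; `confusionMatrix` in "caret" remains the authoritative reference for how the reported values are defined.

```r
# Hypothetical helper (not part of LncFinder or caret): metrics for one
# binary confusion table, with counts taken relative to the positive class.
binary_metrics <- function(tp, fn, fp, tn) {
  n <- tp + fn + fp + tn
  sensitivity <- tp / (tp + fn)
  specificity <- tn / (tn + fp)
  accuracy    <- (tp + tn) / n
  precision   <- tp / (tp + fp)
  f_measure   <- 2 * precision * sensitivity / (precision + sensitivity)
  # Cohen's kappa: observed agreement corrected for chance agreement.
  expected <- ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n^2
  kappa    <- (accuracy - expected) / (1 - expected)
  c(Sensitivity = sensitivity, Specificity = specificity,
    Accuracy = accuracy, F.Measure = f_measure, Kappa = kappa)
}

# Illustrative (made-up) confusion counts for three parameter combinations.
results <- data.frame(
  gamma = c(2^-5, 2^-3, 2^-1),
  cost  = c(1, 8, 16),
  tp = c(40, 45, 43), fn = c(10, 5, 7),
  fp = c(8, 6, 9),    tn = c(42, 44, 41)
)

# One row of metrics per parameter combination.
metrics <- t(mapply(binary_metrics,
                    results$tp, results$fn, results$fp, results$tn))

# Pick the combination with the highest Accuracy.
best <- which.max(metrics[, "Accuracy"])
results[best, c("gamma", "cost")]
```

With these made-up counts the second combination has the highest Accuracy, so its gamma and cost would be the ones kept.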

Value

Returns the optimal parameters when return.model = FALSE, or the best model when return.model = TRUE.

Author(s)

HAN Siyu

Examples

## Not run: 
data(demo_DNA.seq)
Seqs <- demo_DNA.seq

positive_data <- extract_features(Seqs[1:5], label = "NonCoding",
                                  SS.features = FALSE, format = "DNA",
                                  frequencies.file = "human",
                                  parallel.cores = 2)

negative_data <- extract_features(Seqs[6:10], label = "Coding",
                                  SS.features = FALSE, format = "DNA",
                                  frequencies.file = "human",
                                  parallel.cores = 2)

my_dataset <- rbind(positive_data, negative_data)

### Or use our data "demo_dataset"
data(demo_dataset)
my_dataset <- demo_dataset

optimal_parameter <- svm_tune(my_dataset, positive.class = "NonCoding",
                              folds.num = 2, seed = 1,
                              gamma.range = (2 ^ seq(-5, 0, 2)),
                              cost.range = c(1, 8, 16),
                              return.model = FALSE, parallel.cores = 2)

### Users can set return.model = TRUE to return the best model.

## End(Not run)

[Package LncFinder version 1.1.5 Index]