svm_tune {LncFinder}    R Documentation

Parameter Tuning of SVM

Description

This function performs parameter tuning for an SVM. The parameters gamma and cost are tuned using grid search.

Usage

svm_tune(
  dataset,
  label.col = 1,
  positive.class = "NonCoding",
  folds.num = 10,
  seed = 1,
  gamma.range = (2^seq(-5, 0, 1)),
  cost.range = c(1, 4, 8, 16, 24, 32),
  return.model = TRUE,
  parallel.cores = 2,
  ...
)

Arguments

dataset

The dataset obtained from the function extract_features, or any other dataset used to build the classifier.

label.col

Integer. Specify the column number of the label. (Default: 1)

positive.class

Character. Indicate the positive class of the dataset. The value of this parameter must be identical to one of the classes in the response vector. (Default: "NonCoding")

folds.num

Integer. Specify the number of folds for cross-validation. (Default: 10)

seed

Integer. Used to set the seed for cross-validation. (Default: 1)

gamma.range

The candidate values of gamma to search over. (Default: 2 ^ seq(-5, 0, 1))

cost.range

The candidate values of cost to search over. (Default: c(1, 4, 8, 16, 24, 32))

return.model

Logical. If TRUE, the function returns the best model trained on the full dataset. If FALSE, the function returns the optimal parameters. (Default: TRUE)

parallel.cores

Integer. The number of cores for parallel computation. (Default: 2) Set this to -1 to run the function with all cores. If parallel.cores is greater than folds.num (the number of folds for cross-validation), parallel.cores is automatically set to folds.num.

...

Additional arguments passed to the function svm, except scale, probability, kernel, gamma and cost.
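
The rule described for parallel.cores can be sketched as follows. This is only an illustration under stated assumptions: resolve_cores is a hypothetical helper, not a function of LncFinder, and the use of parallel::detectCores() to find "all cores" is an assumption about how -1 might be interpreted.

```r
# Hypothetical helper (not part of LncFinder): illustrates the documented
# rule that parallel.cores never exceeds folds.num.
resolve_cores <- function(parallel.cores, folds.num) {
  if (parallel.cores == -1) {
    # Assumption: "all cores" means whatever detectCores() reports.
    parallel.cores <- parallel::detectCores()
    # detectCores() may return NA on some platforms; fall back to 1.
    if (is.na(parallel.cores)) parallel.cores <- 1
  }
  # Cap the core count at the number of cross-validation folds.
  min(parallel.cores, folds.num)
}

resolve_cores(2, 10)   # requested cores fit within the folds
resolve_cores(16, 10)  # capped at folds.num
```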

Details

During model tuning, the performance of each combination of parameters is output. Sensitivity, Specificity, Accuracy, F-Measure and Kappa Value are used to evaluate performance. The best gamma and cost (or the best model) are selected based on Accuracy.
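
As a rough sketch of the grid search described above, the default ranges can be expanded into every (gamma, cost) pair and the pair with the highest Accuracy selected. The scoring function toy_accuracy is a made-up placeholder standing in for the cross-validated Accuracy; it is not how svm_tune evaluates a model, and its values are not real results.

```r
# Default search ranges, taken from the Usage section.
gamma.range <- 2 ^ seq(-5, 0, 1)
cost.range  <- c(1, 4, 8, 16, 24, 32)

# Every (gamma, cost) combination the grid search would visit.
param_grid <- expand.grid(gamma = gamma.range, cost = cost.range)
nrow(param_grid)  # 6 gamma values x 6 cost values = 36 combinations

# Placeholder score (an assumption for illustration only): in practice this
# would be the cross-validated Accuracy of an SVM trained with these values.
toy_accuracy <- function(gamma, cost) {
  0.9 - 0.01 * abs(log2(gamma) + 3) - 0.001 * abs(cost - 8)
}
param_grid$Accuracy <- toy_accuracy(param_grid$gamma, param_grid$cost)

# Select the combination with the highest Accuracy.
best <- param_grid[which.max(param_grid$Accuracy), c("gamma", "cost")]
best
```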

For details on the parameters gamma and cost, please refer to the function svm of package "e1071".

For details on the metrics, please refer to the function confusionMatrix of package "caret".
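
For orientation, the five metrics can be computed from a two-class confusion table using their standard textbook definitions. The sketch below uses base R only and a small hand-made set of labels (illustrative, not real predictions); it is not the code used by svm_tune or by confusionMatrix.

```r
positive.class <- "NonCoding"

# Hand-made labels for illustration: 5 NonCoding and 5 Coding sequences.
actual    <- factor(c(rep("NonCoding", 5), rep("Coding", 5)))
predicted <- factor(c(rep("NonCoding", 4), "Coding",
                      rep("Coding", 3), rep("NonCoding", 2)))

# Rows are predictions, columns are true labels.
tab <- table(Predicted = predicted, Actual = actual)
negative.class <- setdiff(levels(actual), positive.class)

TP <- tab[positive.class, positive.class]  # positives predicted positive
FN <- tab[negative.class, positive.class]  # positives predicted negative
FP <- tab[positive.class, negative.class]  # negatives predicted positive
TN <- tab[negative.class, negative.class]  # negatives predicted negative
N  <- TP + FN + FP + TN

sensitivity <- TP / (TP + FN)
specificity <- TN / (TN + FP)
accuracy    <- (TP + TN) / N
precision   <- TP / (TP + FP)
f_measure   <- 2 * precision * sensitivity / (precision + sensitivity)

# Cohen's kappa: observed agreement corrected for chance agreement.
expected <- ((TP + FN) * (TP + FP) + (FN + TN) * (FP + TN)) / N^2
kappa    <- (accuracy - expected) / (1 - expected)

metrics <- c(Sensitivity = sensitivity, Specificity = specificity,
             Accuracy = accuracy, F.Measure = f_measure, Kappa = kappa)
round(metrics, 4)
```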

Value

Returns the best model when return.model = TRUE, or the optimal parameters when return.model = FALSE.

Author(s)

HAN Siyu

See Also

extract_features, svm_cv.

Examples

## Not run: 
data(demo_DNA.seq)
Seqs <- demo_DNA.seq

positive_data <- extract_features(Seqs[1:5], label = "NonCoding",
                                  SS.features = FALSE, format = "DNA",
                                  frequencies.file = "human",
                                  parallel.cores = 2)

negative_data <- extract_features(Seqs[6:10], label = "Coding",
                                  SS.features = FALSE, format = "DNA",
                                  frequencies.file = "human",
                                  parallel.cores = 2)

my_dataset <- rbind(positive_data, negative_data)

### Or use our data "demo_dataset"
data(demo_dataset)
my_dataset <- demo_dataset

optimal_parameter <- svm_tune(my_dataset, positive.class = "NonCoding",
                              folds.num = 2, seed = 1,
                              gamma.range = (2 ^ seq(-5, 0, 2)),
                              cost.range = c(1, 8, 16),
                              return.model = FALSE, parallel.cores = 2)

### Users can set return.model = TRUE to return the best model.

## End(Not run)

[Package LncFinder version 1.1.5 Index]