gensvm-package {gensvm}

R Documentation

GenSVM: A Generalized Multiclass Support Vector Machine

Description

The GenSVM classifier is a generalized multiclass support vector machine (SVM). This classifier aims to find decision boundaries that separate the classes with as wide a margin as possible. In GenSVM, the loss function that measures how misclassifications are counted is very flexible. This allows the user to tune the classifier to the dataset at hand and potentially obtain higher classification accuracy. Moreover, this flexibility means that GenSVM has a number of alternative multiclass SVMs as special cases. Another advantage of GenSVM is that it is trained in the primal space, which allows the use of warm starts during optimization. This means that for common tasks such as cross-validation or repeated model fitting, GenSVM can be trained very quickly.

Details

This package provides functions for training the GenSVM model either as a single model with fixed parameters or through a cross-validated grid search over the parameters. In both cases the GenSVM C library is used for speed. Auxiliary functions for evaluating and using the model are also provided.

GenSVM functions

The main GenSVM functions are:

gensvm: Fit a GenSVM model for specific model parameters.
gensvm.grid: Run a cross-validated grid search for GenSVM.

The following two functions are available for both GenSVM and GenSVMGrid models. When applied to a GenSVMGrid object, each function is applied to the best GenSVM model found in the grid search.

plot: Plot the low-dimensional simplex space where the decision boundaries are fixed (for problems with 3 classes).
predict: Predict the class labels of new data using the GenSVM model.

Moreover, for the GenSVM and GenSVMGrid models a coef function is defined:

coef.gensvm: Get the coefficients of the fitted GenSVM model.
coef.gensvm.grid: Get the parameter grid of the GenSVM grid search.

The following utility functions are also included:

gensvm.accuracy: Compute the accuracy score between true and predicted class labels.
gensvm.maxabs.scale: Scale each column of the dataset by its maximum absolute value, preserving sparsity and mapping the data to [-1, 1].
gensvm.train.test.split: Split a dataset into a training and a testing sample.
gensvm.refit: Refit a fitted GenSVM model with slightly different parameters or on a different dataset.
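The two simplest utilities can be illustrated in base R. The sketch below uses hypothetical names (accuracy_sketch and maxabs_scale_sketch); it is not the gensvm package's code, only an illustration of what the descriptions above say these functions compute. Leaving all-zero columns unchanged is an assumption of this sketch.

```r
# Hypothetical base-R sketches of two utilities listed above.
# These are NOT the gensvm package's implementations.

# Fraction of positions where the true and predicted labels agree.
# Labels are compared as character so factors with different level
# sets can still be compared.
accuracy_sketch <- function(y.true, y.pred) {
  stopifnot(length(y.true) == length(y.pred))
  mean(as.character(y.true) == as.character(y.pred))
}

# Divide each column by its maximum absolute value. Zeros stay zero
# (sparsity is preserved) and every entry ends up in [-1, 1].
maxabs_scale_sketch <- function(x) {
  x <- as.matrix(x)
  scale.factors <- apply(abs(x), 2, max)
  # Assumption: leave all-zero columns unchanged instead of dividing by zero.
  scale.factors[scale.factors == 0] <- 1
  sweep(x, 2, scale.factors, "/")
}

y.true <- c("a", "b", "c", "a")
y.pred <- c("a", "b", "a", "a")
accuracy_sketch(y.true, y.pred)   # 3 of 4 labels match: 0.75

x <- matrix(c(2, -4, 0, 0, 1, 0.5), nrow = 3)
maxabs_scale_sketch(x)            # columns become (0.5, -1, 0) and (0, 1, 0.5)
```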

Kernels in GenSVM

GenSVM can be used for both linear and nonlinear multiclass support vector machine classification. Linear classification is generally faster, but depending on the dataset, a nonlinear kernel can achieve higher classification performance.

The following nonlinear kernels are implemented in the GenSVM package:

RBF

The Radial Basis Function (RBF) kernel is a well-known kernel function based on the Euclidean distance between objects. It has one parameter (gamma) and is defined as:

k(x_i, x_j) = exp( -\gamma || x_i - x_j ||^2 )
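To make the RBF formula concrete, here is a minimal base-R sketch. The helper names rbf_kernel and rbf_kernel_matrix are hypothetical and not part of the gensvm package; they simply evaluate the formula for one pair of vectors and for all pairs of rows of a matrix.

```r
# Hypothetical helpers (not part of the gensvm package) that evaluate
# k(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2).

# Kernel value for a single pair of numeric vectors.
rbf_kernel <- function(xi, xj, gamma) {
  exp(-gamma * sum((xi - xj)^2))
}

# Full n x n kernel matrix for the rows of X, built from the
# Euclidean distances returned by dist().
rbf_kernel_matrix <- function(X, gamma) {
  D <- as.matrix(dist(X, method = "euclidean"))
  exp(-gamma * D^2)
}

rbf_kernel(c(0, 0), c(1, 1), gamma = 0.5)  # exp(-1), about 0.368
X <- rbind(c(0, 0), c(1, 1), c(2, 0))
K <- rbf_kernel_matrix(X, gamma = 0.5)     # symmetric, ones on the diagonal
```

Because the kernel depends only on the distance between objects, every object has kernel value 1 with itself, and the value decays towards 0 as objects move apart; a larger gamma makes this decay faster.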

Polynomial

A polynomial kernel can also be used in GenSVM. This kernel function is implemented in a general form and therefore takes three parameters (coef, gamma, and degree). It is defined as:

k(x_i, x_j) = ( \gamma x_i' x_j + coef)^{degree}
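A minimal base-R sketch of the polynomial kernel follows. The helper names poly_kernel and poly_kernel_matrix are hypothetical, not gensvm functions; the matrix version uses tcrossprod(X), which equals X %*% t(X), to get all inner products x_i' x_j at once.

```r
# Hypothetical helpers (not part of the gensvm package) for
# k(x_i, x_j) = (gamma * x_i' x_j + coef)^degree.

# Kernel value for a single pair of numeric vectors.
poly_kernel <- function(xi, xj, gamma, coef, degree) {
  (gamma * sum(xi * xj) + coef)^degree
}

# Kernel matrix for all pairs of rows of X; tcrossprod(X) is the
# matrix of inner products x_i' x_j.
poly_kernel_matrix <- function(X, gamma, coef, degree) {
  (gamma * tcrossprod(X) + coef)^degree
}

poly_kernel(c(1, 2), c(3, 4), gamma = 1, coef = 1, degree = 2)  # (11 + 1)^2 = 144
X <- rbind(c(1, 2), c(3, 4))
poly_kernel_matrix(X, gamma = 1, coef = 1, degree = 2)
```

The general form explains why three parameters are needed: for example, gamma = 1, coef = 0, and degree = 1 reduce the formula to the plain inner product x_i' x_j of linear classification.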

Sigmoid

The sigmoid kernel is the final kernel implemented in GenSVM. This kernel has two parameters (gamma and coef) and is defined as:

k(x_i, x_j) = \tanh( \gamma x_i' x_j + coef)
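The sigmoid formula can be sketched in base R in the same way. Again, sigmoid_kernel and sigmoid_kernel_matrix are hypothetical helper names for illustration only, not functions of the gensvm package.

```r
# Hypothetical helpers (not part of the gensvm package) for
# k(x_i, x_j) = tanh(gamma * x_i' x_j + coef).

# Kernel value for a single pair of numeric vectors.
sigmoid_kernel <- function(xi, xj, gamma, coef) {
  tanh(gamma * sum(xi * xj) + coef)
}

# Kernel matrix for all pairs of rows of X (tcrossprod(X) = X %*% t(X)).
sigmoid_kernel_matrix <- function(X, gamma, coef) {
  tanh(gamma * tcrossprod(X) + coef)
}

sigmoid_kernel(c(1, 0), c(0, 1), gamma = 1, coef = 0)  # orthogonal vectors: tanh(0) = 0
X <- rbind(c(1, 2), c(1, 1))
K <- sigmoid_kernel_matrix(X, gamma = 0.5, coef = 0)   # K[1, 2] = tanh(1.5)
```

Because of the tanh, every kernel value lies strictly between -1 and 1, regardless of the scale of the data.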

Author(s)

Gerrit J.J. van den Burg, Patrick J.F. Groenen
Maintainer: Gerrit J.J. van den Burg <gertjanvandenburg@gmail.com>

References

Van den Burg, G.J.J. and Groenen, P.J.F. (2016). GenSVM: A Generalized Multiclass Support Vector Machine, Journal of Machine Learning Research, 17(225):1–42. URL https://jmlr.org/papers/v17/14-526.html.