perform.svm {stylo}    R Documentation

Support Vector Machines classifier

Description

A supervised machine-learning classifier; this function is a wrapper for the Support Vector Machines procedure provided by the package e1071.

Usage

perform.svm(training.set, 
            test.set, 
            classes.training.set = NULL, 
            classes.test.set = NULL, 
            no.of.candidates = 3, 
            tune.parameters = FALSE,
            svm.kernel = "linear",
            svm.degree = 3, 
            svm.coef0 = 0, 
            svm.cost = 1) 

Arguments

training.set

a table containing frequencies/counts for several variables (e.g. most frequent words) across a number of text samples (for the training set). Rows must correspond to samples and columns to variables (words, n-grams, or whatever needs to be analyzed).

test.set

a table containing frequencies/counts for the test set. The variables used (i.e. columns) must match the columns of the training set.
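The column-matching requirement can be checked before calling the function. The following is a minimal base-R sketch, not part of stylo: the two toy tables and the names shared, training.aligned and test.aligned are hypothetical and serve only to illustrate one way of aligning the variables.

```r
# Hypothetical toy frequency tables: rows are samples, columns are words.
training.set <- matrix(c(5, 3, 2,
                         4, 4, 1),
                       nrow = 2, byrow = TRUE,
                       dimnames = list(c("Sterne_Tristram", "Fielding_Tom"),
                                       c("the", "and", "of")))
test.set <- matrix(c(1, 6, 2, 9),
                   nrow = 1,
                   dimnames = list("Anon_Unknown",
                                   c("of", "the", "and", "to")))

# Keep only the variables present in both tables,
# and put both tables in the same column order.
shared <- intersect(colnames(training.set), colnames(test.set))
training.aligned <- training.set[, shared, drop = FALSE]
test.aligned <- test.set[, shared, drop = FALSE]

# TRUE when the two tables describe the same variables in the same order
identical(colnames(training.aligned), colnames(test.aligned))
```

Using drop = FALSE keeps the result a matrix even when only one sample or one variable remains.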

classes.training.set

a vector containing class identifiers for the training set. When missing, the row names of the training set table will be used; the class of each sample is assumed to be the string of characters preceding the first underscore. Consider the following examples: c("Sterne_Tristram", "Sterne_Sentimental", "Fielding_Tom", ...), where the classes are the authors' names, and c("M_Joyce_Dubliners", "F_Woolf_Night_and_day", "M_Conrad_Lord_Jim", ...), where the classes are M(ale) and F(emale) according to the authors' gender. Note that only the part up to (but not including) the first underscore in the sample's name will be included in the class label.
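The naming convention described above can be reproduced with a single regular expression. This is an illustrative base-R sketch of the rule (everything from the first underscore onward is dropped); it is not necessarily the exact code used inside the function, and the variable names are hypothetical.

```r
# Sample names taken from the two examples above
author.samples <- c("Sterne_Tristram", "Sterne_Sentimental", "Fielding_Tom")
gender.samples <- c("M_Joyce_Dubliners", "F_Woolf_Night_and_day",
                    "M_Conrad_Lord_Jim")

# Remove the first underscore and everything after it
author.classes <- sub("_.*", "", author.samples)
gender.classes <- sub("_.*", "", gender.samples)

author.classes   # "Sterne" "Sterne" "Fielding"
gender.classes   # "M" "F" "M"
```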

classes.test.set

a vector containing class identifiers for the test set. When missing, the row names of the test set table will be used (see above).

no.of.candidates

how many candidates (classes) will be included in the final ranking for each test sample (default = 3).

tune.parameters

if TRUE, two parameters, namely gamma and cost, are tuned using a bootstrap procedure, and then used to build an SVM model (default: FALSE).
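For readers who want to see what such tuning looks like in e1071 itself, here is a hedged sketch using tune.svm() with bootstrap sampling on the iris data. The parameter grids, the number of bootstrap replicates and the seed are arbitrary assumptions, and the sketch does not claim to reproduce the settings used internally by perform.svm. Note that gamma has no effect on a linear kernel, so the sketch uses the radial kernel, which is the default in e1071.

```r
# Run only when e1071 is installed
have_e1071 <- requireNamespace("e1071", quietly = TRUE)

if (have_e1071) {
  set.seed(1)                        # bootstrap sampling is random
  x <- as.matrix(iris[, 1:4])
  y <- iris$Species

  # Grid search over gamma and cost; each combination is assessed
  # on 5 bootstrap replicates (arbitrary illustrative values)
  tuned <- e1071::tune.svm(
    x, y,
    gamma = 10^(-2:0),
    cost = 10^(0:2),
    tunecontrol = e1071::tune.control(sampling = "bootstrap", nboot = 5)
  )

  # One-row data frame holding the best gamma and cost found
  best <- tuned$best.parameters

  # Build the final model with the tuned values
  model <- e1071::svm(x, y, kernel = "radial",
                      gamma = best$gamma, cost = best$cost)
}
```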

svm.kernel

SVM kernel. Available values: "linear", "polynomial", and "radial". The linear kernel is probably the best choice in stylometry, since the number of variables (e.g. MFWs) is many times larger than the number of classes.

svm.degree

parameter needed for kernel of type "polynomial" (default: 3).

svm.coef0

parameter needed for kernel of type "polynomial" (default: 0).

svm.cost

cost of constraint violations (default: 1); this is the C constant of the regularization term in the Lagrange formulation.
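The kernel-related arguments correspond to arguments of the same meaning in e1071's svm() function. The sketch below calls e1071 directly on the iris data to show how they fit together; it is an illustration under that assumption, not the body of perform.svm. In e1071, the polynomial kernel has the form (gamma * u'v + coef0)^degree, which is why degree and coef0 matter only for that kernel.

```r
# Run only when e1071 is installed
have_e1071 <- requireNamespace("e1071", quietly = TRUE)

if (have_e1071) {
  x <- as.matrix(iris[, 1:4])
  y <- iris$Species

  # Linear kernel: only the cost parameter is relevant
  linear.model <- e1071::svm(x, y, kernel = "linear", cost = 1)

  # Polynomial kernel: degree and coef0 are used as well
  poly.model <- e1071::svm(x, y, kernel = "polynomial",
                           degree = 3, coef0 = 0, cost = 1)

  # Predicted classes for the data the model was trained on
  linear.fit <- predict(linear.model, x)

  # Proportion of training samples classified correctly
  mean(linear.fit == y)
}
```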

Value

The function returns a vector of "guessed" classes: each test sample is linked with one of the classes represented in the training set. Additionally, final scores and final rankings of candidates are returned as attributes.
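Since the returned value is a vector of guessed classes, it can be compared with the expected classes using base R. In the sketch below, guessed is a hypothetical stand-in for the function's output and expected for the true classes of the test samples; both are invented for illustration.

```r
# Hypothetical stand-ins: 'guessed' plays the role of the returned vector,
# 'expected' the true classes of the test samples
guessed <- c("s", "s", "c", "v", "v", "c")
expected <- c("s", "s", "c", "c", "v", "v")

# Cross-tabulate expected against guessed classes
confusion <- table(expected = expected, guessed = guessed)
confusion

# Proportion of correctly classified samples
accuracy <- mean(guessed == expected)
accuracy

# For a real result object, names(attributes(result)) lists
# whatever attributes are attached to it
```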

Author(s)

Maciej Eder

See Also

perform.delta, perform.nsc, perform.knn, perform.naivebayes

Examples

## Not run: 
perform.svm(training.set, test.set)

## End(Not run)

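# classifying the standard 'iris' dataset:
library(stylo)
data(iris)
# keep the four numeric measurements, drop the class column
x = subset(iris, select = -Species)
# first 25 rows of each species for training, the remaining 25 for testing
train = rbind(x[1:25,], x[51:75,], x[101:125,])
test = rbind(x[26:50,], x[76:100,], x[126:150,])
# class labels: s(etosa), (versi)c(olor), v(irginica)
train.classes = c(rep("s",25), rep("c",25), rep("v",25))
test.classes = c(rep("s",25), rep("c",25), rep("v",25))

perform.svm(train, test, train.classes, test.classes)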

[Package stylo version 0.7.5 Index]