wsvm {WeightSVM}    R Documentation
Subject Weighted Support Vector Machines
Description
wsvm is used to train a subject-weighted support vector machine. It can be used to carry out classification (of C- and nu-type), general regression (of epsilon- and nu-type), as well as density estimation (one-classification). A formula interface is provided.
Usage
## S3 method for class 'formula'
wsvm(formula, weight, data = NULL, ..., subset, na.action =
na.omit, scale = TRUE)
## Default S3 method:
wsvm(x, y = NULL, weight, scale = TRUE, type = NULL, kernel =
"radial", degree = 3, gamma = if (is.vector(x)) 1 else 1 / ncol(x),
coef0 = 0, cost = 1, nu = 0.5,
class.weights = NULL, cachesize = 100, tolerance = 0.001, epsilon = 0.1,
shrinking = TRUE, cross = 0, probability = FALSE, fitted = TRUE,
..., subset, na.action = na.omit)
Arguments
formula: a symbolic description of the model to be fit.

data: an optional data frame containing the variables in the model. By default the variables are taken from the environment which 'wsvm' is called from.

x: a data matrix, a vector, or a sparse 'design matrix' (object of class matrix.csr as provided by the package SparseM).

y: a response vector with one label for each row/component of x. Can be either a factor (for classification tasks) or a numeric vector (for regression).

weight: the weight of each subject. It should have the same length as x (one weight per subject).

scale: a logical vector indicating the variables to be scaled. If scale is of length 1, the value is recycled as many times as needed. Per default, data are scaled internally (both x and y variables) to zero mean and unit variance. The center and scale values are returned and used for later predictions.

type: wsvm can be used as a classification machine, as a regression machine, or for novelty detection. Depending on whether y is a factor or not, the default setting for type is 'C-classification' or 'eps-regression', respectively, but may be overwritten by setting an explicit value. Valid options are 'C-classification', 'nu-classification', 'one-classification' (for novelty detection), 'eps-regression', and 'nu-regression'.

kernel: the kernel used in training and predicting. You might consider changing some of the following parameters, depending on the kernel type: 'linear' (u'v), 'polynomial' ((gamma*u'v + coef0)^degree), 'radial' (exp(-gamma*|u-v|^2)), 'sigmoid' (tanh(gamma*u'v + coef0)).

degree: parameter needed for kernel of type 'polynomial' (default: 3).

gamma: parameter needed for all kernels except 'linear' (default: 1/(data dimension)).

coef0: parameter needed for kernels of type 'polynomial' and 'sigmoid' (default: 0).

cost: cost of constraints violation (default: 1); it is the 'C'-constant of the regularization term in the Lagrange formulation.

nu: parameter needed for 'nu-classification', 'nu-regression', and 'one-classification'.

class.weights: a named vector of weights for the different classes, used for asymmetric class sizes. Not all factor levels have to be supplied (default weight: 1). All components have to be named. Specifying "inverse" will choose the weights inversely proportional to the class distribution.

cachesize: cache memory in MB (default: 100).

tolerance: tolerance of termination criterion (default: 0.001).

epsilon: epsilon in the insensitive-loss function (default: 0.1).

shrinking: option whether to use the shrinking-heuristics (default: TRUE).

cross: if an integer value k > 0 is specified, a k-fold cross validation on the training data is performed to assess the quality of the model: the accuracy rate for classification and the Mean Squared Error for regression. Note the result is not weighted. For weighted results, use tune_wsvm; see also the short sketch after this list.

fitted: logical indicating whether the fitted values should be computed and included in the model or not (default: TRUE).

probability: logical indicating whether the model should allow for probability predictions.

...: additional parameters for the low level fitting function wsvm.default.

subset: an index vector specifying the cases to be used in the training sample. (NOTE: If given, this argument must be named.)

na.action: a function to specify the action to be taken if NAs are found. The default action is na.omit, which leads to rejection of cases with missing values on any required variable. An alternative is na.fail, which triggers an error if NA cases are found. (NOTE: If given, this argument must be named.)
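A quick sketch of the cross argument; the tot.accuracy slot is an assumption here, mirroring the cross-validation summary returned by e1071's svm:

## 5-fold cross validation on iris; the reported accuracy is NOT weighted
data(iris)
m_cv <- wsvm(Species ~ ., data = iris, weight = rep(1, 150), cross = 5)
m_cv$tot.accuracy  # assumed slot for the overall CV accuracy (as in e1071::svm)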
Details
The original libsvm does not support subject/instance-weighted SVMs. This package uses a modified version of libsvm, available from the 'LIBSVM Tools' page (https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/#weights_for_data_instances), to support subject weights.
For multiclass-classification with k levels, k>2, libsvm
uses the
‘one-against-one’-approach, in which k(k-1)/2 binary classifiers are
trained; the appropriate class is found by a voting scheme.
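For instance, on iris (k = 3) the k(k-1)/2 = 3 pairwise decision values can be inspected directly:

## one-against-one: k = 3 classes give k(k-1)/2 = 3 binary classifiers
data(iris)
m <- wsvm(Species ~ ., data = iris, weight = rep(1, 150))
pred <- predict(m, iris[, 1:4], decision.values = TRUE)
head(attr(pred, "decision.values"))  # three columns, one per class pair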
libsvm internally uses a sparse data representation, which is also supported at the R level by the package SparseM.
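A minimal sketch of passing a sparse matrix, assuming SparseM is installed:

## sparse input: x given as a SparseM matrix.csr object
if (requireNamespace("SparseM", quietly = TRUE)) {
  xs <- SparseM::as.matrix.csr(as.matrix(iris[, 1:4]))
  m_sparse <- wsvm(xs, iris[, 5], weight = rep(1, 150))
}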
If the predictor variables include factors, the formula interface must be used to get a correct model matrix; alternatively, turn x into a design matrix yourself (e.g., via model.matrix).
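A small sketch of both routes (the data frame below is purely illustrative):

## factors: the formula interface builds the model matrix automatically ...
df <- data.frame(y = rnorm(20),
                 f = factor(rep(c("a", "b"), 10)),
                 z = rnorm(20))
m_formula <- wsvm(y ~ f + z, data = df, weight = rep(1, 20))
## ... or construct a design matrix by hand for the x,y interface
X <- model.matrix(~ f + z, data = df)[, -1]  # drop the intercept column
m_matrix <- wsvm(x = X, y = df$y, weight = rep(1, 20))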
When using the formula interface with na.action = na.omit, any subject with missing values in x, y (if present), or weight is deleted from both the training and the predicting procedure (when fitted = TRUE). When using the x, y interface with na.action = na.omit, subjects with missing values in x, y (if present), or weight are deleted from the training procedure, but subjects whose only missing value is the weight are retained in the predicting procedure (when fitted = TRUE).
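A short sketch of the x,y behaviour under the default na.action = na.omit, following the description above:

## subject 1 has a missing weight: it is excluded from training,
## but (x,y interface) still receives a fitted value
w <- rep(1, 150); w[1] <- NA
m_na <- wsvm(x = iris[, 1:4], y = iris[, 5], weight = w)
length(fitted(m_na))  # 150: subject 1 is retained in prediction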
plot.wsvm
allows a simple graphical
visualization of classification models.
The probability model for classification fits a logistic distribution using maximum likelihood to the decision values of all binary classifiers, and computes the a-posteriori class probabilities for the multi-class problem using quadratic optimization. The probabilistic regression model assumes (zero-mean) Laplace-distributed errors for the predictions, and estimates the scale parameter using maximum likelihood.
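Probability predictions require probability = TRUE at training time; the predict interface sketched below mirrors predict.svm in e1071 and is assumed to carry over to predict.wsvm:

## a-posteriori class probabilities for iris
m_prob <- wsvm(Species ~ ., data = iris, weight = rep(1, 150),
               probability = TRUE)
pp <- predict(m_prob, iris[, 1:4], probability = TRUE)
head(attr(pp, "probabilities"))  # one column per class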
For the linear kernel, the coefficients of the regression/decision hyperplane can be extracted using the coef method (see examples).
Value
An object of class "wsvm" containing the fitted model, including:

SV: the resulting support vectors (possibly scaled).

index: the index of the resulting support vectors in the data matrix. Note that this index refers to the preprocessed data (after the possible effect of na.omit and subset).

coefs: the corresponding coefficients times the training labels.

rho: the negative intercept.

sigma: in case of a probabilistic regression model, the scale parameter of the hypothesized (zero-mean) Laplace distribution estimated by maximum likelihood.

probA, probB: numeric vectors of length k(k-1)/2, k the number of classes, containing the parameters of the logistic distributions fitted to the decision values of the binary classifiers (1 / (1 + exp(a x + b))).
Note
Data are scaled internally, usually yielding better results.
Parameters of SVM-models usually must be tuned to yield sensible results!
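A hedged tuning sketch with tune_wsvm; the grid values are illustrative only, and the ranges interface is assumed to mirror e1071::tune:

## grid search over gamma and cost; results ARE weighted
obj <- tune_wsvm(Species ~ ., weight = rep(1, 150), data = iris,
                 ranges = list(gamma = 2^(-2:2), cost = 2^(0:4)))
summary(obj)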
Author(s)
David Meyer (based on C/C++-code by Chih-Chung Chang and Chih-Jen Lin)
Modified by Tianchen Xu tx2155@columbia.edu
References
- Chang, Chih-Chung and Lin, Chih-Jen: LIBSVM: a library for Support Vector Machines. https://www.csie.ntu.edu.tw/~cjlin/libsvm/

- Ming-Wei Chang, Hsuan-Tien Lin, Ming-Hen Tsai, Chia-Hua Ho and Hsiang-Fu Yu: Weights for data instances. https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/#weights_for_data_instances

- Exact formulations of models, algorithms, etc. can be found in the document: Chang, Chih-Chung and Lin, Chih-Jen: LIBSVM: a library for Support Vector Machines. https://www.csie.ntu.edu.tw/~cjlin/papers/libsvm.ps.gz

- More implementation details and speed benchmarks can be found in: Rong-En Fan, Pai-Hsuen Chen and Chih-Jen Lin: Working Set Selection Using the Second Order Information for Training SVM. https://www.csie.ntu.edu.tw/~cjlin/papers/quadworkset.pdf
See Also
predict.wsvm, plot.wsvm, tune_wsvm, matrix.csr (in package SparseM)
Examples
## check what is loaded
dllpath <- getLoadedDLLs()
getDLLRegisteredRoutines(dllpath$WeightSVM[[2]])
## load dataset
data(iris)
## classification mode
# default with factor response:
model1 <- wsvm(Species ~ ., weight = rep(1,150), data = iris) # same weights
model2 <- wsvm(x = iris[,1:4], y = iris[,5],
weight = c(rep(0.08, 50), rep(1, 100))) # lower weights for setosa
# alternatively the traditional interface:
x <- subset(iris, select = -Species)
y <- iris$Species
model3 <- wsvm(x, y, weight = rep(10,150)) # similar to model 1,
# but larger weights for all subjects
# These models provide error/warning info
try(wsvm(x, y)) # no weight
try(wsvm(x, y, weight = rep(10,100))) # wrong length
try(wsvm(x, y, weight = c(Inf, rep(1,149)))) # contains inf weight
print(model1)
summary(model1)
# test with train data
pred <- predict(model1, iris[,1:4])
# (same as:)
pred <- fitted(model1)
# Check accuracy:
table(pred, y) # model 1, equal weights
# compute decision values and probabilities:
pred <- predict(model1, x, decision.values = TRUE)
attr(pred, "decision.values")[1:4,]
# visualize (classes by color, SV by crosses):
plot(cmdscale(dist(iris[,-5])),
col = as.integer(iris[,5]),
pch = c("o","+")[1:150 %in% model1$index + 1]) # model 1
plot(cmdscale(dist(iris[,-5])),
col = as.integer(iris[,5]),
pch = c("o","+")[1:150 %in% model2$index + 1])
# In model 2, fewer support vectors come from setosa
## try regression mode on two dimensions
# create data
x <- seq(0.1, 5, by = 0.05)
y <- log(x) + rnorm(x, sd = 0.2)
# estimate model and predict input values
model1 <- wsvm(x, y, weight = rep(1,99))
model2 <- wsvm(x, y, weight = seq(99,1,length.out = 99)) # decreasing weights
# library(kernlab)
# model3 <- wsvm(kernlab::kernelMatrix(kernlab::rbfdot(sigma = 1), x), y,
# weight = rep(1,99), kernel = 'precomputed') # try user defined kernel
# visualize
plot(x, y)
lines(x, log(x), col = 2)
points(x, fitted(model1), col = 4)
points(x, fitted(model2), col = 3) # better fit for the first few points
# points(x, fitted(model3), col = 5) # similar to model 1 with user defined kernel
## density-estimation
# create 2-dim. normal with rho=0:
X <- data.frame(a = rnorm(1000), b = rnorm(1000))
attach(X)
# formula interface:
model <- wsvm(~ a + b, gamma = 0.1, weight = c(seq(5000,1,length.out = 500),1:500))
# test:
newdata <- data.frame(a = c(0, 4), b = c(0, 4))
# visualize:
plot(X, col = 1:1000 %in% model$index + 1, xlim = c(-5,5), ylim=c(-5,5))
points(newdata, pch = "+", col = 2, cex = 5)
## class weights:
i2 <- iris
levels(i2$Species)[3] <- "versicolor"
summary(i2$Species)
wts <- 100 / table(i2$Species)
wts
m <- wsvm(Species ~ ., data = i2, class.weights = wts, weight=rep(1,150))
## extract coefficients for linear kernel
# a. regression
x <- 1:100
y <- x + rnorm(100)
m <- wsvm(y ~ x, scale = FALSE, kernel = "linear", weight = rep(1,100))
coef(m)
plot(y ~ x)
abline(m, col = "red")
# b. classification
# transform iris data to binary problem, and scale data
setosa <- as.factor(iris$Species == "setosa")
iris2 <- scale(iris[,-5])
# fit binary C-classification model
model1 <- wsvm(setosa ~ Petal.Width + Petal.Length,
data = iris2, kernel = "linear", weight = rep(1,150))
model2 <- wsvm(setosa ~ Petal.Width + Petal.Length,
data = iris2, kernel = "linear",
weight = c(rep(0.08, 50), rep(1, 100))) # lower weights for setosa
# plot data and separating hyperplane
plot(Petal.Length ~ Petal.Width, data = iris2, col = setosa)
(cf <- coef(model1))
abline(-cf[1]/cf[3], -cf[2]/cf[3], col = "red")
(cf2 <- coef(model2))
abline(-cf2[1]/cf2[3], -cf2[2]/cf2[3], col = "red", lty = 2)
# plot margin and mark support vectors
abline(-(cf[1] + 1)/cf[3], -cf[2]/cf[3], col = "blue")
abline(-(cf[1] - 1)/cf[3], -cf[2]/cf[3], col = "blue")
points(model1$SV, pch = 5, cex = 2)
abline(-(cf2[1] + 1)/cf2[3], -cf2[2]/cf2[3], col = "blue", lty = 2)
abline(-(cf2[1] - 1)/cf2[3], -cf2[2]/cf2[3], col = "blue", lty = 2)
points(model2$SV, pch = 6, cex = 2)