misvm {mildsvm} | R Documentation |
Fit MI-SVM model to the data
Description
This function fits the MI-SVM model, first proposed by Andrews et al. (2003). It is a variation on the traditional SVM framework that carefully treats data from the multiple instance learning paradigm, where instances are grouped into bags, and a label is only available for each bag.
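As a rough illustration of this data layout, the toy sketch below uses made-up values (it is not package output). Each row is an instance, bags records the bag it belongs to, and the bag label is repeated on every instance of the bag. It assumes the standard multiple-instance rule that a bag is positive when at least one of its instances is positive. The instance labels (inst_label) are hypothetical and would not be observed in practice.

```r
# Toy multiple-instance data: 6 instances grouped into 3 bags.
# All values are made up for illustration.
x <- cbind(X1 = c(0.2, 0.4, 2.5, 0.1, 0.3, 0.5),
           X2 = c(1.1, 0.9, 3.0, 1.0, 1.2, 0.8))
bags <- c("bag1", "bag1", "bag2", "bag2", "bag3", "bag3")

# Hypothetical (normally unobserved) instance labels.
inst_label <- c(0, 0, 1, 0, 0, 0)

# Standard MIL assumption: a bag is positive if any instance is positive.
bag_label <- tapply(inst_label, bags, max)

# Bag label repeated for each instance, the shape expected for the y argument.
y <- as.vector(bag_label[bags])
```

Here y is constant within each bag, as the y argument requires.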
Usage
## Default S3 method:
misvm(
  x,
  y,
  bags,
  cost = 1,
  method = c("heuristic", "mip", "qp-heuristic"),
  weights = TRUE,
  control = list(kernel = "linear", sigma = if (is.vector(x)) 1 else 1/ncol(x),
    nystrom_args = list(m = nrow(x), r = nrow(x), sampling = "random"),
    max_step = 500, type = "C-classification", scale = TRUE, verbose = FALSE,
    time_limit = 60, start = FALSE),
  ...
)
## S3 method for class 'formula'
misvm(formula, data, ...)
## S3 method for class 'mi_df'
misvm(x, ...)
## S3 method for class 'mild_df'
misvm(x, .fns = list(mean = mean, sd = stats::sd), cor = FALSE, ...)
Arguments
x |
A data.frame, matrix, or similar object of covariates, where each
row represents an instance. If a |
y |
A numeric, character, or factor vector of bag labels for each
instance. Must satisfy |
bags |
A vector specifying which bag each instance belongs to. Can be a string, numeric, or factor. |
cost |
The cost parameter in SVM. If |
method |
The algorithm to use in fitting (default |
weights |
named vector, or |
control |
list of additional parameters passed to the method that control computation with the following components:
|
... |
Arguments passed to or from other methods. |
formula |
a formula with specification |
data |
If |
.fns |
(argument for |
cor |
(argument for |
Details
Several choices of fitting algorithm are available, including a version of the heuristic algorithm proposed by Andrews et al. (2003) and a novel algorithm that explicitly solves the mixed-integer programming (MIP) problem using the gurobi package optimization back-end.
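To give a feel for the heuristic, the sketch below shows only its selection step in base R. It assumes the alternating scheme of Andrews et al.: score every instance with the current SVM, keep the highest-scoring instance of each positive bag as its representative, refit on the representatives plus all instances from negative bags, and repeat until the selection stops changing. The function name select_representatives() and the scores are hypothetical; this is not the package's internal code.

```r
# Selection step of an MI-SVM-style heuristic (illustrative sketch only).
# scores: current decision values f(x_i), one per instance
# bags:   bag identifier per instance
# y:      bag label per instance (1 = positive bag, 0 = negative bag)
select_representatives <- function(scores, bags, y) {
  idx <- seq_along(scores)
  pos <- y == 1
  # For each positive bag, keep the index of its highest-scoring instance.
  rep_idx <- tapply(idx[pos], bags[pos], function(i) i[which.max(scores[i])])
  # Training set for the next SVM fit: representatives + all negative instances.
  sort(c(as.vector(rep_idx), idx[!pos]))
}

# Made-up scores for two positive bags ("a", "b") and one negative bag ("c").
scores <- c(-0.5, 0.3, 1.2, -1.0, -0.2, -0.7)
bags   <- c("a", "a", "b", "b", "c", "c")
y      <- c(1, 1, 1, 1, 0, 0)
keep <- select_representatives(scores, bags, y)
# keep is c(2, 3, 5, 6)
```

In a full loop, the SVM would be refit on x[keep, ] and the scores recomputed. A cap on the number of iterations would play the role of max_step in control.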
Value
An object of class misvm.
The object contains at least the following components:
- *_fit: A fit object depending on the method parameter. If method = 'heuristic', this will be an svm fit from the e1071 package. If method = 'mip', 'qp-heuristic' this will be gurobi_fit from a model optimization.
- call_type: A character indicating which method misvm() was called with.
- features: The names of features used in training.
- levels: The levels of y that are recorded for future prediction.
- cost: The cost parameter from function inputs.
- weights: The calculated weights on the cost parameter.
- repr_inst: The instances from positive bags that are selected to be most representative of the positive instances.
- n_step: If method %in% c('heuristic', 'qp-heuristic'), the total steps used in the heuristic algorithm.
- x_scale: If scale = TRUE, the scaling parameters for new predictions.
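The role of x_scale can be illustrated with base R's scale(), which records the centering and scaling values used on the training data so that the same transformation can be applied to new data. This is only a sketch of the idea. The actual structure of the x_scale component is not documented here, so the list below, with elements center and scale, is an assumption.

```r
# Training covariates (made-up values).
x_train <- cbind(X1 = c(1, 2, 3, 4),
                 X2 = c(10, 20, 30, 40))
x_scaled <- scale(x_train)

# Store the parameters used for scaling (hypothetical layout).
x_scale <- list(center = attr(x_scaled, "scaled:center"),
                scale  = attr(x_scaled, "scaled:scale"))

# Apply the same transformation to new data before predicting.
x_new <- cbind(X1 = 2.5, X2 = 25)
x_new_scaled <- scale(x_new, center = x_scale$center, scale = x_scale$scale)
```

Reusing the training-set parameters puts new observations on the scale the model was fit on.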
Methods (by class)
- default: Method for data.frame-like objects
- formula: Method for passing formula
- mi_df: Method for mi_df objects, automatically handling bag names, labels, and all covariates.
- mild_df: Method for mild_df objects. Summarize samples to the instance level based on specified functions, then perform misvm() on instance level data.
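The summarization performed by the mild_df method can be pictured with the base R sketch below. It applies each function in a .fns-style list to every covariate within each instance, giving one row per instance. The helper name summarize_instances() and the covariate_function column naming are assumptions for illustration (the naming mirrors X1_mean in the Examples). The package's own implementation may differ.

```r
# Summarize sample-level covariates to one row per instance (sketch only).
# x:         data.frame of covariates, one row per sample
# instances: instance identifier for each sample
# .fns:      named list of summary functions, as in the mild_df method
summarize_instances <- function(x, instances,
                                .fns = list(mean = mean, sd = stats::sd)) {
  groups <- split(as.data.frame(x), instances)
  rows <- lapply(groups, function(g) {
    # Apply every summary function to every covariate of this instance.
    vals <- lapply(names(.fns), function(fn) {
      out <- vapply(g, .fns[[fn]], numeric(1))
      names(out) <- paste0(names(g), "_", fn)
      out
    })
    unlist(vals)
  })
  do.call(rbind, rows)
}

# Two instances with two samples each (made-up values).
samples <- data.frame(X1 = c(1, 3, 2, 4), X2 = c(0, 2, 5, 5))
instances <- c("i1", "i1", "i2", "i2")
inst_features <- summarize_instances(samples, instances)
```

The result has one row per instance and one column per covariate and function pair, which is the instance-level shape that misvm() is then fit on.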
Author(s)
Sean Kent, Yifei Liu
References
Andrews, S., Tsochantaridis, I., & Hofmann, T. (2002). Support vector machines for multiple-instance learning. Advances in Neural Information Processing Systems, 15.
Kent, S., & Yu, M. (2022). Non-convex SVM for cancer diagnosis based on morphologic features of tumor microenvironment. arXiv preprint arXiv:2206.14704.
See Also
- predict.misvm() for prediction on new data.
- cv_misvm() for cross-validation fitting.
Examples
set.seed(8)
mil_data <- generate_mild_df(nbag = 20,
                             positive_prob = 0.15,
                             sd_of_mean = rep(0.1, 3))
df <- build_instance_feature(mil_data, seq(0.05, 0.95, length.out = 10))

# Heuristic method
mdl1 <- misvm(x = df[, 4:123], y = df$bag_label,
              bags = df$bag_name, method = "heuristic")
mdl2 <- misvm(mi(bag_label, bag_name) ~ X1_mean + X2_mean + X3_mean, data = df)

# MIP method
if (require(gurobi)) {
  mdl3 <- misvm(x = df[, 4:123], y = df$bag_label,
                bags = df$bag_name, method = "mip")
}

predict(mdl1, new_data = df, type = "raw", layer = "bag")

# summarize predictions at the bag layer
library(dplyr)
df %>%
  bind_cols(predict(mdl2, df, type = "class")) %>%
  bind_cols(predict(mdl2, df, type = "raw")) %>%
  distinct(bag_name, bag_label, .pred_class, .pred)