randomMachines {randomMachines}    R Documentation

Random Machines


Random Machines is an ensemble model that combines different kernel functions to increase diversity in the bagging approach, which generally improves predictions. It was developed for classification and regression problems by bagging support vector models built with multiple kernel functions.

Random Machines uses SVMs (Cortes and Vapnik, 1995) as base learners in the bagging procedure, building each one with a randomly sampled kernel function.

Let a training sample be given by $(\boldsymbol{x}_{i}, y_{i})$, $i=1,\dots,n$, where $\boldsymbol{x}_{i}$ is the vector of independent variables and $y_{i}$ the dependent one. The kernel bagging method starts by training one single learner for each kernel $r$, where $r=1,\dots,R$ and $R$ is the total number of different kernel functions that could be used in support vector models. In this implementation the default value is $R=4$ (Gaussian, polynomial, Laplacian and linear). See more details below.

Each single learner is internally validated, and the weights $\lambda_{r}$ are calculated in proportion to each learner's predictive performance.

Afterwards, $B$ bootstrap samples are drawn from the training set. A support vector machine model $g_{b}$ is trained for each bootstrap sample, $b=1,\dots,B$, and the kernel function used for $g_{b}$ is chosen at random with probability $\lambda_{r}$. The final weight $w_{b}$ in the bagging procedure is calculated from the out-of-bag samples.
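The sampling step described above can be sketched in base R as follows. This is a minimal illustration, not the package's internal code: the probabilities in `lambda` are made-up values standing in for the validated weights, and `draw_bagging_plan` is a hypothetical helper name.

```r
# Hypothetical helper: draws B bootstrap samples and one kernel per sample.
# `lambda` holds assumed kernel probabilities (one per kernel, summing to 1).
draw_bagging_plan <- function(n, B, kernels, lambda) {
  stopifnot(length(kernels) == length(lambda), abs(sum(lambda) - 1) < 1e-8)
  lapply(seq_len(B), function(b) {
    in_bag <- sample.int(n, size = n, replace = TRUE)     # bootstrap indices
    list(
      kernel = sample(kernels, size = 1, prob = lambda),  # kernel used for g_b
      in_bag = in_bag,
      oob    = setdiff(seq_len(n), in_bag)                # out-of-bag indices
    )
  })
}

set.seed(42)
kernels <- c("rbfdot", "polydot", "laplacedot", "vanilladot")
lambda  <- c(0.4, 0.2, 0.3, 0.1)  # illustrative values only
plan <- draw_bagging_plan(n = 75, B = 25, kernels = kernels, lambda = lambda)

# How often each kernel was drawn across the B samples
table(vapply(plan, function(p) p$kernel, character(1)))
```

Each element of `plan` carries the kernel for one base learner together with the rows it would be trained on (`in_bag`) and the rows left over for computing its out-of-bag weight (`oob`).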

The final model $G(\boldsymbol{x}_i)$ for a new $\boldsymbol{x}_i$ is given by a combination of the base learners $g_{b}$ weighted by $w_{b}$.

The weights $\lambda_{r}$ and $w_{b}$ are calculated differently for each task (classification, probabilistic classification and regression). See more details in the references.
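As a rough illustration only, a weighted bagged ensemble of this kind is often written in the form below. The exact definitions used by the package are in the references; the normalisation $\sum_{b} w_{b} = 1$ and the coding $g_{b} \in \{-1, +1\}$ for binary classification are assumptions made here.

```latex
G(\boldsymbol{x}_i) = \sum_{b=1}^{B} w_b \, g_b(\boldsymbol{x}_i)
\quad \text{(regression)},
\qquad
G(\boldsymbol{x}_i) = \operatorname{sgn}\!\left( \sum_{b=1}^{B} w_b \, g_b(\boldsymbol{x}_i) \right)
\quad \text{(binary classification)}.
```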


     B = 25, cost = 1,
     automatic_tuning = FALSE,
     gamma_rbf = 1,
     gamma_lap = 1,
     degree = 2,
     poly_scale = 1,
     offset = 0,
     gamma_cau = 1,
     d_t = 2,
     kernels = c("rbfdot", "polydot", "laplacedot", "vanilladot"),
     prob_model = TRUE,
     loss_function = RMSE,
     epsilon = 0.1,
     beta = 2



formula
an object of class formula: a symbolic description of the model to be fitted, indicating the dependent variable and all predictors that should be included.

train
the training data $\left\{\left(\mathbf{x}_{i}, y_{i}\right)\right\}_{i=1}^{n}$ used to train the model.

validation
the validation data $\left\{\left(\mathbf{x}_{i}, y_{i}\right)\right\}_{i=1}^{V}$ used to calculate the probabilities $\lambda_{r}$. If validation = NULL, a 0.25 partition of the training data is selected as the validation set, and the remaining partition is used as the new training sample.

B
number of bootstrap samples. The default value is B=25.

cost
the $C$ constant of the soft-margin regularization term in support vector models. The default value is cost=1.

automatic_tuning
boolean defining whether the kernel hyperparameters will be selected using sigest from the ksvm function. The default value is FALSE.

gamma_rbf
the hyperparameter $\gamma_{g}$ used in the RBF kernel. The default value is gamma_rbf=1.

gamma_lap
the hyperparameter $\gamma_{l}$ used in the Laplacian kernel. The default value is gamma_lap=1.

degree
the degree used in the Polynomial kernel. The default value is degree=2.

poly_scale
the scale parameter of the Polynomial kernel. The default value is poly_scale=1.

offset
the offset parameter of the Polynomial kernel. The default value is offset=0.

gamma_cau
the hyperparameter $\gamma_{c}$ used in the Cauchy kernel. The default value is gamma_cau=1.

d_t
the $d_{t}$-norm from the t-Student kernel. The default value is d_t=2.

kernels
a vector with the names of the kernel functions that will be used in the Random Machines model. The default includes the kernel functions c("rbfdot", "polydot", "laplacedot", "vanilladot"). The other kernel functions, "cauchydot" and "tdot", are exclusive to the binary classification setting.

prob_model
a boolean defining whether the algorithm uses a probabilistic approach to define the predictions (default = TRUE).

loss_function
defines which loss function is used in the regression approach. The default is the RMSE function, but others can be added.

epsilon
the epsilon in the loss function used by the SVR implementation. The default value is epsilon=0.1.

beta
the correlation parameter $\beta$, which calibrates the penalisation of each kernel's performance in regression tasks. The default value is beta=2.
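The default behaviour described for the validation data (holding out a 0.25 partition of the training data) can be mimicked by hand as sketched below. This is a base-R illustration that assumes a simple random split; `split_train_validation` is a hypothetical helper, not part of the package, and the package's own partitioning may differ.

```r
# Hypothetical helper: random hold-out split of a data frame.
split_train_validation <- function(data, prop_validation = 0.25) {
  n <- nrow(data)
  n_val <- floor(prop_validation * n)      # size of the validation partition
  val_idx <- sample.int(n, size = n_val)   # rows held out for validation
  list(
    train      = data[-val_idx, , drop = FALSE],
    validation = data[val_idx, , drop = FALSE]
  )
}

set.seed(1)
toy <- data.frame(x = rnorm(80), y = factor(rep(c("a", "b"), 40)))
parts <- split_train_validation(toy)
nrow(parts$train)       # 60
nrow(parts$validation)  # 20
```

The two pieces could then be supplied as the training and validation data in place of the automatic split, which is useful when the hold-out set needs to be fixed across runs.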


Random Machines is an ensemble method that combines the bagging procedure proposed by Breiman (1996) with Support Vector Machine base learners and a random selection of kernel functions, which adds diversity to the ensemble without harming its predictive performance. The available kernel functions $k(x,y)$ are the Gaussian (RBF), polynomial, Laplacian and linear kernels, plus the Cauchy and t-Student kernels in the binary classification setting.
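For orientation, the four default kernels can be written as plain R functions as below. The parameterisation follows the usual kernlab definitions of `rbfdot`, `polydot`, `laplacedot` and `vanilladot`; mapping the `gamma_rbf` and `gamma_lap` arguments onto kernlab's `sigma` is an assumption, and the function names are illustrative only.

```r
# Gaussian (RBF): exp(-gamma * ||x - y||^2)
k_rbf <- function(x, y, gamma = 1) exp(-gamma * sum((x - y)^2))

# Laplacian: exp(-gamma * ||x - y||)
k_laplace <- function(x, y, gamma = 1) exp(-gamma * sqrt(sum((x - y)^2)))

# Polynomial: (scale * <x, y> + offset)^degree
k_poly <- function(x, y, degree = 2, scale = 1, offset = 0) {
  (scale * sum(x * y) + offset)^degree
}

# Linear: <x, y>
k_linear <- function(x, y) sum(x * y)

x <- c(1, 2)
y <- c(3, 4)
k_linear(x, y)  # 1*3 + 2*4 = 11
k_poly(x, y)    # 11^2 = 121
k_rbf(x, x)     # identical points give 1
```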


randomMachines() returns an object of class "rm_class" for classification tasks, or "rm_reg" if the target variable is a continuous numerical response. See predict.rm_class or predict.rm_reg for details on how to obtain predictions from each model.


Mateus Maia: mateusmaia11@gmail.com, Gabriel Felipe Ribeiro: brielribeiro08@gmail.com, Anderson Ara: ara@ufpr.br


Ara, A., et al. (2022). Regression random machines: An ensemble support vector regression model with free kernel choice. Expert Systems with Applications, 202, 117107.

Ara, A., et al. (2021). Random machines: A bagged-weighted support vector model with free kernel choice. Journal of Data Science, 19(3), 409-428.

Breiman, L. (1996). Bagging predictors. Machine Learning, 24, 123-140.

Cortes, C., and Vapnik, V. (1995). Support-vector networks. Machine Learning, 20, 273-297.

Maia, M., Azevedo, A. R., and Ara, A. (2021). Predictive comparison between random machines and random forests. Journal of Data Science, 19(4), 593-614.



library(randomMachines)

# Simulate training data with a binary response
sim_data <- sim_class(n = 75)

# Simulate new data to predict on
sim_new <- sim_class(n = 75)

# Fit Random Machines (probabilistic output)
rm_mod_prob <- randomMachines(y ~ ., train = sim_data)

# Fit Random Machines (binary class output)
rm_mod_label <- randomMachines(y ~ ., train = sim_data, prob_model = FALSE)

# Predict class labels for the new data
y_hat <- predict(rm_mod_label, sim_new)

[Package randomMachines version 0.1.0 Index]