sim_class {randomMachines}    R Documentation
Generate a binary classification data set from normal distributions
Description
Simulates an example classification task based on separating two multivariate normal distributions whose mean vectors and covariance matrices may differ.
For label A, the observations \mathbf{X}_{A} are sampled from the multivariate normal distribution \mathrm{MVN}\left(\mu_{A}\mathbf{1}_{p},\sigma_{A}^{2}\mathbf{I}_{p}\right), while for label B the observations \mathbf{X}_{B} are sampled from \mathrm{MVN}\left(\mu_{B}\mathbf{1}_{p},\sigma_{B}^{2}\mathbf{I}_{p}\right), where \mathbf{1}_{p} is the p-dimensional vector of ones and \mathbf{I}_{p} is the p \times p identity matrix. For more details see Ara et al. (2021) and Breiman (1998).
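Because the covariance matrices are \sigma^{2}\mathbf{I}_{p}, each predictor within a class is an independent univariate normal draw, so the generative model can be sketched with base R's rnorm() alone. The function below is a hypothetical illustration of the model described above, not the randomMachines implementation: the name sim_class_sketch, the column names x1, ..., xp and y, and the reading of ratio as the proportion of the n observations assigned to class A are all assumptions.

```r
# Hypothetical base-R sketch of the generative model described above.
# This is NOT the randomMachines implementation of sim_class().
# Assumption: `ratio` is the proportion of the n observations in class A.
sim_class_sketch <- function(n, p = 2, ratio = 0.5,
                             mu_a = 0, sigma_a = 1,
                             mu_b = 1, sigma_b = 1) {
  n_a <- round(n * ratio)  # class A size (assumed interpretation of ratio)
  n_b <- n - n_a           # class B receives the remaining observations
  # Covariance sigma^2 * I_p means every coordinate is an independent
  # N(mu, sigma^2) draw, so rnorm() suffices instead of a full MVN sampler.
  x_a <- matrix(rnorm(n_a * p, mean = mu_a, sd = sigma_a), nrow = n_a, ncol = p)
  x_b <- matrix(rnorm(n_b * p, mean = mu_b, sd = sigma_b), nrow = n_b, ncol = p)
  x <- rbind(x_a, x_b)
  colnames(x) <- paste0("x", seq_len(p))  # hypothetical column names
  # Attach the binary label as a factor with levels A and B
  data.frame(x, y = factor(c(rep("A", n_a), rep("B", n_b)), levels = c("A", "B")))
}

set.seed(42)
d <- sim_class_sketch(n = 100, p = 3, ratio = 0.3)
table(d$y)                           # class sizes under the assumed ratio
colMeans(d[d$y == "B", c("x1", "x2", "x3")])  # each close to mu_b = 1
```

With the defaults (mu_a = 0, mu_b = 1, sigma_a = sigma_b = 1), the class means differ by one unit in every coordinate, so increasing p or the gap between mu_a and mu_b makes the two classes easier to separate.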
Usage
sim_class(
n,
p = 2,
ratio = 0.5,
mu_a = 0,
sigma_a = 1,
mu_b = 1,
sigma_b = 1
)
Arguments
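n | Sample size (total number of observations)
p | Number of predictors
ratio | Ratio between class A and class B
mu_a | Mean \mu_{A} of each predictor for class A
sigma_a | Standard deviation \sigma_{A} of each predictor for class A
mu_b | Mean \mu_{B} of each predictor for class B
sigma_b | Standard deviation \sigma_{B} of each predictor for class B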
Value
A simulated data.frame with p predictors (two by default) for a binary classification problem.
Author(s)
Mateus Maia: mateusmaia11@gmail.com, Anderson Ara: ara@ufpr.br
References
Ara, A., et al. (2021). Random machines: A bagged-weighted support vector model with free kernel choice. Journal of Data Science, 19(3), 409-428.
Breiman, L. (1998). Arcing classifier (with discussion and a rejoinder by the author). The Annals of Statistics, 26(3), 801-849.
Examples
library(randomMachines)
sim_data <- sim_class(n = 100)