sim_class {randomMachines}    R Documentation

Generate a binary classification data set from two normal distributions

Description

Simulates data for a binary classification task in which the two classes are drawn from multivariate normal distributions that may differ in their mean vectors and covariance matrices. For label A, the observations \mathbf{X}_{A} are sampled from \mathrm{MVN}\left(\mu_{A}\mathbf{1}_{p},\sigma_{A}^{2}\mathbf{I}_{p}\right), while for label B the observations \mathbf{X}_{B} are sampled from \mathrm{MVN}\left(\mu_{B}\mathbf{1}_{p},\sigma_{B}^{2}\mathbf{I}_{p}\right), where \mathbf{1}_{p} is the p-vector of ones and \mathbf{I}_{p} is the p \times p identity matrix. For more details see Ara et al. (2021) and Breiman (1998).
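Because both covariance matrices are diagonal, the p coordinates within each class are independent, so the model described above can be sketched with base R alone. The following is a minimal illustration, not the package's implementation: the helper name sketch_sim_class, the column names x1, ..., xp and y, and the reading of ratio as the share of rows assigned to class A are all assumptions made for this example.

```r
# Hypothetical helper for illustration only; not part of randomMachines.
sketch_sim_class <- function(n, p = 2, ratio = 0.5,
                             mu_a = 0, sigma_a = 1,
                             mu_b = 1, sigma_b = 1) {
  # Assumption: `ratio` is the share of the n rows assigned to class A.
  n_a <- round(n * ratio)
  n_b <- n - n_a

  # With covariance sigma^2 * I_p the coordinates are independent,
  # so each class can be filled with univariate normal draws.
  x_a <- matrix(rnorm(n_a * p, mean = mu_a, sd = sigma_a), nrow = n_a, ncol = p)
  x_b <- matrix(rnorm(n_b * p, mean = mu_b, sd = sigma_b), nrow = n_b, ncol = p)

  x <- rbind(x_a, x_b)
  colnames(x) <- paste0("x", seq_len(p))  # assumed column names

  # Stack the predictors and attach the class label as a factor.
  data.frame(x, y = factor(rep(c("A", "B"), times = c(n_a, n_b)),
                           levels = c("A", "B")))
}

set.seed(42)
toy <- sketch_sim_class(n = 200, p = 3)
table(toy$y)
```

With equal standard deviations, the squared Euclidean distance between the two class means is p(\mu_{B}-\mu_{A})^{2}, so increasing either p or the gap between mu_a and mu_b makes the classes easier to separate.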

Usage

sim_class(
  n,
  p = 2,
  ratio = 0.5,
  mu_a = 0,
  sigma_a = 1,
  mu_b = 1,
  sigma_b = 1
)

Arguments

n

Sample size

p

Number of predictors

ratio

Ratio between class A and class B

mu_a

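Mean of each component of X_{A} (class A)

sigma_a

Standard deviation of each component of X_{A} (class A)

mu_b

Mean of each component of X_{B} (class B)

sigma_b

Standard deviation of each component of X_{B} (class B)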

Value

A simulated data.frame with p predictors (two by default) for a binary classification problem

Author(s)

Mateus Maia: mateusmaia11@gmail.com, Anderson Ara: ara@ufpr.br

References

Ara, A., et al. (2021). Random machines: A bagged-weighted support vector model with free kernel choice. Journal of Data Science, 19(3), 409-428.

Breiman, L. (1998). Arcing classifier (with discussion and a rejoinder by the author). The Annals of Statistics, 26(3), 801-849.

Examples

library(randomMachines)
sim_data <- sim_class(n = 100)
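
A slightly richer call may help show how the remaining arguments interact. The sketch below uses only the arguments listed under Usage; the object name sim_wide and the chosen values are illustrative, and the call is guarded so that it is skipped when the package is not installed.

```r
# Runs only when randomMachines is available.
if (requireNamespace("randomMachines", quietly = TRUE)) {
  library(randomMachines)
  set.seed(1)
  # Five predictors, class means two units apart, noisier class B.
  sim_wide <- sim_class(n = 300, p = 5, mu_a = 0, mu_b = 2, sigma_b = 1.5)
  # Inspect the structure of the returned data.frame.
  str(sim_wide)
}
```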

[Package randomMachines version 0.1.0 Index]