R: Symmetric scaled-Gaussian attribute noise

sym_sgau_an {noisemodel}

R Documentation

Symmetric scaled-Gaussian attribute noise

Description

Introduction of symmetric scaled-Gaussian attribute noise into a classification dataset.

Usage

## Default S3 method:
sym_sgau_an(x, y, level, k = 0.2, sortid = TRUE, ...)

## S3 method for class 'formula'
sym_sgau_an(formula, data, ...)

Arguments

`x`	a data frame of input attributes.
`y`	a factor vector with the output class of each sample.
`level`	a double in [0,1] with the noise level to be introduced.
`k`	a double in [0,1] with the scale used for the standard deviation (default: 0.2).
`sortid`	a logical indicating whether the indices must be sorted in the output (default: `TRUE`).
`...`	other options to pass to the function.
`formula`	a formula with the output class and, at least, one input attribute.
`data`	a data frame in which to interpret the variables in the formula.
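
The relation between the two interfaces can be illustrated with base R. This is a minimal sketch that assumes the formula is resolved against `data` in the usual `model.frame` way; it is not taken from the package source, and the built-in `iris` data is used only for illustration.

```r
# Assumption: a formula such as Species ~ . is split into the output class
# (left-hand side) and the input attributes (right-hand side).
mf <- model.frame(Species ~ ., data = iris)

y <- model.response(mf)        # output class, a factor
x <- mf[, -1, drop = FALSE]    # input attributes, a data frame

str(x)
table(y)
```

Under that assumption, `x` and `y` play the same role as the `x` and `y` arguments of the default method.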

Details

Symmetric scaled-Gaussian attribute noise corrupts (level·100)% of the values of each attribute in the dataset. To corrupt an attribute A, (level·100)% of the samples in the dataset are chosen. Their values for A are then modified by adding a random value drawn from a Gaussian distribution with mean = 0 and standard deviation = (max-min)·k·level, where max and min are the limits of the attribute domain. For nominal attributes, a random value is chosen instead.
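
The procedure can be sketched in base R as follows. `sketch_sgau_an` is a hypothetical helper written for illustration, not the package implementation. It assumes that the number of corrupted samples is rounded to the nearest integer, that the attribute domain is the observed range of the column, and that a nominal value may be replaced by itself.

```r
# Hypothetical helper (not part of noisemodel): illustrates the steps above.
sketch_sgau_an <- function(x, level, k = 0.2) {
  n <- nrow(x)
  m <- round(level * n)                 # assumption: rounding rule
  idnoise <- vector("list", ncol(x))
  names(idnoise) <- names(x)

  for (a in seq_along(x)) {
    # choose the samples to corrupt for this attribute
    idx <- sort(sample.int(n, m))
    idnoise[[a]] <- idx

    if (is.numeric(x[[a]])) {
      # assumption: domain limits are the observed min and max
      rng <- range(x[[a]])
      sdev <- (rng[2] - rng[1]) * k * level
      x[[a]][idx] <- x[[a]][idx] + rnorm(m, mean = 0, sd = sdev)
    } else {
      # nominal attribute: draw a random value from the observed values
      vals <- levels(factor(x[[a]]))
      x[[a]][idx] <- sample(vals, m, replace = TRUE)
    }
  }

  list(xnoise = x, idnoise = idnoise)
}

set.seed(1)
res <- sketch_sgau_an(iris[, 1:4], level = 0.1)
lengths(res$idnoise)
```

As a worked value: for `Sepal.Length` in `iris` (range 4.3 to 7.9), k = 0.2 and level = 0.1 give a standard deviation of 3.6·0.2·0.1 = 0.072.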

Value

An object of class ndmodel with elements:

`xnoise`	a data frame with the noisy input attributes.
`ynoise`	a factor vector with the noisy output class.
`numnoise`	an integer vector with the number of noisy samples per attribute.
`idnoise`	a list of integer vectors with the indices of noisy samples per attribute.
`numclean`	an integer vector with the number of clean samples per attribute.
`idclean`	a list of integer vectors with the indices of clean samples per attribute.
`distr`	an integer vector with the number of samples per class in the original data.
`model`	the full name of the noise introduction model used.
`param`	a list of the argument values.
`call`	the function call.

Note

Noise model adapted from the paper in References.

References

M. Koziarski, B. Krawczyk, and M. Wozniak. Radial-based oversampling for noisy imbalanced data classification. Neurocomputing, 343:19–33, 2019. doi:10.1016/j.neucom.2018.04.089.

Examples

# load the dataset
data(iris2D)

# usage of the default method
set.seed(9)
outdef <- sym_sgau_an(x = iris2D[,-ncol(iris2D)], y = iris2D[,ncol(iris2D)], level = 0.1)

# show results
summary(outdef, showid = TRUE)
plot(outdef)

# usage of the method for class formula
set.seed(9)
outfrm <- sym_sgau_an(formula = Species ~ ., data = iris2D, level = 0.1)

# check the match of noisy indices
identical(outdef$idnoise, outfrm$idnoise)

[Package noisemodel version 1.0.2 Index]