sym_opt_ln {noisemodel} | R Documentation |
Symmetric optimistic label noise
Description
Introduces symmetric optimistic label noise into a classification dataset.
Usage
## Default S3 method:
sym_opt_ln(x, y, level, levelH = 0.9, order = levels(y), sortid = TRUE, ...)
## S3 method for class 'formula'
sym_opt_ln(formula, data, ...)
Arguments
x |
a data frame of input attributes. |
y |
a factor vector with the output class of each sample. |
level |
a double in [0,1] with the noise level to be introduced. |
levelH |
a double in (0.5, 1] with the noise level for higher classes (default: 0.9). |
order |
a character vector indicating the order of the classes (default: levels(y)). |
sortid |
a logical indicating if the indices must be sorted at the output (default: TRUE). |
... |
other options to pass to the function. |
formula |
a formula with the output class and, at least, one input attribute. |
data |
a data frame in which to interpret the variables in the formula. |
Details
Symmetric optimistic label noise randomly selects (level·100)% of the samples in the dataset, independently of their class.
In the optimistic case, the probability of a sample of class i being mislabeled as class j is higher for j > i than for j < i.
Thus, when a sample is selected as noisy, its label is changed to a random higher class with probability levelH and to a random lower class with probability 1-levelH. The order of the classes is determined by order.
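The mechanism described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the helper name sketch_opt_noise is hypothetical, the number of noisy samples is assumed to be round(level * n), and a sample in the highest (or lowest) class is assumed to fall back to the only available direction.

```r
# Hypothetical helper, not part of noisemodel: illustrates the mechanism only.
sketch_opt_noise <- function(y, level, levelH = 0.9, order = levels(y)) {
  n <- length(y)
  k <- length(order)
  stopifnot(k >= 2, level >= 0, level <= 1)
  rank <- match(as.character(y), order)          # position of each label in `order`
  # Assumption: round(level * n) samples are selected, regardless of class
  idnoise <- sort(sample(n, round(level * n)))
  newrank <- rank
  for (i in idnoise) {
    higher <- seq_len(k)[seq_len(k) > rank[i]]   # classes above the current one
    lower  <- seq_len(k)[seq_len(k) < rank[i]]   # classes below the current one
    go_up  <- runif(1) < levelH                  # move up with probability levelH
    # Assumption: fall back to the other direction when one side is empty
    if (length(higher) == 0) go_up <- FALSE
    if (length(lower) == 0)  go_up <- TRUE
    pool <- if (go_up) higher else lower
    newrank[i] <- pool[sample.int(length(pool), 1)]
  }
  list(ynoise = factor(order[newrank], levels = levels(y)), idnoise = idnoise)
}

set.seed(1)
y <- factor(rep(c("a", "b", "c"), each = 50))
res <- sketch_opt_noise(y, level = 0.1)
# cross-tabulate original vs. noisy labels of the selected samples
table(original = y[res$idnoise], noisy = res$ynoise[res$idnoise])
```

With the default levelH = 0.9, most selected samples that have a higher class available should move upwards in the cross-tabulation, which is the "optimistic" bias the model name refers to.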
Value
An object of class ndmodel with elements:
xnoise |
a data frame with the noisy input attributes. |
ynoise |
a factor vector with the noisy output class. |
numnoise |
an integer vector with the amount of noisy samples per class. |
idnoise |
an integer vector list with the indices of noisy samples. |
numclean |
an integer vector with the amount of clean samples per class. |
idclean |
an integer vector list with the indices of clean samples. |
distr |
an integer vector with the samples per class in the original data. |
model |
the full name of the noise introduction model used. |
param |
a list of the argument values. |
call |
the function call. |
Note
Noise model adapted from the papers in References.
References
R. C. Prati, J. Luengo, and F. Herrera. Emerging topics and challenges of learning from noisy data in nonstandard classification: a survey beyond binary class noise. Knowledge and Information Systems, 60(1):63–97, 2019. doi:10.1007/s10115-018-1244-4.
See Also
sym_usim_ln, sym_natd_ln, print.ndmodel, summary.ndmodel, plot.ndmodel
Examples
# load the dataset
data(iris2D)
# usage of the default method
set.seed(9)
outdef <- sym_opt_ln(x = iris2D[,-ncol(iris2D)], y = iris2D[,ncol(iris2D)],
                     level = 0.1, order = c("virginica", "setosa", "versicolor"))
# show results
summary(outdef, showid = TRUE)
plot(outdef)
# usage of the method for class formula
set.seed(9)
outfrm <- sym_opt_ln(formula = Species ~ ., data = iris2D,
                     level = 0.1, order = c("virginica", "setosa", "versicolor"))
# check the match of noisy indices
identical(outdef$idnoise, outfrm$idnoise)