discENN {rgnoisefilt} | R Documentation |
Edited Nearest Neighbors for Regression by Discretization
Description
Application of the discENN noise filtering method in a regression dataset.
Usage
## Default S3 method:
discENN(x, y, k = 5, ...)
## S3 method for class 'formula'
discENN(formula, data, ...)
Arguments
x |
a data frame of input attributes. |
y |
a double vector with the output regressand of each sample. |
k |
an integer with the number of nearest neighbors to be used (default: 5). |
... |
other options to pass to the function. |
formula |
a formula with the output regressand and, at least, one input attribute. |
data |
a data frame in which to interpret the variables in the formula. |
Details
discENN discretizes the numerical output variable to make it compatible with Edited Nearest Neighbors (ENN), typically used in classification tasks.
ENN removes a sample if its class label is different from that of the majority of its k nearest neighbors.
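The two steps described above can be sketched in base R. This is only an illustration under stated assumptions, not the code used by rgnoisefilt: the helper name enn_by_discretization, the bins argument, the equal-width binning, the standardization of inputs and the Euclidean distance are all hypothetical choices made for the example.

```r
# Illustrative sketch only; NOT the rgnoisefilt implementation.
# Assumptions: equal-width binning of y into `bins` intervals, Euclidean
# distance on standardized numeric inputs, vote ties resolved by the
# first maximum.
enn_by_discretization <- function(x, y, k = 5, bins = 4) {
  x <- scale(as.matrix(x))                   # standardize the input attributes
  cls <- as.integer(cut(y, breaks = bins))   # discretize the output into class labels
  d <- as.matrix(dist(x))                    # pairwise Euclidean distances
  diag(d) <- Inf                             # a sample is not its own neighbor
  noisy <- vapply(seq_len(nrow(x)), function(i) {
    nn <- order(d[i, ])[seq_len(k)]          # indices of the k nearest neighbors
    votes <- table(cls[nn])                  # class counts among the neighbors
    majority <- as.integer(names(votes)[which.max(votes)])
    cls[i] != majority                       # noisy if the label disagrees
  }, logical(1))
  list(idclean = which(!noisy), idnoise = which(noisy))
}

data(rock)
res <- enn_by_discretization(rock[, -ncol(rock)], rock[, ncol(rock)], k = 5)
res$idnoise
```

The number of bins controls how strict the filter is: with more bins, fewer neighbors share a sample's label, so more samples tend to be flagged as noisy.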
Value
Applying the regression filter yields a reduced dataset that contains only the clean samples; the noisy samples (those with errors) are removed.
This function returns an object of class rfdata, which contains information related to the noise filtering process in the form of a list with the following elements:
xclean |
a data frame with the input attributes of clean samples (without errors). |
yclean |
a double vector with the output regressand of clean samples (without errors). |
numclean |
an integer with the number of clean samples. |
idclean |
an integer vector with the indices of clean samples. |
xnoise |
a data frame with the input attributes of noisy samples (with errors). |
ynoise |
a double vector with the output regressand of noisy samples (with errors). |
numnoise |
an integer with the number of noisy samples. |
idnoise |
an integer vector with the indices of noisy samples. |
filter |
the full name of the noise filter used. |
param |
a list of the argument values. |
call |
the function call. |
Note that objects of the class rfdata support print.rfdata, summary.rfdata and plot.rfdata methods.
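As a hedged illustration of how the elements listed above might be used, the following sketch rebuilds a cleaned data frame from xclean and yclean and fits a linear model on it. It assumes the rgnoisefilt package is installed; the use of lm and the object names out, clean and fit are choices made for this example only.

```r
# Sketch: using the elements of the returned rfdata object.
# Assumes rgnoisefilt is installed; lm() is just an example learner.
library(rgnoisefilt)

data(rock)
set.seed(9)
out <- discENN(x = rock[, -ncol(rock)], y = rock[, ncol(rock)], k = 5)

# counts and indices of the samples flagged as noisy
out$numnoise
out$idnoise

# rebuild a data frame of clean samples and fit a model on it
clean <- cbind(out$xclean, perm = out$yclean)
fit <- lm(perm ~ ., data = clean)
```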
References
L. Devroye, L. Gyorfi and G. Lugosi, Condensed and edited nearest neighbor rules. In: A Probabilistic Theory of Pattern Recognition, 31:303-313, 1996. doi:10.1007/978-1-4612-0711-5_19.
A. Arnaiz-González, J. Díez-Pastor, J. Rodríguez, C. García-Osorio, Instance selection for regression by discretization. Expert Systems with Applications, 54:340-350, 2016. doi:10.1016/j.eswa.2015.12.046.
See Also
discCNN, discTL, discNCL, print.rfdata, summary.rfdata
Examples
# load the package and the dataset
library(rgnoisefilt)
data(rock)
# usage of the default method
set.seed(9)
out.def <- discENN(x = rock[,-ncol(rock)], y = rock[,ncol(rock)])
# show results
summary(out.def, showid = TRUE)
# usage of the method for class formula
set.seed(9)
out.frm <- discENN(formula = perm ~ ., data = rock)
# check the match of noisy indices
all(out.def$idnoise == out.frm$idnoise)