regRND {rgnoisefilt} | R Documentation |
Regressand Noise Detection for Regression
Description
Application of the regRND noise filtering method in a regression dataset.
Usage
## Default S3 method:
regRND(x, y, t = 0.2, nfolds = 5, vote = FALSE, ...)
## S3 method for class 'formula'
regRND(formula, data, ...)
Arguments
x |
a data frame of input attributes. |
y |
a double vector with the output regressand of each sample. |
t |
a double in [0,1] with the threshold used by the regression noise filter (default: 0.2). |
nfolds |
an integer with the number of folds in which the dataset is split (default: 5). |
vote |
a logical indicating if the consensus voting (TRUE) or the majority voting (FALSE) is used (default: FALSE). |
... |
other options to pass to the function. |
formula |
a formula with the output regressand and, at least, one input attribute. |
data |
a data frame in which to interpret the variables in the formula. |
Details
Regressand Noise Detection (RND) is an adaptation of Class Noise Detection and Classification (CNDC) found in the field of classification.
In a first step, CNDC builds an ensemble with SVM, Random Forest, Naive Bayes, k-NN and Neural Network.
Then, a sample is marked as noisy using a voting scheme (indicated by the argument vote): if equal to TRUE, a consensus voting is used (in which a sample is marked as noisy if it is misclassified by all the models); if equal to FALSE, a majority voting is used (in which a sample is marked as noisy if it is misclassified by more than half of the models).
Finally, the decision to remove a sample is made by distance filtering.
The implementation of this noise filter to be used in regression problems follows the proposal of Martín et al. (2021),
which is based on the use of a noise threshold (t) to determine the similarity between the output variable of the samples.
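The voting step described above can be sketched in base R. This is only an illustration, not the package's implementation: the helper flag_noisy and the toy predictions are hypothetical, and the rule that a model "misses" a sample when its absolute error exceeds t times the absolute value of the regressand is an assumption made here to give the threshold t a concrete meaning. The subsequent distance filtering is not covered.

```r
# Hypothetical helper, not part of rgnoisefilt.
# preds: numeric matrix with one row per sample and one column per model.
# ASSUMPTION: a model "misses" a sample when |prediction - y| > t * |y|.
flag_noisy <- function(preds, y, t = 0.2, vote = FALSE) {
  miss <- abs(preds - y) > t * abs(y)  # y is recycled down each column
  n_miss <- rowSums(miss)              # number of models that miss each sample
  if (vote) {
    n_miss == ncol(preds)              # consensus: every model misses
  } else {
    n_miss > ncol(preds) / 2           # majority: more than half of the models miss
  }
}

# Toy data: three samples, three models (made-up numbers).
y <- c(10, 20, 30)
preds <- cbind(m1 = c(10.5, 30, 45),
               m2 = c(9.8, 29, 31),
               m3 = c(10.1, 21, 50))

flag_noisy(preds, y, vote = FALSE)  # majority voting
flag_noisy(preds, y, vote = TRUE)   # consensus voting
```

With these toy values, majority voting flags the second and third samples, while consensus voting flags none, which shows why consensus is the more conservative of the two schemes.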
Value
The result of applying the regression filter is a reduced dataset containing the clean samples (without errors or noise), since it removes noisy samples (those with errors).
This function returns an object of class rfdata, which contains information related to the noise filtering process in the form of a list with the following elements:
xclean |
a data frame with the input attributes of clean samples (without errors). |
yclean |
a double vector with the output regressand of clean samples (without errors). |
numclean |
an integer with the number of clean samples. |
idclean |
an integer vector with the indices of clean samples. |
xnoise |
a data frame with the input attributes of noisy samples (with errors). |
ynoise |
a double vector with the output regressand of noisy samples (with errors). |
numnoise |
an integer with the number of noisy samples. |
idnoise |
an integer vector with the indices of noisy samples. |
filter |
the full name of the noise filter used. |
param |
a list of the argument values. |
call |
the function call. |
Note that objects of the class rfdata support print.rfdata, summary.rfdata and plot.rfdata methods.
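As a rough illustration of how the elements listed above might be used, the following sketch inspects the returned list and refits a model on the clean samples only. The checks on counts and indices reflect what one would expect from the element descriptions rather than a documented guarantee, and lm is chosen here purely for illustration. The requireNamespace guard is an addition so that the snippet also runs where rgnoisefilt is not installed.

```r
# Guarded so the snippet also runs where rgnoisefilt is not installed.
has_pkg <- requireNamespace("rgnoisefilt", quietly = TRUE)

if (has_pkg) {
  data(rock)
  set.seed(9)
  out <- rgnoisefilt::regRND(x = rock[, -ncol(rock)], y = rock[, ncol(rock)])

  # Expected (not guaranteed here): counts agree with the index vectors.
  print(length(out$idclean) == out$numclean)
  print(length(out$idnoise) == out$numnoise)

  # Refit a simple linear model on the clean samples only (illustrative choice).
  clean <- cbind(out$xclean, perm = out$yclean)
  fit <- lm(perm ~ ., data = clean)
  print(coef(fit))
}
```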
References
Z. Nematzadeh, R. Ibrahim and A. Selamat, Improving class noise detection and classification performance: A new two-filter CNDC model. Applied Soft Computing, 94:106428, 2020. doi:10.1016/j.asoc.2020.106428.
J. Martín, J. A. Sáez and E. Corchado, On the regressand noise problem: Model robustness and synergy with regression-adapted noise filters. IEEE Access, 9:145800-145816, 2021. doi:10.1109/ACCESS.2021.3123151.
See Also
regENN, regAENN, regGE, print.rfdata, summary.rfdata
Examples
# load the dataset
data(rock)
# usage of the default method
set.seed(9)
out.def <- regRND(x = rock[,-ncol(rock)], y = rock[,ncol(rock)])
# show results
summary(out.def, showid = TRUE)
# usage of the method for class formula
set.seed(9)
out.frm <- regRND(formula = perm ~ ., data = rock)
# check the match of noisy indices
all(out.def$idnoise == out.frm$idnoise)