| regIPF {rgnoisefilt} | R Documentation | 
Iterative Partitioning Filter for Regression
Description
Application of the regIPF noise filtering method to a regression dataset.
Usage
## Default S3 method:
regIPF(x, y, t = 0.4, nfolds = 10, vote = FALSE, p = 0.01, s = 3, i = 0.5, ...)
## S3 method for class 'formula'
regIPF(formula, data, ...)
Arguments
| x | a data frame of input attributes. | 
| y | a double vector with the output regressand of each sample. | 
| t | a double in [0,1] with the threshold used by the regression noise filter (default: 0.4). | 
| nfolds | number of folds in which the dataset is split (default: 10). | 
| vote | a logical indicating if the consensus voting (TRUE) or majority voting (FALSE) is used (default: FALSE). | 
| p | a double in [0,1] with the minimum proportion of original samples that must be labeled as noisy (default: 0.01). | 
| s | an integer with the number of iterations without improvement for the stopping criterion (default: 3). | 
| i | a double in [0,1] with the proportion of good samples which must be retained per iteration (default: 0.5). | 
| ... | other options to pass to the function. | 
| formula | a formula with the output regressand and, at least, one input attribute. | 
| data | a data frame in which to interpret the variables in the formula. | 
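As an illustration of how p and s could interact, the sketch below assumes the stopping rule of the original IPF proposal: filtering ends once s consecutive iterations each flag fewer than p times the original number of samples as noisy. The helper stop_iteration and its input vector are hypothetical and are not part of rgnoisefilt; the role of i is not modelled here.

```r
# Hypothetical helper (not part of rgnoisefilt).
# n_flagged:  number of samples flagged as noisy in each iteration
# n_original: number of samples in the original dataset
# Returns the iteration at which the filter would stop, or NA if it never does.
stop_iteration <- function(n_flagged, n_original, p = 0.01, s = 3) {
  streak <- 0
  for (k in seq_along(n_flagged)) {
    # Assumption: an iteration shows "no improvement" when it flags
    # fewer than p * n_original samples.
    if (n_flagged[k] < p * n_original) {
      streak <- streak + 1
    } else {
      streak <- 0
    }
    if (streak >= s) return(k)
  }
  NA_integer_
}

# made-up counts of flagged samples per iteration
flagged <- c(12, 5, 0, 3, 1, 0, 0)
stop_iteration(flagged, n_original = 200, p = 0.01, s = 3)
```

With these made-up counts the threshold is 2 samples, and the streak of low-count iterations is reset whenever an iteration flags 2 or more samples.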
Details
In classification, the Iterative Partitioning Filter (IPF) builds a C4.5 classifier on each of the nfolds folds and uses every model to evaluate the whole dataset.
Noisy samples are then removed according to the voting scheme selected with the argument vote: if it is TRUE,
consensus voting is used (a sample is removed if it is misclassified by all the models); if it is FALSE,
majority voting is used (a sample is removed if it is misclassified by more than half of the models).
In addition, IPF integrates an iterative process that stops depending on the arguments p, s and i.
The implementation of this noise filter to be used in regression problems follows the proposal of Martín et al. (2021),
which is based on the use of a noise threshold (t) to determine the similarity between the output variable of the samples.
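The two voting schemes can be sketched in base R as follows. This is only an illustration: the helper vote_noisy is hypothetical, and the rule used to decide that a prediction misses a sample (absolute error larger than t times the range of y) is an assumption made for this example, not necessarily the similarity criterion of Martín et al. (2021) or of the package.

```r
# Hypothetical helper (not part of rgnoisefilt).
# pred: numeric matrix with one row per sample and one column per model
# y:    numeric vector with the observed output of each sample
vote_noisy <- function(pred, y, t = 0.4, vote = FALSE) {
  # Assumption: a model "misses" a sample when its absolute error
  # exceeds t times the range of the output variable.
  tol <- t * diff(range(y))
  miss <- abs(pred - y) > tol   # y is recycled down each column
  nmiss <- rowSums(miss)        # number of models missing each sample
  if (vote) {
    nmiss == ncol(pred)         # consensus: missed by all the models
  } else {
    nmiss > ncol(pred) / 2      # majority: missed by more than half
  }
}

# made-up outputs and predictions of three models
y <- c(1, 2, 3, 11)
pred <- cbind(m1 = c(1.1, 2.0, 3.2, 3.0),
              m2 = c(0.9, 2.1, 9.0, 4.0),
              m3 = c(1.0, 7.5, 8.0, 3.5))

vote_noisy(pred, y, t = 0.4, vote = FALSE)  # majority voting
vote_noisy(pred, y, t = 0.4, vote = TRUE)   # consensus voting
```

In this toy data the third sample is missed by two of the three models, so it is flagged under majority voting but kept under consensus voting, which is the more conservative scheme.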
Value
Applying the regression filter removes the noisy samples (those with errors) and yields a reduced dataset that contains only the clean samples.
This function returns an object of class rfdata, which contains information related to the noise filtering process in the form of a list with the following elements:
| xclean | a data frame with the input attributes of clean samples (without errors). | 
| yclean | a double vector with the output regressand of clean samples (without errors). | 
| numclean | an integer with the number of clean samples. | 
| idclean | an integer vector with the indices of clean samples. | 
| xnoise | a data frame with the input attributes of noisy samples (with errors). | 
| ynoise | a double vector with the output regressand of noisy samples (with errors). | 
| numnoise | an integer with the number of noisy samples. | 
| idnoise | an integer vector with the indices of noisy samples. | 
| filter | the full name of the noise filter used. | 
| param | a list of the argument values. | 
| call | the function call. | 
Note that objects of the class rfdata support print.rfdata, summary.rfdata and plot.rfdata methods.
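The relationship between the clean and noisy elements can be sketched with a hand-made list that reuses the element names above. The list below is not an actual regIPF result and does not have class rfdata; the noisy indices are chosen arbitrarily for illustration.

```r
# Hand-made illustration, NOT the output of regIPF.
data(rock)
x <- rock[, -ncol(rock)]
y <- rock[, ncol(rock)]

idnoise <- c(3L, 10L)                          # hypothetical noisy indices
idclean <- setdiff(seq_len(nrow(x)), idnoise)  # remaining samples are clean

res <- list(
  xclean = x[idclean, ], yclean = y[idclean],
  numclean = length(idclean), idclean = idclean,
  xnoise = x[idnoise, ], ynoise = y[idnoise],
  numnoise = length(idnoise), idnoise = idnoise
)

# clean and noisy samples partition the original dataset
res$numclean + res$numnoise == nrow(x)
length(intersect(res$idclean, res$idnoise)) == 0

# the original output vector can be rebuilt from the two parts
y_rebuilt <- numeric(nrow(x))
y_rebuilt[res$idclean] <- res$yclean
y_rebuilt[res$idnoise] <- res$ynoise
all(y_rebuilt == y)
```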
References
T. M. Khoshgoftaar and P. Rebours, Improving software quality prediction by noise filtering techniques, Journal of Computer Science and Technology, 22:387-396, 2007. doi:10.1007/s11390-007-9054-2
J. Martín, J. A. Sáez and E. Corchado, On the regressand noise problem: Model robustness and synergy with regression-adapted noise filters, IEEE Access, 9:145800-145816, 2021. doi:10.1109/ACCESS.2021.3123151
See Also
regIRF, regCVCF, regFMF, print.rfdata, summary.rfdata
Examples
# load the dataset
data(rock)
# usage of the default method
set.seed(9)
out.def <- regIPF(x = rock[,-ncol(rock)], y = rock[,ncol(rock)])
# show results
summary(out.def, showid = TRUE)
# usage of the method for class formula
set.seed(9)
out.frm <- regIPF(formula = perm ~ ., data = rock)
# check the match of noisy indices
all(out.def$idnoise == out.frm$idnoise)