MBFR {IDmining} | R Documentation |
Morisita-Based Filter for Regression Problems
Description
Executes the MBFR algorithm for supervised feature selection.
Usage
MBFR(XY, scaleQ, m=2, C=NULL)
Arguments
XY |
A |
scaleQ |
A vector containing the values of \ell^{-1} (see Details) |
m |
The value of the parameter m (by default: m=2) |
C |
The number of steps of the SFS procedure (by default: |
Details
- \ell is the edge length of the grid cells (or quadrats). Since the data (and consequently the grid) are rescaled to the [0,1] interval, \ell is equal to 1 for a grid consisting of only one cell.
- \ell^{-1} is the number of grid cells (or quadrats) along each axis of the Euclidean space in which the data points are embedded.
- \ell^{-1} is equal to Q^{(1/E)}, where Q is the number of grid cells and E is the number of variables (or features).
- \ell^{-1} is directly related to \delta (see References).
- \delta is the diagonal length of the grid cells.
The values of \ell^{-1} in scaleQ must be chosen according to the linear part of the \log-\log plot relating the \log values of the multipoint Morisita index to the \log values of \delta (or, equivalently, to the \log values of \ell^{-1}) (see logMINDEX).
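The relations between \ell, \ell^{-1}, Q and \delta listed in this section can be checked numerically. The base-R sketch below assumes E = 3 variables and takes \delta = \ell \sqrt{E}, the diagonal of a hypercube of edge \ell. That formula comes from elementary geometry and is an assumption here, not something stated on this page. The name `grid_info` is illustrative only.

```r
# Illustrative only: grid quantities for a range of scaleQ values.
E      <- 3             # assumed number of variables (features + output)
scaleQ <- 5:25          # candidate values of ell^(-1)

ell   <- 1 / scaleQ     # edge length of a cell on the [0,1]-rescaled data
Q     <- scaleQ^E       # total number of grid cells
delta <- ell * sqrt(E)  # cell diagonal (assumption: hypercube geometry)

grid_info <- data.frame(scaleQ = scaleQ, ell = ell, Q = Q, delta = delta)

# ell^(-1) equals Q^(1/E)
stopifnot(isTRUE(all.equal(Q^(1 / E), scaleQ)))

# log(delta) and log(ell^(-1)) differ only by a sign and a constant,
# so either one can serve as the x-axis of the log-log plot
stopifnot(isTRUE(all.equal(log(delta), log(sqrt(E)) - log(scaleQ))))

head(grid_info, 3)
```

Under this assumption, the linear part of the plot covers the same range of scales whether it is read against \log(\delta) or \log(\ell^{-1}); only the sign of the slope changes.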
Value
A list of five elements:
a vector containing the identifier numbers of the original features in the order they are selected through the Sequential Forward Selection (SFS) search procedure.
the names of the corresponding features.
the corresponding values of Diss.
the ID estimate of the output variable.
a C \times 3 matrix containing: (column 1) the ID estimates of the subsets retained by the SFS procedure with the target variable; (column 2) the ID estimates of the subsets retained by the SFS procedure without the output variable; (column 3) the values of Diss of the subsets retained by the SFS procedure.
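To make the shape of these outputs concrete, here is a minimal sketch of a generic Sequential Forward Selection loop in base R. It is not the MBFR implementation: the criterion `toy_diss` (one minus the R-squared of a linear fit) is a stand-in for the Morisita-based Diss, and the names `sfs_sketch`, `toy_diss` and the default `C = ncol(X)` are hypothetical choices made for illustration.

```r
# Stand-in criterion (NOT the Morisita-based Diss): 1 - R^2 of a linear fit.
toy_diss <- function(X, y) {
  1 - summary(lm(y ~ ., data = as.data.frame(X)))$r.squared
}

# Generic SFS: at each step, add the feature that minimises the criterion.
sfs_sketch <- function(X, y, C = ncol(X), crit = toy_diss) {
  selected  <- integer(0)          # column indices, in selection order
  remaining <- seq_len(ncol(X))    # candidates not yet selected
  values    <- numeric(0)          # criterion value after each step
  for (step in seq_len(C)) {
    scores <- vapply(remaining, function(j) {
      crit(X[, c(selected, j), drop = FALSE], y)
    }, numeric(1))
    best      <- which.min(scores)
    selected  <- c(selected, remaining[best])
    values    <- c(values, scores[best])
    remaining <- remaining[-best]
  }
  list(ids = selected, names = colnames(X)[selected], crit = values)
}

# Toy data: y depends on x1 and x2; x3 is pure noise.
set.seed(1)
X <- matrix(rnorm(600), ncol = 3, dimnames = list(NULL, c("x1", "x2", "x3")))
y <- 3 * X[, "x1"] + X[, "x2"] + rnorm(200, sd = 0.1)

res <- sfs_sketch(X, y)
res$names   # feature names in the order they were added
res$crit    # criterion value after each addition
```

The returned `ids`, `names` and `crit` play the same roles as the first three elements described above: selection order by index, selection order by name, and the criterion value at each step.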
Author(s)
Jean Golay jeangolay@gmail.com
References
J. Golay, M. Leuenberger and M. Kanevski (2017). Feature selection for regression problems based on the Morisita estimator of intrinsic dimension, Pattern Recognition 70:126–138.
J. Golay, M. Leuenberger and M. Kanevski (2015). Morisita-based feature selection for regression problems. Proceedings of the 23rd European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN), Bruges (Belgium).
Examples
## Not run:
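# Generate a Butterfly data set with 10000 points
bf <- Butterfly(10000)

# Run MBFR with ell^(-1) ranging from 5 to 25
fly_select <- MBFR(bf, 5:25)
var_order <- fly_select[[2]]  # feature names, in selection order
var_perf  <- fly_select[[3]]  # corresponding values of Diss

# Plot Diss against the features added at each SFS step
dev.new(width=5, height=4)
plot(var_perf, type="b", pch=16, lwd=2, xaxt="n", xlab="", ylab="",
     ylim=c(0,1), col="red", panel.first={grid(lwd=1.5)})
axis(1, 1:length(var_order), labels=var_order)
mtext(1, text="Added Features (from left to right)", line=2.5, cex=1)
mtext(2, text="Estimated Dissimilarity", line=2.5, cex=1)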
## End(Not run)