ForestDisc {ForestDisc} | R Documentation |
Multivariate discretization for supervised learning using Random Forest and moment matching optimization
Description
ForestDisc is a supervised, multivariate, non-parametric discretization algorithm based on tree ensemble learning and moment matching optimization. This version of the algorithm uses a random forest to learn a large set of split points that preserves the relationship between the attributes and the target class. It then uses moment matching optimization to reduce this set to a small number of cut points whose statistical properties match those of the initial set of split points as closely as possible. For each attribute to be discretized, the set S of split points extracted from the random forest is mapped to a reduced set C of k cut points.
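The mapping from S to C can be illustrated with a minimal base R sketch. It assumes a simple objective (squared relative gaps between the first four raw moments of S and C) and uses optim() with Nelder-Mead in place of the NLopt routines. The simulated split points, the helper moment_gap, and the objective itself are illustrative assumptions, not the package's actual implementation.

```r
# Hypothetical split points, standing in for those collected from a forest
set.seed(1)
S <- c(rnorm(200, mean = 2, sd = 0.3), rnorm(200, mean = 5, sd = 0.5))

# Assumed objective: squared relative gaps between the first n_mom raw
# moments of the split points S and of a candidate set of cut points C
moment_gap <- function(C, S, n_mom = 4) {
  p <- seq_len(n_mom)
  m_S <- vapply(p, function(j) mean(S^j), numeric(1))
  m_C <- vapply(p, function(j) mean(C^j), numeric(1))
  sum(((m_C - m_S) / m_S)^2)
}

k <- 3                                             # size of the reduced set C
start <- unname(quantile(S, probs = seq_len(k) / (k + 1)))
fit <- optim(start, moment_gap, S = S, method = "Nelder-Mead")
C <- sort(fit$par)                                 # reduced set of k cut points
print(C)
```

Repeating the fit for several values of k and comparing fit$value gives a rough picture of how the quality of the match changes with the number of cut points.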
Usage
ForestDisc(data, id_target, ntree = 50, max_splits = 10, opt_meth = "NelderMead")
Arguments
data |
Data frame to be discretized. |
id_target |
Column id of the target class. |
ntree |
Number of trees to grow with the random forest algorithm in order to learn split points. The default value is 50. |
max_splits |
Maximum number of cut points used to discretize continuous attributes in the data. Possible values for 'max_splits' range from 2 to 10. The default value is 10. |
opt_meth |
The non-linear optimization algorithm used to find the set of cut points that best matches the set of split points. Possible values are "directL" (DIviding RECTangles algorithm), "NelderMead" (Nelder-Mead simplex method) and "SLSQP" (Sequential Least-Squares Quadratic Programming). More details about these algorithms can be found in the documentation of the "NLopt" library. The default value is "NelderMead". |
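As an illustration of the arguments above, the following sketch calls the function with non-default settings. The argument values are arbitrary choices made for the example, and the call is guarded so that the code does nothing if the ForestDisc package is not installed.

```r
# Argument values chosen for illustration only
disc_args <- list(
  data = iris,
  id_target = 5,        # Species is the fifth column of iris
  ntree = 100,          # grow more trees than the default of 50
  max_splits = 4,       # must lie between 2 and 10
  opt_meth = "SLSQP"    # one of "directL", "NelderMead", "SLSQP"
)

if (requireNamespace("ForestDisc", quietly = TRUE)) {
  set.seed(1234)
  res <- do.call(ForestDisc::ForestDisc, disc_args)
  str(res$Listcutp)     # cut points retained for each continuous attribute
}
```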
Value
List with components:
Data_disc |
Discretized data. |
cont_variables |
Column ids of the continuous attributes. |
Listcutp |
List of cut points used to discretize continuous attributes. |
cut_points |
Data frame summarizing the best solution returned. |
opt_results |
Data frame summarizing all the solutions returned for the different realizations. Each realization corresponds to one size of the set of cut points, ranging from 2 to 'max_splits'. |
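To show how a set of cut points turns a continuous attribute into intervals, here is a small base R sketch using cut(). The cut points are hypothetical values picked by hand, and the interval labels and boundary handling are those of cut(), which may differ from what ForestDisc produces in 'Data_disc'.

```r
x <- iris$Petal.Length
cutp <- c(2.5, 4.8)                 # hypothetical cut points for this attribute
breaks <- c(-Inf, cutp, Inf)        # open-ended outer intervals cover every value
x_disc <- cut(x, breaks = breaks)   # factor with length(cutp) + 1 intervals

# Cross-tabulate the intervals against the target class
print(table(x_disc, iris$Species))
```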
Author(s)
Haddouchi Maïssae
Examples
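library(ForestDisc)
data(iris)
Mydata=iris
# Species, the target class, is the fifth column of iris
id_target=5
# Random forest is stochastic: fix the seed for reproducible cut points
set.seed(1234)
# Discretize the continuous attributes with the default settings
Mydata_Disc=ForestDisc(Mydata,id_target)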
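The components listed under Value can then be inspected one by one. The sketch below is guarded so that it only runs when the ForestDisc package is installed. The vector expected simply restates the component names documented above, and the code reports any that are missing rather than assuming they are all present.

```r
# Component names as documented in the Value section
expected <- c("Data_disc", "cont_variables", "Listcutp",
              "cut_points", "opt_results")

if (requireNamespace("ForestDisc", quietly = TRUE)) {
  set.seed(1234)
  res <- ForestDisc::ForestDisc(iris, id_target = 5)
  print(setdiff(expected, names(res)))  # documented components not found, if any
  print(res$cont_variables)             # column ids of the continuous attributes
  print(res$Listcutp)                   # cut points per continuous attribute
  print(head(res$Data_disc))            # first rows of the discretized data
}
```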