chi2 {discretization} | R Documentation |
Discretization using the Chi2 algorithm
Description
This function performs the Chi2 discretization algorithm. The Chi2 algorithm automatically determines a proper Chi-square ($\chi^2$) threshold that keeps the fidelity of the original numeric dataset.
Usage
chi2(data, alp = 0.5, del = 0.05)
Arguments
data |
the dataset to be discretized |
alp |
significance level $\alpha$ |
del |
inconsistency rate threshold $\delta$ (see Details; Liu and Setiono (1995)) |
Details
The Chi2 algorithm is based on the $\chi^2$ statistic, and consists of two phases.
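For reference, the $\chi^2$ statistic for a pair of adjacent intervals is commonly written as below. The symbols are not defined on this page; they follow the usual notation of Liu and Setiono (1995): $k$ is the number of classes, $A_{ij}$ the number of patterns of class $j$ in interval $i$, $R_i$ and $C_j$ the row and column totals, and $N$ the total number of patterns in the two intervals.

```latex
\chi^2 = \sum_{i=1}^{2} \sum_{j=1}^{k} \frac{(A_{ij} - E_{ij})^2}{E_{ij}},
\qquad
E_{ij} = \frac{R_i \, C_j}{N},
\qquad
R_i = \sum_{j=1}^{k} A_{ij},
\quad
C_j = \sum_{i=1}^{2} A_{ij},
\quad
N = \sum_{i=1}^{2} R_i
```

A small value indicates that the two intervals have similar class distributions, which is why the pair with the lowest value is merged first.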
The first phase begins with a high significance level (sigLevel) for all numeric attributes to be discretized. Each attribute is sorted according to its values. Then the following is performed:
step 1. calculate the $\chi^2$ value for every pair of adjacent intervals (at the beginning, each pattern is put into its own interval that contains only one value of an attribute);
step 2. merge the pair of adjacent intervals with the lowest $\chi^2$ value. Merging continues until all pairs of intervals have $\chi^2$ values exceeding the parameter determined by sigLevel.
The above process is repeated with a decreased sigLevel until an inconsistency rate ($\delta$), incon(), is exceeded in the discretized data (Liu and Setiono (1995)).
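The merging loop described above can be sketched in base R for a single attribute at one fixed significance level. This is only an illustration, not the package's implementation: the helper names `chisq_pair` and `merge_intervals` are hypothetical, replacing zero expected counts by 0.1 is an assumption of this sketch, and cut-points are reported as the lower bound of each interval, which may differ from the convention used by `chi2()`.

```r
# Chi-square statistic for two adjacent intervals.
# `a` and `b` are class-count vectors in the same class order.
chisq_pair <- function(a, b) {
  obs <- rbind(a, b)
  expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)
  # Assumption of this sketch: avoid division by zero with a small constant.
  expected[expected == 0] <- 0.1
  sum((obs - expected)^2 / expected)
}

# One merging pass for a numeric attribute `x` with class labels `y`.
merge_intervals <- function(x, y, sig_level) {
  y <- factor(y)
  counts <- unclass(table(x, y))   # one row per distinct value, sorted by x
  lower <- sort(unique(x))         # lower bound of each initial interval
  threshold <- qchisq(1 - sig_level, df = nlevels(y) - 1)
  while (nrow(counts) > 1) {
    # chi-square value for every pair of adjacent intervals
    stats <- vapply(
      seq_len(nrow(counts) - 1),
      function(i) chisq_pair(counts[i, ], counts[i + 1, ]),
      numeric(1)
    )
    i <- which.min(stats)
    if (stats[i] > threshold) break  # every pair exceeds the threshold
    # merge interval i + 1 into interval i
    counts[i, ] <- counts[i, ] + counts[i + 1, ]
    counts <- counts[-(i + 1), , drop = FALSE]
    lower <- lower[-(i + 1)]
  }
  lower[-1]  # cut-points: lower bounds of all intervals but the first
}

x <- c(1, 2, 3, 4, 10, 11, 12, 13)
y <- c("a", "a", "a", "a", "b", "b", "b", "b")
merge_intervals(x, y, sig_level = 0.05)
```

In the full algorithm this pass would be repeated over all attributes with a decreasing significance level, stopping once the inconsistency rate of the discretized data exceeds the allowed value.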
Value
cutp |
list of cut-points for each variable |
Disc.data |
discretized data matrix |
Author(s)
HyunJi Kim polaris7867@gmail.com
References
Liu, H. and Setiono, R. (1995). Chi2: Feature selection and discretization of numeric attributes, Tools with Artificial Intelligence, 388–391.
Liu, H. and Setiono, R. (1997). Feature selection via discretization, IEEE Transactions on Knowledge and Data Engineering, Vol. 9, no. 4, 642–645.
See Also
Examples
library(discretization)
data(iris)
# run Chi2 once and reuse the result
disc <- chi2(iris, alp = 0.5, del = 0.05)
#---cut-points
disc$cutp
#--discretized dataset using Chi2 algorithm
disc$Disc.data
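The stopping rule in Details compares an inconsistency rate with del. As a rough sketch of what such a rate measures, the base R function below counts, for each distinct combination of attribute values, the patterns that do not belong to that combination's majority class. The name `incon_rate` is hypothetical and this is not the package's incon() implementation; it assumes the class label is the last column.

```r
# Inconsistency rate of a discretized dataset.
# Assumption: the last column holds the class, all others are attributes.
incon_rate <- function(d) {
  d <- as.data.frame(d)
  p <- ncol(d)
  # one string key per row, identifying its combination of attribute values
  pattern <- do.call(paste, c(d[, -p, drop = FALSE], sep = "|"))
  tab <- table(pattern, d[[p]])
  # per pattern: number of rows minus the count of the majority class
  sum(rowSums(tab) - apply(tab, 1, max)) / nrow(d)
}

toy <- data.frame(a = c(1, 1, 1, 2),
                  b = c(1, 1, 1, 2),
                  cls = c("x", "x", "y", "z"))
incon_rate(toy)
```

In the toy data three rows share the same attribute values but one of them has a different class, so one row out of four is counted as inconsistent.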