R: Discretization using the Chi2 algorithm

chi2 {discretization}

R Documentation

Discretization using the Chi2 algorithm

Description

This function performs the Chi2 discretization algorithm. The Chi2 algorithm automatically determines a proper chi-square (\chi^2) threshold that preserves the fidelity of the original numeric dataset.

Usage

chi2(data, alp = 0.5, del = 0.05)

Arguments

`data`	the dataset to be discretized
`alp`	significance level; `\alpha`
`del`	threshold `\delta` on the inconsistency rate, so that `Inconsistency(data) < \delta` (Liu and Setiono, 1995)
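
To make the role of `del` more concrete, here is a minimal sketch of an inconsistency rate in the sense commonly used with Chi2: rows that share the same attribute values but do not belong to the majority class of that pattern count as inconsistent. The helper name `inconsistency_rate` is hypothetical and is not part of the package, and the sketch assumes that `data` is a data frame with the class label in its last column.

```r
# Hypothetical helper for illustration; not the package's own code.
# Assumption: `data` is a data frame whose last column is the class label.
inconsistency_rate <- function(data) {
  p <- ncol(data)
  # Build one key per row from the attribute columns only
  key <- do.call(paste, c(data[, -p, drop = FALSE], sep = "\r"))
  # Cross-tabulate attribute patterns against class labels
  counts <- table(key, data[[p]])
  # Per pattern: rows that do not belong to the pattern's majority class
  sum(rowSums(counts) - apply(counts, 1, max)) / nrow(data)
}

toy <- data.frame(
  x = c(1, 1, 1, 2, 2),
  y = c(1, 1, 1, 1, 2),
  class = c("a", "a", "b", "a", "b")
)
# The pattern (x = 1, y = 1) occurs three times with classes a, a, b,
# so one of the five rows is inconsistent.
inconsistency_rate(toy)  # 0.2
```

Under this reading, a smaller `del` tolerates fewer conflicting rows and so stops the merging earlier.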

Details

The Chi2 algorithm is based on the \chi^2 statistic and consists of two phases. The first phase begins with a high significance level (sigLevel) for all numeric attributes to be discretized. Each attribute is sorted according to its values, and then two steps are performed: (1) calculate the \chi^2 value for every pair of adjacent intervals (at the beginning, each pattern is put into its own interval that contains only one value of an attribute); (2) merge the pair of adjacent intervals with the lowest \chi^2 value. Merging continues until all pairs of adjacent intervals have \chi^2 values exceeding the threshold determined by sigLevel. This process is repeated with a decreased sigLevel until the inconsistency rate of the discretized data, computed by incon(), exceeds \delta (Liu and Setiono, 1995).
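
The merging steps for a single attribute at a fixed significance level can be sketched as follows. This is an illustration under stated assumptions, not the package's internal code: the helper names `chisq_pair` and `merge_intervals` are hypothetical, the statistic is the usual sum of (observed - expected)^2 / expected over the 2 x k table of class counts for two adjacent intervals, the threshold is taken as the \chi^2 quantile with (number of classes - 1) degrees of freedom, and zero expected counts are replaced by 0.1 to avoid division by zero.

```r
# Hypothetical helpers for illustration; not the package's internal code.

# Chi-square statistic for a 2 x k table of class counts
# (rows: two adjacent intervals, columns: classes).
chisq_pair <- function(counts) {
  expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)
  # Assumption: replace zero expected counts by a small constant
  expected[expected == 0] <- 0.1
  sum((counts - expected)^2 / expected)
}

# Merge adjacent intervals of one attribute at a fixed significance level.
merge_intervals <- function(x, cls, alpha) {
  cls <- factor(cls)
  # Start with one interval per distinct value; rows are ordered by value
  counts <- unclass(table(x, cls))
  lower <- as.numeric(rownames(counts))
  threshold <- qchisq(1 - alpha, df = nlevels(cls) - 1)
  while (nrow(counts) > 1) {
    # Step 1: chi-square value for every pair of adjacent intervals
    stats <- vapply(seq_len(nrow(counts) - 1),
                    function(i) chisq_pair(counts[i:(i + 1), , drop = FALSE]),
                    numeric(1))
    best <- which.min(stats)
    if (stats[best] > threshold) break  # every pair exceeds the threshold
    # Step 2: merge the pair with the lowest chi-square value
    counts[best, ] <- counts[best, ] + counts[best + 1, ]
    counts <- counts[-(best + 1), , drop = FALSE]
    lower <- lower[-(best + 1)]
  }
  lower  # lower bounds of the remaining intervals
}

x <- c(1, 2, 3, 4, 5, 6, 7, 8)
cls <- c("a", "a", "a", "a", "b", "b", "b", "b")
merge_intervals(x, cls, alpha = 0.05)  # lower bounds 1 and 5
```

In this toy example the values 1 to 4 and 5 to 8 collapse into two intervals, because the \chi^2 value between them (8) exceeds the 0.05 threshold for one degree of freedom (about 3.84). The outer loop of Chi2, which lowers sigLevel and checks the inconsistency rate after each pass, is not shown.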

Value

`cutp`	list of cut-points for each variable
`Disc.data`	discretized data matrix

Author(s)

HyunJi Kim polaris7867@gmail.com

References

Liu, H. and Setiono, R. (1995). Chi2: Feature selection and discretization of numeric attributes, Tools with Artificial Intelligence, 388–391.

Liu, H. and Setiono, R. (1997). Feature selection and discretization, IEEE Transactions on Knowledge and Data Engineering, Vol. 9, No. 4, 642–645.

Examples

library(discretization)
data(iris)
# Run Chi2 once and reuse the result
res <- chi2(iris, alp = 0.5, del = 0.05)

#---cut-points
res$cutp

#--discretized dataset using Chi2 algorithm
res$Disc.data

[Package discretization version 1.0-1.1 Index]