JMI3 {praznik}    R Documentation
Third-order joint mutual information filter
Description
The method starts with two features: X_1, the one of maximal mutual information with the decision Y, and X_2, the one of maximal value of I(X_1,X_2;Y), i.e., the feature which would be selected second by a regular JMI.
Then, it greedily adds the feature X with the maximal value of the following criterion:

J(X) = \frac{1}{2} \sum_{(U,W)\in S^2; U\neq W} I(X,U,W;Y),

where S is the set of already selected features.
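For reference, the criterion can be written down directly; below is a minimal, naive R sketch for factor data (entropyH, jointMI3 and jmi3Criterion are hypothetical helper names, not part of praznik, which uses optimised native code instead):

## Empirical (ML) entropy, in nats, of the joint distribution of the inputs
entropyH <- function(...) {
  p <- as.vector(table(...)) / length(..1)
  p <- p[p > 0]
  -sum(p * log(p))
}

## I(A,B,C;Y) = H(A,B,C) + H(Y) - H(A,B,C,Y)
jointMI3 <- function(a, b, c, y)
  entropyH(a, b, c) + entropyH(y) - entropyH(a, b, c, y)

## J(X) for a candidate x, given a list S of already selected factors
jmi3Criterion <- function(x, S, y) {
  total <- 0
  for (i in seq_along(S)) for (j in seq_along(S))
    if (i != j) total <- total + jointMI3(x, S[[i]], S[[j]], y)
  total / 2
}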
Usage
JMI3(X, Y, k = 3, threads = 0)
Arguments
X        Attribute table, given as a data frame with either factors
         (preferred), booleans, integers (treated as categorical) or reals
         (which undergo automatic categorisation; see below for details).
         A single vector will be interpreted as a data.frame with one column.

Y        Decision attribute; should be given as a factor, but other options
         are accepted, exactly like for attributes.

k        Number of attributes to select. Must not exceed the number of
         attributes in X.

threads  Number of threads to use; the default value, 0, means all available
         to OpenMP.
Value
A list with two elements: selection, a vector of indices of the selected features, in the selection order, and score, a vector of the corresponding feature scores.
Names of both vectors will correspond to the names of features in X.
Both vectors will be of length at most k, as the selection may stop sooner, even during the initial selection, in which case both vectors will be empty.
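For instance (a sketch using the MadelonD data from the Examples section):

data(MadelonD)
res <- JMI3(MadelonD$X, MadelonD$Y, k = 5)
res$selection  ## named vector of column indices of X, in selection order
res$score      ## named vector of the corresponding criterion values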
Note
This method has a complexity of O(k^2 \cdot m \cdot n), where m is the number of attributes and n the number of objects, while other filters have O(k \cdot m \cdot n); for larger k, it will thus be substantially slower.
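A rough way to see this in practice is to compare it with JMI, the second-order filter from the same package (a sketch; timings will vary):

data(MadelonD)
system.time(JMI(MadelonD$X, MadelonD$Y, 20))   ## second-order filter
system.time(JMI3(MadelonD$X, MadelonD$Y, 20))  ## third-order filter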
In the original paper, a special shrinkage estimator of mutual information is used; in praznik, however, all algorithms use maximum likelihood estimators, and so does JMI3.
The method requires the input to be discrete in order to use empirical estimators of distributions and, consequently, of information gain or entropy.
To allow a smoother user experience, praznik automatically coerces non-factor vectors in the input, which requires additional time and memory and may yield confusing results; the best practice is to convert data to factors before feeding them into this function.
Real attributes are cut into about 10 equally-spaced bins, following a heuristic often used in the literature.
The precise number of cuts depends on the number of objects; namely, it is n/3, but never less than 2 and never more than 10.
Integers (which technically are also numeric) are treated as categorical variables (for compatibility with similar software), hence in a very different way; one should be aware that an actually numeric attribute which happens to be an integer could be coerced into an n-level categorical, which would have a perfect mutual information score and would likely become a very disruptive false positive.
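Following this advice, one may discretise the data explicitly before the call; the sketch below (toFactors is a hypothetical helper, not part of praznik) bins real columns and turns everything else into explicit factors:

toFactors <- function(df, bins = 10) {
  as.data.frame(lapply(df, function(x) {
    if (is.factor(x)) x
    else if (is.double(x)) cut(x, breaks = bins)  ## equal-width bins
    else factor(x)  ## integers, booleans, etc. become explicit factors
  }))
}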
References
"Efficient feature selection using shrinkage estimators" K. Sechidis, L. Azzimonti, A. Pocock, G. Corani, J. Weatherall and G. Brown. Machine Learning, 108 (8-9), pp. 1261-1286 (2019)
Examples
## Not run:
data(MadelonD)
JMI3(MadelonD$X, MadelonD$Y, 20)
## End(Not run)