mlr_filters_kruskal_test {mlr3filters}    R Documentation

Kruskal-Wallis Test Filter

Description

Kruskal-Wallis rank sum test filter calling stats::kruskal.test().

The filter value is -log10(p) where p is the p-value. This transformation is necessary to ensure numerical stability for very small p-values.
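
The transformation can be reproduced by hand. The following is a minimal sketch in base R, not the package's internal code: it assumes the filter amounts to one stats::kruskal.test() call per feature against the target, and it uses the built-in iris data frame for illustration.

```r
# Base R only; the iris data frame ships with R.
# Kruskal-Wallis test of a single feature against the class label.
p = stats::kruskal.test(x = iris$Sepal.Length, g = iris$Species)$p.value

# Filter-style score: a larger value means a smaller p-value.
score = -log10(p)

# Back-transform the score to recover the p-value.
p_back = 10^(-score)

# Scores for all four numeric features (assumed per-feature application).
scores = sapply(iris[1:4], function(x) {
  -log10(stats::kruskal.test(x = x, g = iris$Species)$p.value)
})
sort(scores, decreasing = TRUE)
```

Because the score grows as the p-value shrinks, sorting the scores in decreasing order ranks the features from strongest to weakest evidence of a difference between groups.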

Super class

mlr3filters::Filter -> FilterKruskalTest

Methods

Public methods

Method new()

Create a FilterKruskalTest object.

Usage
FilterKruskalTest$new()

Method clone()

The objects of this class are cloneable with this method.

Usage
FilterKruskalTest$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.

Note

This filter, in its default settings, can handle missing values in the features. However, the resulting filter scores may be misleading or at least difficult to compare if some features have a large proportion of missing values.

If a feature does not have at least one non-missing observation per label, the resulting score will be NA. Missing scores appear in a random, non-deterministic order at the end of the vector of scores.
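
The effect of missing values on a score can be illustrated in base R. This is a sketch under the assumption that the filter relies on the default behaviour of stats::kruskal.test(), which drops incomplete cases before testing; the positions set to NA below are arbitrary and purely illustrative.

```r
x = iris$Sepal.Length
g = iris$Species

# Introduce missing values at arbitrary, hypothetical positions.
x_na = x
x_na[c(1, 60, 120)] = NA

# kruskal.test() removes incomplete cases before computing the statistic ...
p_na = stats::kruskal.test(x = x_na, g = g)$p.value

# ... so the result equals a test run on the complete cases only.
keep = !is.na(x_na)
p_cc = stats::kruskal.test(x = x[keep], g = g[keep])$p.value

# Scores with and without the missing values; they rest on different
# numbers of observations and are therefore not directly comparable.
p_full = stats::kruskal.test(x = x, g = g)$p.value
c(with_na = -log10(p_na), complete = -log10(p_full))
```

Each feature is effectively tested on its own subset of observations, which is why scores become harder to compare when the proportion of missing values differs between features.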

References

For a benchmark of filter methods:

Bommert A, Sun X, Bischl B, Rahnenführer J, Lang M (2020). “Benchmark for filter methods for feature selection in high-dimensional classification data.” Computational Statistics & Data Analysis, 143, 106839. doi:10.1016/j.csda.2019.106839.

See Also

Other Filter: Filter, mlr_filters, mlr_filters_anova, mlr_filters_auc, mlr_filters_boruta, mlr_filters_carscore, mlr_filters_carsurvscore, mlr_filters_cmim, mlr_filters_correlation, mlr_filters_disr, mlr_filters_find_correlation, mlr_filters_importance, mlr_filters_information_gain, mlr_filters_jmi, mlr_filters_jmim, mlr_filters_mim, mlr_filters_mrmr, mlr_filters_njmim, mlr_filters_performance, mlr_filters_permutation, mlr_filters_relief, mlr_filters_selected_features, mlr_filters_univariate_cox, mlr_filters_variance

Examples

library("mlr3filters")

# Score every feature of the iris task against the target
task = mlr3::tsk("iris")
filter = flt("kruskal_test")
filter$calculate(task)
as.data.table(filter)

# transform the scores back to p-values
10^(-filter$scores)

if (mlr3misc::require_namespaces(c("mlr3pipelines", "rpart"), quietly = TRUE)) {
  library("mlr3pipelines")
  task = mlr3::tsk("spam")

  # Note: `filter.frac = 0.5` is an arbitrary choice here and should be tuned.

  graph = po("filter", filter = flt("kruskal_test"), filter.frac = 0.5) %>>%
    po("learner", mlr3::lrn("classif.rpart"))

  graph$train(task)
}

[Package mlr3filters version 0.8.0 Index]