stat_filter {nestedcv} | R Documentation |
Univariate filter for binary classification with mixed predictor datatypes
Description
Univariate statistic filter for dataframes of predictors with mixed numeric and categorical datatypes. Different statistical tests are used depending on the data type of response vector and predictors:
- Binary class response: bin_stat_filter() uses a t-test for continuous data and a chi-squared test for categorical data
- Multiclass response: class_stat_filter() uses one-way ANOVA for continuous data and a chi-squared test for categorical data
- Continuous response: cor_stat_filter() uses correlation (or linear regression) for continuous data and binary data, and one-way ANOVA for categorical data
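As a rough illustration of the tests listed above, the following base R sketch applies each kind of test to a single predictor. It uses only functions from the stats package and built-in datasets. It is not the package's implementation, and the variable names are chosen purely for illustration.

```r
# Illustration only: the kinds of univariate tests named above, run with
# base R on built-in data. This is not the nestedcv implementation.

# Binary response, continuous predictor: two-sample t-test
y_bin <- factor(mtcars$am)                       # 2-level outcome
p_ttest <- t.test(mtcars$mpg ~ y_bin)$p.value

# Binary response, categorical predictor: chi-squared test
# (suppressWarnings: small expected counts trigger an approximation warning)
p_chisq <- suppressWarnings(
  chisq.test(table(factor(mtcars$cyl), y_bin))$p.value
)

# Multiclass response, continuous predictor: one-way ANOVA
fit <- aov(Sepal.Length ~ Species, data = iris)
p_anova <- summary(fit)[[1]][["Pr(>F)"]][1]

# Continuous response, continuous predictor: Pearson correlation test
p_cor <- cor.test(mtcars$mpg, mtcars$wt, method = "pearson")$p.value

# Collect the four p-values side by side
signif(c(t_test = p_ttest, chi_squared = p_chisq,
         anova = p_anova, correlation = p_cor), 3)
```

Each test yields one p-value per predictor, which is the quantity the filters rank on.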
Usage
stat_filter(y, x, ...)
bin_stat_filter(
  y,
  x,
  force_vars = NULL,
  nfilter = NULL,
  p_cutoff = 0.05,
  rsq_cutoff = NULL,
  type = c("index", "names", "full", "list"),
  ...
)
class_stat_filter(
  y,
  x,
  force_vars = NULL,
  nfilter = NULL,
  p_cutoff = 0.05,
  rsq_cutoff = NULL,
  type = c("index", "names", "full", "list"),
  ...
)
cor_stat_filter(
  y,
  x,
  cor_method = c("pearson", "spearman", "lm"),
  force_vars = NULL,
  nfilter = NULL,
  p_cutoff = 0.05,
  rsq_cutoff = NULL,
  rsq_method = "pearson",
  type = c("index", "names", "full", "list"),
  ...
)
Arguments
y | Response vector |
x | Matrix or dataframe of predictors |
... | optional arguments, e.g. |
force_vars | Vector of column names within |
nfilter | Number of predictors to return. If |
p_cutoff | p value cut-off |
rsq_cutoff | r^2 cutoff for removing predictors due to collinearity. Default |
type | Type of vector returned. Default "index" returns indices, "names" returns predictor names, "full" returns a dataframe of statistics, "list" returns a list of 2 matrices of statistics, one for continuous predictors, one for categorical predictors. |
cor_method | For |
rsq_method | character string indicating which correlation coefficient is to be computed. One of "pearson" (default), "kendall", or "spearman". See |
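To make the idea behind rsq_cutoff and rsq_method concrete, here is a minimal base R sketch of removing predictors by squared correlation. The helper drop_collinear() is hypothetical and not part of nestedcv. In particular, which member of a correlated pair is dropped (here, the later column) is an assumption made for the illustration.

```r
# Hypothetical helper (not part of nestedcv): keep a predictor only if its
# squared correlation with every predictor already kept is <= rsq_cutoff.
drop_collinear <- function(x, rsq_cutoff = 0.9, rsq_method = "pearson") {
  r2 <- cor(x, method = rsq_method)^2      # matrix of pairwise r^2
  keep <- integer(0)
  for (j in seq_len(ncol(x))) {
    # all() over an empty set is TRUE, so the first column is always kept
    if (all(r2[j, keep] <= rsq_cutoff)) keep <- c(keep, j)
  }
  colnames(x)[keep]
}

x_num <- iris[, 1:4]
kept <- drop_collinear(x_num, rsq_cutoff = 0.9)
kept
```

With the iris measurements, Petal.Width is dropped because its r^2 with Petal.Length exceeds 0.9.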
Details
stat_filter() is a wrapper which calls bin_stat_filter(), class_stat_filter() or cor_stat_filter() depending on whether y is binary, multiclass or continuous respectively. Ordered factors are converted to numeric (integer) levels and analysed as if continuous.
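The dispatch on the type of y can be pictured with the short sketch below. The function choose_filter() is hypothetical and only returns the name of the filter that would be selected. How the package itself decides that y is binary is not stated here, so the rule used (exactly two distinct non-missing values) is an assumption.

```r
# Hypothetical sketch of the dispatch described above (not the package code).
# Assumption: "binary" means exactly two distinct non-missing values in y.
choose_filter <- function(y) {
  n_values <- length(unique(y[!is.na(y)]))
  if (n_values == 2) {
    "bin_stat_filter"
  } else if (is.factor(y) || is.character(y)) {
    "class_stat_filter"
  } else {
    "cor_stat_filter"
  }
}

choose_filter(factor(c("a", "b", "a")))   # "bin_stat_filter"
choose_filter(iris$Species)               # "class_stat_filter"
choose_filter(mtcars$mpg)                 # "cor_stat_filter"
```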
Value
Integer vector of indices of filtered parameters (type = "index") or character vector of names (type = "names") of filtered parameters in order of test p-value. If type is "full", full output is returned containing a dataframe of statistical results. If type is "list", the output is returned as a list of 2 matrices containing statistical results separated by continuous and categorical predictors.
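The relationship between the p-value ordering, p_cutoff, nfilter and the returned type can be sketched in base R as follows. The function rank_by_pvalue() is hypothetical, handles only continuous predictors with a binary outcome, and omits the "list" type. It is meant to show the shape of the return values, not to reproduce the package's code.

```r
# Hypothetical sketch (not the package code): rank continuous predictors by
# t-test p-value against a binary outcome, then apply p_cutoff and nfilter.
rank_by_pvalue <- function(y, x, nfilter = NULL, p_cutoff = 0.05,
                           type = c("index", "names", "full")) {
  type <- match.arg(type)
  # one p-value per column of x
  pval <- vapply(x, function(col) t.test(col ~ y)$p.value, numeric(1))
  if (type == "full") return(data.frame(pvalue = pval))
  ord <- order(pval)                        # most significant first
  ord <- ord[pval[ord] < p_cutoff]          # drop predictors above the cutoff
  if (!is.null(nfilter)) ord <- head(ord, nfilter)
  if (type == "names") names(x)[ord] else ord
}

y_demo <- factor(mtcars$am)
x_demo <- mtcars[, c("mpg", "wt", "qsec", "hp")]
top_names <- rank_by_pvalue(y_demo, x_demo, type = "names")
top_index <- rank_by_pvalue(y_demo, x_demo, nfilter = 1)
full_tab  <- rank_by_pvalue(y_demo, x_demo, type = "full")
```

Indices and names refer to columns of x, so either form can be used to subset the predictor matrix before model fitting.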
Examples
library(mlbench)
data(BostonHousing2)
dat <- BostonHousing2
y <- dat$cmedv  ## continuous outcome
x <- subset(dat, select = -c(cmedv, medv, town))
## dataframe of statistics for the predictors
stat_filter(y, x, type = "full")
## names of up to 5 predictors, ordered by p-value
stat_filter(y, x, nfilter = 5, type = "names")
## default type = "index" returns column indices
stat_filter(y, x)

data(iris)
y <- iris$Species  ## 3 class outcome
x <- subset(iris, select = -Species)
stat_filter(y, x, type = "full")