ttest_filter {nestedcv} | R Documentation |
Univariate filters
Description
A selection of simple univariate filters using t-test, Wilcoxon test, one-way
ANOVA or correlation (Pearson or Spearman) for ranking variables. These
filters are designed for speed. ttest_filter uses the Rfast package,
wilcoxon_filter (Mann-Whitney test) uses matrixTests::row_wilcoxon_twosample,
and anova_filter uses matrixTests::col_oneway_welch (Welch's F-test) from the
matrixTests package. The filters can be applied to all or a subset of
predictors. For mixed datasets (combined continuous and categorical
predictors), see stat_filter().
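To make the idea of a univariate filter concrete, the following is a minimal base-R sketch of ranking predictors by t-test p-value. It is much slower than the Rfast-based implementation and is not the package's code: simple_ttest_filter is a hypothetical helper, and the use of Welch's t-test (the base R default) is an assumption made for illustration.

```r
# Illustrative only: a slow base-R analogue of a t-test filter.
# `simple_ttest_filter` is a hypothetical helper, not part of nestedcv.
simple_ttest_filter <- function(y, x, p_cutoff = 0.05) {
  y <- factor(y)
  stopifnot(nlevels(y) == 2)
  x <- as.matrix(x)
  g1 <- y == levels(y)[1]
  g2 <- y == levels(y)[2]
  # Welch t-test p-value for each column of x (assumption: Welch variant)
  pvals <- apply(x, 2, function(col) t.test(col[g1], col[g2])$p.value)
  keep <- which(pvals < p_cutoff)
  # order the surviving predictors by increasing p-value
  keep[order(pvals[keep])]
}

set.seed(1)
x <- matrix(rnorm(200), nrow = 40, dimnames = list(NULL, paste0("v", 1:5)))
y <- rep(c("A", "B"), each = 20)
x[y == "B", "v2"] <- x[y == "B", "v2"] + 3  # make v2 strongly informative
res <- simple_ttest_filter(y, x)
res  # named integer vector of column indices, best predictor first
```

The result is a named vector of column indices, which mirrors the idea behind type = "index" and type = "names" in the real filters.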
Usage
ttest_filter(
y,
x,
force_vars = NULL,
nfilter = NULL,
p_cutoff = 0.05,
rsq_cutoff = NULL,
type = c("index", "names", "full"),
keep_factors = TRUE,
...
)
anova_filter(
y,
x,
force_vars = NULL,
nfilter = NULL,
p_cutoff = 0.05,
rsq_cutoff = NULL,
type = c("index", "names", "full"),
keep_factors = TRUE,
...
)
wilcoxon_filter(
y,
x,
force_vars = NULL,
nfilter = NULL,
p_cutoff = 0.05,
rsq_cutoff = NULL,
type = c("index", "names", "full"),
exact = FALSE,
keep_factors = TRUE,
...
)
correl_filter(
y,
x,
method = "pearson",
force_vars = NULL,
nfilter = NULL,
p_cutoff = 0.05,
rsq_cutoff = NULL,
type = c("index", "names", "full"),
keep_factors = TRUE,
...
)
Arguments
y |
Response vector |
x |
Matrix or dataframe of predictors |
force_vars |
Vector of column names within |
nfilter |
Number of predictors to return. If |
p_cutoff |
p value cut-off |
rsq_cutoff |
r^2 cutoff for removing predictors due to collinearity.
Default |
type |
Type of vector returned. Default "index" returns indices, "names" returns predictor names, "full" returns a matrix of p values. |
keep_factors |
Logical affecting factors with 3 or more levels.
Dataframes are coerced to a matrix using data.matrix. Binary
factors are converted to numeric values 0/1 and analysed as such. If
|
... |
optional arguments, including |
exact |
Logical whether exact or approximate p-value is calculated.
Default is |
method |
Type of correlation, either "pearson" or "spearman". |
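As an illustration of what an r^2 cut-off for collinearity could look like, here is a minimal base-R sketch. It assumes that columns are already ordered by importance and that a later column is dropped when its squared correlation with any retained column exceeds the cut-off. drop_collinear is a hypothetical helper written for this example; it should not be read as the package's implementation.

```r
# Hypothetical helper illustrating an r^2 collinearity cut-off; not nestedcv code.
# Assumption: columns of x are ordered from most to least important.
drop_collinear <- function(x, rsq_cutoff = 0.9) {
  rsq <- cor(as.matrix(x))^2  # pairwise squared Pearson correlations
  keep <- integer(0)
  for (j in seq_len(ncol(rsq))) {
    # retain column j only if it is not too correlated with any kept column
    if (all(rsq[j, keep] <= rsq_cutoff)) keep <- c(keep, j)
  }
  keep
}

set.seed(2)
a <- rnorm(50)
x <- cbind(a = a, a_copy = a + rnorm(50, sd = 0.01), b = rnorm(50))
kept <- drop_collinear(x, rsq_cutoff = 0.9)
kept  # indices of retained columns; the near-duplicate a_copy is dropped
```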
Value
Integer vector of indices of filtered parameters (type = "index") or
character vector of names (type = "names") of filtered parameters in order
of t-test p-value. If type is "full", the full output from Rfast::ttests is
returned.
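The three return forms can be sketched on a named vector of p-values. This is a simplified illustration in base R: format_filter_output is a hypothetical helper, and its "full" output is reduced to a one-column matrix of p-values rather than the complete test output.

```r
# Hypothetical helper mimicking the three `type` options; not nestedcv code.
format_filter_output <- function(pvals, p_cutoff = 0.05,
                                 type = c("index", "names", "full")) {
  type <- match.arg(type)
  # "full": return every p-value, unfiltered (simplified to one column here)
  if (type == "full") return(cbind(pvalue = pvals))
  keep <- which(pvals < p_cutoff)
  keep <- keep[order(pvals[keep])]  # most significant first
  if (type == "index") unname(keep) else names(keep)
}

pvals <- c(v1 = 0.20, v2 = 0.001, v3 = 0.03)
idx  <- format_filter_output(pvals, type = "index")  # 2 3
nms  <- format_filter_output(pvals, type = "names")  # "v2" "v3"
full <- format_filter_output(pvals, type = "full")   # 3 x 1 matrix of p-values
```

Because the sketch resolves type with match.arg, an abbreviation such as type = "name" is partially matched to "names".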
Examples
## sigmoid function
sigmoid <- function(x) {1 / (1 + exp(-x))}
## load iris dataset and simulate a binary outcome
data(iris)
dt <- iris[, 1:4]
colnames(dt) <- c("marker1", "marker2", "marker3", "marker4")
dt <- as.data.frame(apply(dt, 2, scale))
y2 <- sigmoid(0.5 * dt$marker1 + 2 * dt$marker2) > runif(nrow(dt))
y2 <- factor(y2, labels = c("C1", "C2"))
ttest_filter(y2, dt) # returns index of filtered predictors
ttest_filter(y2, dt, type = "names") # returns names of filtered predictors
ttest_filter(y2, dt, type = "full") # full results table
## multiclass outcome: use the iris species as a 3-level factor
data(iris)
dt <- iris[, 1:4]
y3 <- iris[, 5]
anova_filter(y3, dt) # returns index of filtered predictors
anova_filter(y3, dt, type = "full") # full results table
anova_filter(y3, dt, type = "names") # returns names of filtered predictors