repkfold_ttest {correctR}	R Documentation

Compute correlated t-statistic and p-value for repeated k-fold cross-validated results

Description

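Compute the corrected t-statistic and p-value for comparing two models evaluated with repeated k-fold cross-validation. Because training sets overlap across folds and repeats, the paired differences between the models are correlated. The correction (Nadeau and Bengio, 2003; Bouckaert and Frank, 2004) inflates the variance of these differences to account for this.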
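For reference, the corrected repeated k-fold cross-validation test of Bouckaert and Frank (2004) is usually written as below. Here $d_{ij}$ is the difference in performance between the two models on fold $i$ of repeat $j$, $n_1$ and $n_2$ are the training and test set sizes, and $\hat{\sigma}^2$ is the sample variance of the $k r$ differences. This is the textbook form from the cited paper, offered as a guide; this page does not itself state that the function computes exactly this expression.

```latex
t = \frac{\dfrac{1}{k r}\sum_{i=1}^{k}\sum_{j=1}^{r} d_{ij}}
         {\sqrt{\left(\dfrac{1}{k r} + \dfrac{n_2}{n_1}\right)\hat{\sigma}^2}},
\qquad \text{with } k r - 1 \text{ degrees of freedom.}
```

Setting the $n_2 / n_1$ term to zero recovers the ordinary paired t-test, which understates the variance when the samples are correlated.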

Usage

repkfold_ttest(data, n1, n2, k, r, tailed = c("two", "one"), greater = NULL)

Arguments

data

data.frame of values for model A and model B over repeated k-fold cross-validation. Four named columns are expected: "model" (model identifier), "values" (performance metric values), "k" (fold number), and "r" (repeat number)
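A minimal sketch of the long-format layout described above, assuming one row per model per fold per repeat. The model labels "A" and "B", the choice of 5 folds and 3 repeats, and the simulated scores are all illustrative assumptions, not data from the package.

```r
# Illustrative layout only: one row per model per fold (k) per repeat (r).
# Assumes 2 models, 5 folds and 3 repeats; the scores are simulated.
set.seed(1)
n_folds <- 5
n_repeats <- 3

# Every (fold, repeat) combination
grid <- expand.grid(k = seq_len(n_folds), r = seq_len(n_repeats))

scores <- rbind(
  data.frame(model = "A", grid, values = stats::rnorm(nrow(grid), mean = 0.80, sd = 0.02)),
  data.frame(model = "B", grid, values = stats::rnorm(nrow(grid), mean = 0.78, sd = 0.02))
)

# Put the columns in the order listed in the argument description
scores <- scores[, c("model", "values", "k", "r")]
str(scores)
```

With this shape, each model contributes `n_folds * n_repeats` rows, and each (k, r) pair identifies the same train/test split for both models.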

n1

integer denoting the training set size

n2

integer denoting the test set size

k

integer denoting the number of folds used in k-fold cross-validation

r

integer denoting the number of times the k-fold cross-validation procedure is repeated

tailed

character denoting whether to perform a two-tailed or one-tailed test. Can be one of "two" or "one". Defaults to "two"

greater

value in the "model" column identifying the model hypothesised to have the greater values in the one-tailed test. Used when tailed = "one". Defaults to NULL
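To illustrate how the tailed and greater choices typically translate into a p-value, here is a sketch using the t distribution. The helper name `p_from_t` is hypothetical, and it assumes the statistic is oriented so that a positive value favours the model named in greater. It is not taken from the correctR source.

```r
# Hypothetical helper: converts a t-statistic into a p-value.
# Assumes the differences are computed as (greater model - other model),
# so a positive t supports the one-tailed alternative.
p_from_t <- function(t_stat, df, tailed = c("two", "one")) {
  tailed <- match.arg(tailed)
  if (tailed == "two") {
    # Probability mass in both tails beyond |t|
    2 * stats::pt(-abs(t_stat), df = df)
  } else {
    # Upper-tail probability only: H1 is that the `greater` model scores higher
    stats::pt(t_stat, df = df, lower.tail = FALSE)
  }
}

p_two <- p_from_t(2.5, df = 9, tailed = "two")
p_one <- p_from_t(2.5, df = 9, tailed = "one")
```

Because the t distribution is symmetric, the two-tailed p-value is twice the one-tailed value whenever the observed statistic points in the hypothesised direction.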

Value

data.frame containing the test statistic and p-value
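The sketch below computes a one-row data.frame of a test statistic and p-value from data laid out as described under Arguments, using the Bouckaert and Frank (2004) corrected statistic. The function name `corrected_repkfold_sketch` is hypothetical, the column names `statistic` and `p.value` are assumptions, and the code assumes exactly one row per model per (k, r) cell. It is an illustration of the technique, not the package's implementation, and only the two-tailed case is shown.

```r
# Hypothetical re-implementation for illustration only; not the correctR source.
# Assumes `data` has columns model, values, k, r with exactly one row per
# model per (k, r) cell, and exactly two distinct models.
corrected_repkfold_sketch <- function(data, n1, n2, k, r) {
  models <- unique(data$model)
  stopifnot(length(models) == 2)

  a <- data[data$model == models[1], ]
  b <- data[data$model == models[2], ]

  # Align both models on the same fold/repeat cells before differencing
  a <- a[order(a$r, a$k), ]
  b <- b[order(b$r, b$k), ]
  stopifnot(identical(a$k, b$k), identical(a$r, b$r))

  d <- a$values - b$values            # k * r paired differences
  stopifnot(length(d) == k * r)

  # Corrected variance term: 1/(k r) + n2/n1 instead of 1/(k r)
  statistic <- mean(d) / sqrt((1 / (k * r) + n2 / n1) * stats::var(d))
  df <- k * r - 1
  p.value <- 2 * stats::pt(-abs(statistic), df = df)  # two-tailed

  data.frame(statistic = statistic, p.value = p.value)
}

# Usage on simulated scores: 5 folds, 2 repeats
set.seed(42)
grid <- expand.grid(k = 1:5, r = 1:2)
scores <- rbind(
  data.frame(model = "A", grid, values = stats::rnorm(10, 0.80, 0.02)),
  data.frame(model = "B", grid, values = stats::rnorm(10, 0.75, 0.02))
)
res <- corrected_repkfold_sketch(scores, n1 = 80, n2 = 20, k = 5, r = 2)
print(res)
```

Sorting by repeat and fold before subtracting keeps the pairing explicit, so the result does not depend on the row order of the input.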

Author(s)

Trent Henderson

References

Nadeau, C., and Bengio, Y. Inference for the Generalization Error. Machine Learning, 52, (2003).

Bouckaert, R. R., and Frank, E. Evaluating the Replicability of Significance Tests for Comparing Learning Algorithms. Advances in Knowledge Discovery and Data Mining. PAKDD 2004. Lecture Notes in Computer Science, 3056, (2004).

Examples

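# Simulated scores for two models (model 1 centred on 0.6, model 2 on 0.4),
# with fold (k) and repeat (r) indices written out for all 120 rows
set.seed(123)
tmp <- data.frame(model = rep(c(1, 2), each = 60),
  values = c(stats::rnorm(60, mean = 0.6, sd = 0.1),
  stats::rnorm(60, mean = 0.4, sd = 0.1)),
  k = rep(c(1, 1, 2, 2), times = 30),
  r = rep(c(1, 2), times = 60))

# Two-tailed corrected test of whether the two models differ
repkfold_ttest(data = tmp, n1 = 80, n2 = 20, k = 2, r = 2, tailed = "two")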


[Package correctR version 0.2.1 Index]