R: calculateCrossValidation

calculateCrossValidation {rSRD}

R Documentation

calculateCrossValidation

Description

R interface for testing whether the rankings induced by the columns come from the same distribution. If the number of folds and the test method are not specified, the default is 8-fold cross-validation combined with the Wilcoxon test. If the number of rows is less than 8, leave-one-out cross-validation is applied. Columns are ordered by the median of their SRD values across the folds, then each pair of consecutive columns is tested. The test statistic of the Alpaydin test follows an F distribution with df1 = 2k and df2 = k degrees of freedom. The test statistic of the Dietterich test follows a t-distribution with k degrees of freedom (two-tailed). The Wilcoxon test statistic is calculated as the absolute value of the difference between the sum of the positive ranks (W+) and the sum of the negative ranks (W-). The distribution of this test statistic can be derived from the Wilcoxon signed rank distribution. For more information about the cross-validation process, see Sziklai, Baranyi and Héberger (2021).
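The Wilcoxon statistic described above can be illustrated in base R. The following is a minimal sketch, not the package's implementation: the per-fold SRD values are made up, the helper names `abs_rank_sum_diff` and `abs_rank_sum_pvalue` are hypothetical, and dropping zero differences is an assumption of this sketch. The p-value uses the identity W+ + W- = n(n + 1) / 2, so the distribution of |W+ - W-| follows from that of W+.

```r
# Hypothetical per-fold SRD values for two neighbouring columns (made-up data).
srd_a <- c(0.20, 0.25, 0.15, 0.30, 0.22, 0.28, 0.18, 0.26)
srd_b <- c(0.35, 0.20, 0.40, 0.33, 0.45, 0.21, 0.38, 0.50)

# |W+ - W-| computed from the paired differences.
abs_rank_sum_diff <- function(x, y) {
  d <- x - y
  d <- d[d != 0]          # assumption of this sketch: zero differences are dropped
  r <- rank(abs(d))       # ranks of the absolute differences (ties get average ranks)
  abs(sum(r[d > 0]) - sum(r[d < 0]))
}

# Two-sided p-value derived from the signed rank distribution of W+.
# Because W+ + W- = n(n + 1) / 2, we have |W+ - W-| = |2 * W+ - n(n + 1) / 2|.
# Here n is the number of non-zero differences.
abs_rank_sum_pvalue <- function(stat, n) {
  total <- n * (n + 1) / 2
  v <- 0:total                                  # all possible values of W+
  sum(dsignrank(v, n)[abs(2 * v - total) >= stat])
}

stat <- abs_rank_sum_diff(srd_a, srd_b)
p_value <- abs_rank_sum_pvalue(stat, n = length(srd_a))
print(c(statistic = stat, p_value = p_value))
```

For data without ties or zero differences, this p-value should agree with the exact two-sided result of `wilcox.test(srd_a, srd_b, paired = TRUE)`.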

Usage

calculateCrossValidation(
  data_matrix,
  method = "Wilcoxon",
  number_of_folds = 8,
  precision = 5,
  output_to_file = TRUE
)

Arguments

`data_matrix`	A data frame.
`method`	A string specifying the method. The methods "Wilcoxon", "Alpaydin" and "Dietterich" are available.
`number_of_folds`	The number of folds used in the cross-validation. Must be between 5 and 10.
`precision`	The precision used for the ranking matrix transformation.
`output_to_file`	Boolean flag to enable file output.
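For the "Alpaydin" and "Dietterich" methods, the Description names the reference distributions of the test statistics. As a rough sketch of how a p-value could be read off those distributions in base R: the values of `k`, `f_stat` and `t_stat` below are hypothetical, and rejecting for large values of the F statistic is an assumption of this sketch rather than something stated on this page.

```r
k      <- 5      # hypothetical value of k from the Description
f_stat <- 4.2    # hypothetical Alpaydin test statistic
t_stat <- -2.1   # hypothetical Dietterich test statistic

# Alpaydin: F distribution with df1 = 2k and df2 = k (upper tail assumed).
p_alpaydin <- pf(f_stat, df1 = 2 * k, df2 = k, lower.tail = FALSE)

# Dietterich: t-distribution with k degrees of freedom, two-tailed.
p_dietterich <- 2 * pt(abs(t_stat), df = k, lower.tail = FALSE)

print(c(alpaydin = p_alpaydin, dietterich = p_dietterich))
```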

Value

A list containing:

a new column order sorted by the median of the SRD values computed on the different folds,
a vector of test statistics, one for each pair of consecutive columns,
a vector indicating the statistical significance of each test statistic,
the SRD values of the different folds, and
additional data needed for the plotCrossValidation function.

Author(s)

Balázs R. Sziklai sziklai.balazs@krtk.hu, Linus Olsson linusmeol@gmail.com, Jochen Staudacher jochen.staudacher@hs-kempten.de

References

Sziklai, Balázs R., Máté Baranyi, and Károly Héberger (2021). "Testing Cross-Validation Variants in Ranking Environments", arXiv preprint arXiv:2105.11939.

Examples

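# Three solution columns and a column named Ref, seven rows each.
df <- data.frame(
  Sol_1 = c(7, 6, 5, 4, 3, 2, 1),
  Sol_2 = c(1, 2, 3, 4, 5, 7, 6),
  Sol_3 = c(1, 2, 3, 4, 7, 5, 6),
  Ref   = c(1, 2, 3, 4, 5, 6, 7)
)

# Default method and number of folds, without writing results to a file.
calculateCrossValidation(df, output_to_file = FALSE)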

[Package rSRD version 0.1.7 Index]