calc_instances {CAISEr} | R Documentation |
Calculates either the number of instances, or the power(s) of the comparisons of multiple algorithms.
calc_instances(
ncomparisons,
d,
ninstances = NULL,
power = NULL,
sig.level = 0.05,
alternative.side = "two.sided",
test = "t.test",
power.target = "mean"
)
ncomparisons | number of comparisons planned
d | minimally relevant effect size (MRES, expressed as a standardized effect size, i.e., "deviation from H0" / "standard deviation")
ninstances | the number of instances to be used in the experiment
power | target power for the comparisons (interpreted according to power.target)
sig.level | desired family-wise significance level (alpha) for the experiment
alternative.side | type of alternative hypothesis to be tested ("two.sided" or "one.sided")
test | type of test to be used ("t.test", "wilcoxon" or "binomial")
power.target | which comparison should have the desired power: "mean" (the default), "median" or "worst.case"
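As a small illustration of the standardized effect size d described in the arguments, the sketch below converts a raw minimally relevant difference into a standardized one. The names delta_raw and sigma, and their values, are hypothetical planning numbers chosen for this example; they are not part of the package.

```r
# Hypothetical planning values (assumptions for illustration only):
delta_raw <- 5    # smallest deviation from H0 considered relevant, in raw units
sigma     <- 10   # assumed standard deviation of the paired differences

# Standardized effect size: "deviation from H0" / "standard deviation"
d <- delta_raw / sigma
d
```

The resulting value is what would be passed as the d argument of calc_instances.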
The main use of this routine is to calculate the number of instances
required for the comparison of pairs of algorithms, given a desired power
and a standardized effect size of interest, using the closed formula of
the t-test. Significance levels of each comparison are adjusted using
Holm's step-down correction (the default). The routine also takes into
account whether the desired statistical power refers to the mean power
(the default), the median power, or the worst-case power (which is
equivalent to designing the experiment for the more widely known
Bonferroni correction). See the reference by Campelo and Wanner for details.
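The adjustment described above can be sketched with the base R function stats::power.t.test. This is a minimal illustration, assuming a paired design with unit standard deviation (so that delta equals the standardized effect size d) and assuming that the r-th comparison in Holm's procedure is tested at alpha / (K - r + 1). It is not necessarily how calc_instances is implemented internally, and all variable names are illustrative.

```r
K     <- 10    # number of comparisons
alpha <- 0.05  # family-wise significance level
d     <- 0.5   # standardized effect size (MRES)
N     <- 45    # candidate number of instances

# Holm's step-down thresholds: alpha/K for the first (most stringent)
# comparison, increasing to alpha/1 for the last one.
holm_alpha <- alpha / (K - seq_len(K) + 1)

# Power of a paired t-test at each threshold (sd = 1, so delta = d).
holm_power <- sapply(holm_alpha, function(a) {
  stats::power.t.test(n = N, delta = d, sd = 1, sig.level = a,
                      type = "paired",
                      alternative = "two.sided")$power
})

mean(holm_power)    # mean-case power
median(holm_power)  # median-case power
min(holm_power)     # worst-case power (the alpha/K comparison)

# Worst-case design: solve for n with every comparison powered
# at the Bonferroni-adjusted level alpha/K.
n_worst <- ceiling(stats::power.t.test(delta = d, sd = 1,
                                       sig.level = alpha / K, power = 0.9,
                                       type = "paired")$n)
n_worst
```

Because the worst-case target requires the most stringent comparison to reach the desired power, it generally calls for more instances than the mean or median targets.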
A list object containing the following items:
ninstances - number of instances
power - the power of the comparison
d - the effect size
sig.level - significance level
alternative.side - type of alternative hypothesis
test - type of test
If the parameter test is set to either "wilcoxon" or "binomial", this
routine approximates the number of instances based on the asymptotic
relative efficiency (ARE) of these tests in relation to the paired t-test,
using the formulas below (see the reference by Campelo and Takahashi for
details):
n.wilcox = n.ttest / 0.86 = 1.163 * n.ttest
n.binom = n.ttest / 0.637 = 1.570 * n.ttest
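The two ARE formulas above amount to a simple rescaling of the t-test sample size, as in the sketch below. The value of n_ttest is a hypothetical input, and rounding up to the next integer is an assumption of this example rather than documented behavior of the routine.

```r
n_ttest <- 50  # hypothetical sample size obtained for the paired t-test

# Rescale by the ARE of each test relative to the paired t-test,
# rounding up (assumed here) to get a whole number of instances.
n_wilcox <- ceiling(n_ttest / 0.86)   # Wilcoxon signed-rank test
n_binom  <- ceiling(n_ttest / 0.637)  # binomial sign test

c(t.test = n_ttest, wilcoxon = n_wilcox, binomial = n_binom)
```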
Felipe Campelo (fcampelo@ufmg.br, f.campelo@aston.ac.uk)
P. Mathews. Sample size calculations: Practical methods for engineers and scientists. Mathews Malnar and Bailey, 2010.
F. Campelo, F. Takahashi: Sample size estimation for power and accuracy in the experimental comparison of algorithms. Journal of Heuristics 25(2):305-338, 2019.
F. Campelo, E. Wanner: Sample size calculations for the experimental comparison of multiple algorithms on multiple problem instances. Submitted, Journal of Heuristics, 2019.
# Calculate sample size for mean-case power
K <- 10 # number of comparisons
alpha <- 0.05 # significance level
power <- 0.9 # desired power
d <- 0.5 # MRES
out <- calc_instances(K, d,
power = power,
sig.level = alpha)
# Plot power of each comparison to detect differences of magnitude d
plot(1:K, out$power,
type = "b", pch = 20, las = 1, ylim = c(0, 1), xlab = "comparison",
ylab = "power", xaxs = "i", xlim = c(0, 11))
grid(11, NA)
points(c(0, K+1), c(power, power), type = "l", col = 2, lty = 2, lwd = .5)
text(1, 0.93, sprintf("Mean power = %2.2f for N = %d",
out$mean.power, out$ninstances), adj = 0)
# Check sample size if planning for Wilcoxon tests:
calc_instances(K, d,
power = power,
sig.level = alpha,
test = "wilcoxon")$ninstances
# Calculate power profile for predefined sample size
N <- 45
out2 <- calc_instances(K, d, ninstances = N, sig.level = alpha)
points(1:K, out2$power, type = "b", pch = 19, col = 3)
text(6, .7, sprintf("Mean power = %2.2f for N = %d",
out2$mean.power, out2$ninstances), adj = 0)
# Sample size for worst-case (Bonferroni) power of 0.9, using Wilcoxon
out3 <- calc_instances(K, d, power = 0.9, sig.level = alpha,
test = "wilcoxon", power.target = "worst.case")
out3$ninstances
# For median power:
out4 <- calc_instances(K, d, power = 0.9, sig.level = alpha,
test = "wilcoxon", power.target = "median")
out4$ninstances
out4$power