calc_nreps {CAISEr}    R Documentation

Determine sample sizes for a set of algorithms on a single problem instance

Description

Iteratively calculates the required sample sizes for K algorithms on a given problem instance, so that the standard errors of the estimates of the pairwise differences in performance are controlled at a predefined level.
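The iterative idea can be illustrated with a simplified sketch. It assumes simple differences, all-vs-all comparisons and parametric standard errors, and uses a greedy rule (give one more run to whichever algorithm of the worst pair reduces that pair's variance the most). This is not CAISEr's internal allocation rule, and `sketch_nreps()` and `sample_algo()` are hypothetical names used only for illustration.

```r
# Simplified sketch (assumption, not CAISEr's internals): iteratively add runs
# until every pairwise simple difference has a parametric SE <= se.max, or
# until the total number of runs reaches nmax.
# 'sample_algo(k)' is a hypothetical function returning one run of algorithm k.
sketch_nreps <- function(sample_algo, K, se.max, nstart = 20, nmax = 1000) {
  # nstart initial runs per algorithm
  x <- lapply(seq_len(K), function(k) replicate(nstart, sample_algo(k)))
  pairs <- utils::combn(K, 2)                   # all.vs.all: one column per pair
  repeat {
    n  <- vapply(x, length, integer(1))
    s2 <- vapply(x, stats::var, numeric(1))
    # parametric SE of each estimated difference mu_a - mu_b
    se <- sqrt(s2[pairs[1, ]] / n[pairs[1, ]] + s2[pairs[2, ]] / n[pairs[2, ]])
    if (all(se <= se.max) || sum(n) >= nmax) break
    # take the pair with the largest SE; give one more run to whichever of its
    # two algorithms reduces that pair's variance the most
    worst <- pairs[, which.max(se)]
    gain  <- s2[worst] / n[worst] - s2[worst] / (n[worst] + 1)
    k     <- worst[which.max(gain)]
    x[[k]] <- c(x[[k]], sample_algo(k))
  }
  list(n = n, se = se)
}

# Usage with three normal "algorithms" (means and SDs chosen for illustration)
set.seed(1234)
means <- c(15, 10, 30)
sds   <- c(2, 4, 6)
res <- sketch_nreps(function(k) stats::rnorm(1, means[k], sds[k]),
                    K = 3, se.max = 0.5, nstart = 20, nmax = 1000)
res$n
```

Algorithms with larger variance end up with more runs, which is the intuition behind unbalanced sampling (see force.balanced).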


Usage

calc_nreps(
  instance,
  algorithms,
  se.max,
  dif = "simple",
  comparisons = "all.vs.all",
  method = "param",
  nstart = 20,
  nmax = 1000,
  seed = NULL,
  boot.R = 499,
  ncpus = 1,
  force.balanced = FALSE,
  load.folder = NA,
  save.folder = NA
)



Arguments

instance: a list object containing the definitions of the problem instance. See Section 'Instance' for details.

algorithms: a list object containing the definitions of all algorithms. See Section 'Algorithms' for details.

se.max: desired upper limit for the standard error of the estimated difference between pairs of algorithms. See Section 'Pairwise Differences' for details.

dif: type of difference to be used. Accepts "perc" (for percent differences) or "simple" (for simple differences).

comparisons: type of comparisons being performed. Accepts "all.vs.first" (in which case the first object in algorithms is considered to be the reference algorithm) or "all.vs.all" (if there is no reference and all pairwise comparisons are desired).

method: method to use for estimating the standard errors. Accepts "param" (for parametric) or "boot" (for bootstrap).

nstart: initial number of runs for each algorithm. See Section 'Initial Number of Observations' for details.

nmax: maximum total number of runs allowed. Loaded results (see load.folder below) do not count towards this total.

seed: seed for the random number generator.

boot.R: number of bootstrap resamples to use (if method == "boot").

ncpus: number of cores to use.

force.balanced: logical flag to force the use of balanced sampling for the algorithms on each instance.

load.folder: name of folder to load results from. Use either "" or "./" for the current working directory. Accepts relative paths. Use NA for not loading. calc_nreps() will look for a .RDS file with the same name.

save.folder: name of folder to save the results. Use either "" or "./" for the current working directory. Accepts relative paths. Use NA for not saving.
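The comparisons argument determines how many pairwise differences must have their standard errors brought below se.max: K(K-1)/2 pairs for "all.vs.all" versus K-1 pairs for "all.vs.first". The helper below is a hypothetical illustration (not a CAISEr function) that lists those index pairs.

```r
# Hypothetical helper (not part of CAISEr): list the index pairs compared
# under each 'comparisons' setting for K algorithms.
comparison_pairs <- function(K, comparisons = c("all.vs.all", "all.vs.first")) {
  comparisons <- match.arg(comparisons)
  if (comparisons == "all.vs.all") {
    t(utils::combn(K, 2))          # every unordered pair (a, b) with a < b
  } else {
    cbind(1L, seq_len(K)[-1])      # reference algorithm 1 versus each other one
  }
}

nrow(comparison_pairs(5, "all.vs.all"))    # 10 = 5 * 4 / 2
nrow(comparison_pairs(5, "all.vs.first"))  # 4  = 5 - 1
```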


Value

a list object containing the following items:


Instance

Parameter instance must be a named list containing all relevant parameters that define the problem instance. This list must contain at least the field instance$FUN, with the name of the function implementing the problem instance, that is, a routine that calculates y = f(x). If the instance requires additional parameters, these must also be provided as named fields.
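As an illustration, the sketch below defines a hypothetical instance function, sphere_instance() (the sphere function plus an optional offset), and the matching instance list. How an algorithm actually evaluates the instance is up to the algorithm; the do.call() pattern shown here is an assumption, not a CAISEr requirement.

```r
# Hypothetical instance function: the sphere function y = f(x) = sum(x^2),
# with an optional additive offset passed as an extra named field.
sphere_instance <- function(x, offset = 0) {
  sum(x^2) + offset
}

# Instance definition: FUN holds the *name* of the function (a character
# string); any additional parameters are provided as further named fields.
instance <- list(FUN = "sphere_instance", offset = 1)

# One way an algorithm could evaluate a candidate solution x on this instance
# (an assumption for illustration; the calling convention is up to the algorithm)
x <- c(1, 2)
y <- do.call(instance$FUN, c(list(x = x), instance[names(instance) != "FUN"]))
y  # 6
```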


Algorithms

Object algorithms is a list in which each component is a named list containing all relevant parameters that define an algorithm to be applied for solving the problem instance. In what follows, algorithm[[k]] refers to any algorithm specified in the algorithms list.

algorithm[[k]] must contain an algorithm[[k]]$FUN field, which is a character object with the name of the function that calls the algorithm; as well as any other elements/parameters that algorithm[[k]]$FUN requires (e.g., stop criteria, operator names and parameters, etc.).

The function defined by the routine algorithm[[k]]$FUN must have the following structure: supposing that the list in algorithm[[k]] has fields algorithm[[k]]$FUN = "myalgo", algorithm[[k]]$par1 = "a" and algorithm[[k]]$par2 = 5, then:

         myalgo <- function(par1, par2, instance, ...){
               # do stuff
               # ...
               return(results)
         }

That is, it must be able to run if called as:

         # remove '$FUN' and '$alias' fields from list of arguments
         # and include the problem definition as field 'instance'
         myargs          <- algorithm[names(algorithm) != "FUN"]
         myargs          <- myargs[names(myargs) != "alias"]
         myargs$instance <- instance

         # call function
         do.call(algorithm$FUN,
                 args = myargs)

The algorithm$FUN routine must return a list containing (at least) the performance value of the final solution obtained after a given run, in a field named value (e.g., result$value).
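A complete, if trivial, algorithm that follows this interface might look like the sketch below. random_search() and sphere_instance() are hypothetical names; the algorithm is a plain random search chosen only because it is short, and it is called exactly as in the snippet above (dropping FUN and alias, adding instance).

```r
# Hypothetical instance: sphere function with an optional offset
sphere_instance <- function(x, offset = 0) sum(x^2) + offset

# Hypothetical algorithm: pure random search over [lower, upper]^dim.
# It receives its own parameters plus 'instance', and returns list(value = ...).
random_search <- function(dim, budget, instance, lower = -5, upper = 5, ...) {
  # evaluate the instance function by name, passing its extra fields
  f <- function(x) do.call(instance$FUN,
                           c(list(x = x), instance[names(instance) != "FUN"]))
  best <- Inf
  for (i in seq_len(budget)) {
    x    <- stats::runif(dim, lower, upper)
    best <- min(best, f(x))
  }
  list(value = best)                 # required field: final performance value
}

algorithm <- list(FUN = "random_search", alias = "RS", dim = 2, budget = 100)
instance  <- list(FUN = "sphere_instance", offset = 0)

# Same calling convention as described above
myargs          <- algorithm[names(algorithm) != "FUN"]
myargs          <- myargs[names(myargs) != "alias"]
myargs$instance <- instance
set.seed(42)
result <- do.call(algorithm$FUN, args = myargs)
result$value
```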

Initial Number of Observations

In the general case the initial number of observations per algorithm (nstart) should be relatively high. For the parametric case we recommend between 10 and 20 if outliers are not expected, or between 30 and 50 if that assumption cannot be made. For the bootstrap approach we recommend using at least 20. However, if some distributional assumptions can be made (particularly low skewness of the population of algorithm results on the test instances), then nstart can in principle be as small as 5 (if the output of the algorithms were known to be normal, it could be 1).

In general, higher sample sizes are the price to pay for abandoning distributional assumptions. Use lower values of nstart with caution.
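The cost of dropping distributional assumptions can be seen in a small simulation (an illustration, not taken from the package): the standard error of a mean, estimated from n observations, varies much more across replications when the population is skewed (exponential) than when it is normal, and this variability shrinks as n grows. se_spread() is a hypothetical helper.

```r
# Illustrative simulation: spread (coefficient of variation) of the estimated
# standard error of a mean, computed from n observations, across replications.
set.seed(1)
se_spread <- function(rgen, n, reps = 2000) {
  se_hat <- replicate(reps, stats::sd(rgen(n)) / sqrt(n))
  stats::sd(se_hat) / mean(se_hat)
}

ns        <- c(5, 20, 50)
cv_normal <- sapply(ns, function(n) se_spread(stats::rnorm, n))
cv_expo   <- sapply(ns, function(n) se_spread(stats::rexp,  n))
round(rbind(n = ns, normal = cv_normal, exponential = cv_expo), 3)
```

A less stable SE estimate means the stopping rule based on se.max is less reliable, which is why larger nstart values are suggested when no distributional assumptions are made.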

Pairwise Differences

Parameter dif specifies the type of difference in performance to be used for the estimation (\mu_a and \mu_b represent the mean performance of any two algorithms on the test instance, and \mu represents the grand mean of all algorithms given in algorithms):
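Read with the symbols above, a simple difference estimates \mu_a - \mu_b, and a percent difference rescales it by a mean, for example (\mu_a - \mu_b) / \mu with the grand mean \mu. The sketch below is an assumption-based illustration, not CAISEr's code: it computes both quantities for two simulated algorithms and estimates their standard errors with the usual parametric formula and with a nonparametric bootstrap (mirroring method = "param" / "boot" and boot.R). est_diff() and boot_se() are hypothetical names.

```r
# Sketch (assumption, not CAISEr's internal code): estimate a difference in
# mean performance between two algorithms and its standard error.
set.seed(1234)
xa <- stats::rnorm(50, mean = 15, sd = 2)    # 50 runs of algorithm a
xb <- stats::rnorm(50, mean = 10, sd = 4)    # 50 runs of algorithm b

# "simple": mu_a - mu_b; "perc": (mu_a - mu_b) / mu, where mu is taken here as
# the grand mean of the two algorithms (the only ones in this toy example)
est_diff <- function(xa, xb, dif = c("simple", "perc")) {
  dif <- match.arg(dif)
  d   <- mean(xa) - mean(xb)
  if (dif == "perc") d <- d / mean(c(mean(xa), mean(xb)))
  d
}

# Parametric SE of the simple difference of two independent sample means
se_param <- sqrt(stats::var(xa) / length(xa) + stats::var(xb) / length(xb))

# Bootstrap SE: resample each algorithm's runs independently, boot.R times
boot.R  <- 499
boot_se <- function(dif) {
  reps <- replicate(boot.R, est_diff(sample(xa, replace = TRUE),
                                     sample(xb, replace = TRUE), dif))
  stats::sd(reps)
}
se_boot_simple <- boot_se("simple")
se_boot_perc   <- boot_se("perc")

c(simple = est_diff(xa, xb, "simple"), se_param = se_param,
  se_boot = se_boot_simple)
c(perc = est_diff(xa, xb, "perc"), se_boot = se_boot_perc)
```

For the simple difference both SE estimates should roughly agree; the bootstrap is convenient for ratio-type quantities such as percent differences, where no simple closed form is at hand.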


Author(s)

Felipe Campelo (



Examples

# Example using dummy algorithms and instances. See ?dummyalgo for details.
library(CAISEr)

# We generate dummy algorithms with true means 15, 10, 30, 15, 20; and true
# standard deviations 2, 4, 6, 8, 10.
algorithms <- mapply(FUN = function(i, m, s){
                          list(FUN               = "dummyalgo",
                               alias             = paste0("algo", i),
                               distribution.fun  = "rnorm",
                               distribution.pars = list(mean = m, sd = s))},
                     i = c(alg1 = 1, alg2 = 2, alg3 = 3, alg4 = 4, alg5 = 5),
                     m = c(15, 10, 30, 15, 20),
                     s = c(2, 4, 6, 8, 10),
                     SIMPLIFY = FALSE)

# Make a dummy instance with a centered (zero-mean) exponential distribution:
instance <- list(FUN = "dummyinstance", distr = "rexp", rate = 5, bias = -1/5)

# Set all other parameters explicitly (just this one time:
# most have reasonable default values)
myreps <- calc_nreps(instance       = instance,
                     algorithms     = algorithms,
                     se.max         = 0.05,         # desired (max) standard error
                     dif            = "perc",       # type of difference
                     comparisons    = "all.vs.all", # differences to consider
                     method         = "param",      # method ("param", "boot")
                     nstart         = 15,           # initial number of samples
                     nmax           = 1000,         # maximum total number of runs
                     seed           = 1234,         # seed for PRNG
                     boot.R         = 499,          # number of bootstrap resamples (unused)
                     ncpus          = 1,            # number of cores to use
                     force.balanced = FALSE,        # force balanced sampling?
                     load.folder    = NA,           # folder to load results from
                     save.folder    = NA)           # folder to save results

[Package CAISEr version 1.0.17 Index]