sim {POSSA}    R Documentation

Simulation procedure

Description

This function runs the simulation procedure that produces the p values eventually used for power calculation (via pow). The observation values (the "sample") to be tested are simulated via the given fun_obs function, and the significance testing is performed via the given fun_test function. The numbers of observations per look (for a sequential design) are specified in n_obs.

Usage

sim(
  fun_obs,
  n_obs,
  fun_test,
  n_iter = 45000,
  adjust_n = 1,
  seed = 8,
  pair = NULL,
  ignore_suffix = FALSE,
  prog_bar = FALSE,
  hush = FALSE
)

Arguments

fun_obs

A function that creates the observations (i.e., the "sample"; all values for the dependent variable(s)). The respective maximum observation number(s), given in n_obs, will be passed to fun_obs. The returned value must be a named list whose names correspond exactly to the arguments of fun_test. In case of sequential testing, the observations returned by fun_obs will be reduced to the specified (smaller) number(s) of observations at each given interim "look" (simulating what would happen if data collection were stopped at that look), before being passed to fun_test. Optionally, fun_obs can be given additional arguments by providing it in list format; see Details.
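For instance, a minimal sketch (assuming a sequential design with looks at 40 and 80 observations; all names are illustrative):

# fun_obs receives only the maximum observation number(s) (here 80, given
# e.g. n_obs = c(40, 80)); sim() itself trims the returned samples for the
# earlier look(s)
myObs = function(sampleSize) {
    list(
        sample1 = rnorm(sampleSize, mean = 0, sd = 10),
        sample2_h0 = rnorm(sampleSize, mean = 0, sd = 10),
        sample2_h1 = rnorm(sampleSize, mean = 5, sd = 10)
    )
}
# the returned names (sample1, sample2_h0, sample2_h1) must exactly match
# the fun_test arguments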

n_obs

A numeric vector or a named list of numeric vectors. Specifies the numbers of observations (i.e., sample sizes) that are to be generated by fun_obs and then tested in fun_test. If a single vector is given, it will be used for all observation-number arguments of fun_obs and for the corresponding sample size adjustments of the fun_test arguments. If a named list of numeric vectors is given instead, the names must correspond exactly to the argument names in fun_obs and fun_test, so that the respective numeric vector is used for each given sample variable. For convenience, a variable with a "_h" suffix will be divided into names with "_h0" and "_h1" suffixes for fun_test (but not for fun_obs); see Details.
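For example (a brief sketch; the variable names are illustrative and must match the fun_obs/fun_test arguments):

# single vector: three looks at 30, 60, and 90 observations, applied to all
# sample variables
n_obs = c(30, 60, 90)
# named list: per-variable observation numbers; the "sample2_h" entry covers
# both "sample2_h0" and "sample2_h1" in fun_test
n_obs = list(sample1 = c(30, 60, 90), sample2_h = c(30, 60, 90))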

fun_test

The function for significance testing. The list of samples returned by fun_obs (with observation numbers specified in n_obs) will be passed into this fun_test function as arguments, to be used in the statistical significance tests given in this function. To correctly calculate the sample sizes in POSSA::pow, the argument names for samples that differ depending on whether the null (H0) or the alternative (H1) hypothesis is true should be indicated with "_h0" and "_h1" suffixes, respectively, with a common root (so, e.g., "var_x_h0" and "var_x_h1"). In the resulting data.frame, their sample sizes (which must always be identical) will then be automatically merged into a single column with a trimmed "_h" suffix (e.g., "var_x_h"). (Otherwise, the sample sizes of both H0 and H1 would be counted toward the total expected sample in either case, which is of course incorrect. There are internal checks to prevent this, but the intended total sample size can also be double-checked in the returned data.frame's .n_total column.) Within-subject observations, i.e., multiple observations per group, should be specified with a "GRP" prefix for a single group (e.g., simply "GRP", or "GRP_mytest") and, for multiple groups, with a "grp_" prefix followed by a group name (e.g., "grp_1" or "grp_alpha"); the numbers of multiple observations in each group can then be specified in fun_obs via their group name (since the respective numbers of observations should always be the same anyway); see Examples. To be recognized by the POSSA::pow function, fun_test must return a named vector including a pair (or pairs) of p values for H0 and H1 outcomes, where each p value's name must have a "p_" prefix and a "_h0" suffix for the H0 outcome or a "_h1" suffix for the H1 outcome (e.g., p_h0, p_h1; p_ttest_h0, p_ttest_h1). The simulated outcomes (per iteration) for each of these p values will be stored in separate dedicated columns of the data.frame returned by the sim function. Optionally, fun_test can also return other miscellaneous outcomes, such as effect sizes or confidence interval limits; these too will be stored in dedicated columns of the resulting data.frame.
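For instance, a brief sketch of a fun_test returning two pairs of p values (the "p_..._h0"/"p_..._h1" naming is required; all other names are illustrative):

myTest = function(sample1, sample2_h0, sample2_h1) {
    c(
        # H0 and H1 p values for a Welch t-test
        p_ttest_h0 = t.test(sample1, sample2_h0)$p.value,
        p_ttest_h1 = t.test(sample1, sample2_h1)$p.value,
        # H0 and H1 p values for a Wilcoxon rank-sum test
        p_wilcox_h0 = wilcox.test(sample1, sample2_h0)$p.value,
        p_wilcox_h1 = wilcox.test(sample1, sample2_h1)$p.value
    )
}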

n_iter

Number of iterations (default: 45000).

adjust_n

Adjusts the total number of observations via simple multiplication. Might be useful in some specific cases, e.g. if for some reason multiple p values are derived from the same sample without specifying grouping ("GRP" or "grp_" in fun_test), which would otherwise lead to incorrect (multiplied, too large) totals; for example, if four p values are derived from the same sample, the value 1/4 could be given. (The default value is 1.)

seed

The number for set.seed; 8 by default. Set to NULL for a random seed.

pair

Logical or NULL. The default is NULL, in which case the algorithm assumes that paired samples are included among the observations whenever there is any grouping via the fun_test parameters ("GRP"/"grp_"), and that there are no paired samples otherwise. When paired samples are included, within each look, the same vector indexes are used to remove elements from each of the given observation vectors (i.e., the samples are subsampled in parallel). In general, this should not substantially affect the outcomes of independent samples (assuming that their order is truly independent), but this depends on how the random samples are generated in the fun_obs function. To be safe and avoid any potential bias, it is best to avoid this paired sampling mechanism when no paired samples are included. To override the default, set to TRUE for a paired-samples scenario (paired sampling), or to FALSE for a no-paired-samples scenario (random subsampling of each sample). (Might be useful for testing or some very specific procedures, e.g. where grouping is not indicated despite paired samples.)

ignore_suffix

By default (FALSE), internally detected consistency problems with the _h0/_h1 suffixes in the fun_test function arguments produce errors. Set to NULL to give warnings instead of errors, or set to TRUE to ignore these problems completely (neither error nor warning). (Might be useful for testing or some very specific procedures.)

prog_bar

Logical, FALSE by default. If TRUE, a progress bar is shown.

hush

Logical, FALSE by default. If TRUE, prevents printing any details to the console.

Details

To specify a variable that differs depending on whether the null hypothesis ("H0") or the alternative hypothesis ("H1") is true, a pair of samples is needed for fun_test, whose argument names should have an identical root and "_h0" and "_h1" endings, such as "var_x_h0" (for the sample in case of H0) and "var_x_h1" (for the sample in case of H1). Since the observation number for this pair will always be the same, as a convenience, parameters with "_h0" and "_h1" endings can be specified together in n_obs with the last "0"/"1" character dropped, hence ending in "_h". So, for example, "var_x_h = c(30, 60, 90)" automatically specifies the observation numbers for both "var_x_h0" and "var_x_h1". In that case, fun_obs must have a single argument "var_x_h", while fun_test must have both full names as arguments ("var_x_h0" and "var_x_h1").
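A brief sketch of this shorthand (the names are illustrative apart from the required suffixes):

obsFun = function(sample1, var_x_h) {
    list(
        sample1 = rnorm(sample1, mean = 0, sd = 10),
        var_x_h0 = rnorm(var_x_h, mean = 0, sd = 10),
        var_x_h1 = rnorm(var_x_h, mean = 5, sd = 10)
    )
}
testFun = function(sample1, var_x_h0, var_x_h1) {
    c(
        p_h0 = t.test(sample1, var_x_h0)$p.value,
        p_h1 = t.test(sample1, var_x_h1)$p.value
    )
}
# in n_obs, "var_x_h" specifies the observation numbers of both "var_x_h0"
# and "var_x_h1" at once
n_obs = list(sample1 = c(30, 60, 90), var_x_h = c(30, 60, 90))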

Optionally, fun_obs can be provided in list format for the convenience of exploring several varying factors (e.g., different effect sizes, correlations) at once, without writing a dedicated fun_obs function for each combination and separately running the simulation and the power calculation each time. In this case, the first element of the list must be the actual function, which includes parameters for the varying factors, while the rest of the elements give the possible values of these parameters as named elements of the list (e.g., list(my_function, factor1 = c(1, 2, 3), factor2 = c(0, 5))), where each name corresponds to a parameter name in the function and each value vector gives that factor's levels (numbers or strings). When specified this way, a separate simulation procedure is run for each combination of the given factors (or, if only one factor is given, for each element of that factor). The POSSA::pow function will be able to automatically detect (by default) the factors generated this way in the present POSSA::sim function, in order to calculate power separately for each factor combination.
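A brief sketch of this list format (the function and the "effectSize" factor name are illustrative):

myObsFactors = function(sampleSize, effectSize) {
    list(
        sample1 = rnorm(sampleSize, mean = 0, sd = 10),
        sample2_h0 = rnorm(sampleSize, mean = 0, sd = 10),
        sample2_h1 = rnorm(sampleSize, mean = effectSize, sd = 10)
    )
}
# a separate simulation would be run for effectSize 3 and for effectSize 5,
# and POSSA::pow can then calculate power separately for each, e.g.:
# sim(fun_obs = list(myObsFactors, effectSize = c(3, 5)), n_obs = 80,
#     fun_test = customTest)  # customTest as defined in the Examples below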

Value

Returns a data.frame (with class "possa_sim_df") that includes the columns .iter (the iterations of the simulation procedure, numbered from 1 to n_iter), .look (the interim "looks", numbered from 1 to the maximum number of looks, including the final one), the information returned by the fun_test function for H0 and H1 outcomes (mainly p values, but also other, optional information, if any) along with the corresponding observation numbers, as well as the total observation number for each look in a dedicated .n_total column. When this data frame is printed to the console (via POSSA's print() method), the head (first few lines) of the data is shown, along with summary information per factor combination in case any varying factors were included.
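For example (a usage sketch, assuming dfPvals was created as in the Examples below):

# str(dfPvals)       # lists all columns, e.g. .iter, .look, .n_total
# dfPvals$.n_total   # total observation number at each look, per iteration
# print(dfPvals)     # head of the data (plus per-factor summaries, if any)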

Note

For replicability (despite the randomization), set.seed is executed at the beginning of this function each time it is called; see the seed parameter.

See Also

pow

Examples


# below is a (very) minimal example
# for more, see the vignettes via https://github.com/gasparl/possa#usage

# create sampling function
customSample = function(sampleSize) {
    list(
        sample1 = rnorm(sampleSize, mean = 0, sd = 10),
        sample2_h0 = rnorm(sampleSize, mean = 0, sd = 10),
        sample2_h1 = rnorm(sampleSize, mean = 5, sd = 10)
    )
}

# create testing function
customTest = function(sample1, sample2_h0, sample2_h1) {
 c(
   p_h0 = t.test(sample1, sample2_h0, 'less', var.equal = TRUE)$p.value,
   p_h1 = t.test(sample1, sample2_h1, 'less', var.equal = TRUE)$p.value
 )
}

# run simulation
dfPvals = sim(
    fun_obs = customSample,
    n_obs = 80,
    fun_test = customTest,
    n_iter = 1000
)

# get power info
pow(dfPvals)



[Package POSSA version 0.6.4 Index]