R: Simulated Distribution-Free Bayesian Power and t Power

dfba_bayes_vs_t_power {DFBA}

R Documentation

Simulated Distribution-Free Bayesian Power and t Power

Description

The function is a design tool for comparing Bayesian distribution-free power with frequentist t power over a range of sample sizes. The data can be generated from any one of nine probability models.

Usage

dfba_bayes_vs_t_power(
  n_min = 20,
  delta,
  model,
  design,
  effect_crit = 0.95,
  shape1 = 1,
  shape2 = 1,
  samples = 1000,
  a0 = 1,
  b0 = 1,
  block_max = 0,
  hide_progress = FALSE
)

Arguments

`n_min`	Smallest desired value of sample size for power calculations (minimum 20; default is also 20)
`delta`	Offset amount between the two variates
`model`	Theoretical probability model for the data. One of `"normal"`, `"weibull"`, `"cauchy"`, `"lognormal"`, `"chisquare"`, `"logistic"`, `"exponential"`, `"gumbel"`, or `"pareto"`.
`design`	Indicates the data structure. One of `"independent"` or `"paired"`.
`effect_crit`	Criterion for a significant difference in the t-test (i.e., 1 - p), and the critical posterior probability for the alternative hypothesis in the Bayesian distribution-free analysis (default is 0.95)
`shape1`	The shape parameter for the condition 1 variate for the distribution indicated by the `model` input (default is 1)
`shape2`	The shape parameter for the condition 2 variate for the distribution indicated by the `model` input (default is 1)
`samples`	Desired number of Monte Carlo data sets drawn to estimate the power (default is 1000)
`a0`	The first shape parameter for the prior beta distribution (default is 1). Must be positive and finite.
`b0`	The second shape parameter for the prior beta distribution (default is 1). Must be positive and finite.
`block_max`	The maximum size for a block effect (default is 0)
`hide_progress`	(Optional) If `TRUE`, hide percent progress while Monte Carlo sampling is running (default is `FALSE`)

Details

Researchers need to make experimental-design decisions, such as choosing the sample size per condition and deciding whether to use a within-block design or an independent-groups design. These planning issues arise regardless of whether one uses a frequentist or a Bayesian approach to statistical inference. The DFBA package includes a number of functions to help users with these decisions. The dfba_bayes_vs_t_power() function produces (a) the Bayesian power estimate from a distribution-free analysis and (b) the corresponding frequentist power from a parametric t-test for a set of 11 sample sizes ranging from n_min to n_min + 50 in steps of 5. Each estimate is based on Monte Carlo-sampled data sets generated by the dfba_sim_data() function; the number of data sets is set by the samples argument.
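The grid of sample sizes described above can be reproduced in base R. This is only an illustrative sketch of the stated rule (11 values from n_min to n_min + 50 in steps of 5); the variable name sample_sizes is hypothetical and is not part of the DFBA package.

```r
# Hypothetical illustration of the sample-size grid (not DFBA code)
n_min <- 20
sample_sizes <- seq(from = n_min, to = n_min + 50, by = 5)

sample_sizes          # 20 25 30 35 40 45 50 55 60 65 70
length(sample_sizes)  # 11
```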

For each data set, statistical tests are performed. Here C denotes the condition 1 variate and E denotes the condition 2 variate. If design = "paired", the frequentist t-test is a one-tailed test on the within-block difference scores that assesses whether the population mean for E is greater than the population mean for C; if design = "independent", the frequentist t-test is a one-tailed test that assesses whether the mean for condition 2 is significantly greater than the mean for condition 1. If design = "paired", the Bayesian analysis assesses whether the posterior probability for phi_w > .5 from the Bayesian Wilcoxon test is greater than effect_crit; if design = "independent", the Bayesian analysis assesses whether the posterior probability for omega_E > .5 from the Bayesian Mann-Whitney test is greater than effect_crit. The frequentist power is estimated by the proportion of data sets in which the parametric t-test detects a significant effect, that is, in which the upper-tail t value has a p-value less than 1 - effect_crit. The Bayesian power is the proportion of data sets in which the posterior probability for the alternative hypothesis is greater than effect_crit. The default value for the effect_crit argument is .95. The frequentist p-value and the Bayesian posterior probability for the alternative hypothesis are calculated using the dfba_sim_data() function.
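The frequentist half of the procedure above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the helper estimate_t_power() is hypothetical, the data are normal with unit standard deviation, and the independent-groups case uses R's default Welch t-test, which may differ from the test used inside DFBA. The Bayesian half would additionally require the posterior probabilities from dfba_wilcoxon() or dfba_mann_whitney() and is not reproduced here.

```r
# Hypothetical sketch: Monte Carlo estimate of one-tailed t-test power.
# Not DFBA code; normal data and the Welch test are assumptions.
set.seed(1)

estimate_t_power <- function(n, delta, samples = 250, effect_crit = 0.95,
                             design = c("paired", "independent")) {
  design <- match.arg(design)
  detections <- replicate(samples, {
    C <- rnorm(n, mean = 0, sd = 1)      # condition 1 scores
    E <- rnorm(n, mean = delta, sd = 1)  # condition 2 scores, offset by delta
    p <- t.test(E, C, alternative = "greater",
                paired = (design == "paired"))$p.value
    p < 1 - effect_crit                  # TRUE when the test detects an effect
  })
  mean(detections)                       # proportion of detections = power estimate
}

power_paired <- estimate_t_power(n = 40, delta = 0.45)
power_paired
```

Because the estimate is a proportion over simulated data sets, it varies from run to run; a larger samples value reduces that Monte Carlo error at the cost of runtime.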

The arguments for the dfba_sim_data() function are passed from the dfba_bayes_vs_t_power() function. Besides the sample size n, there are eight other arguments that are required by the dfba_sim_data() function, which are passed from the dfba_bayes_vs_t_power() function:

a0
b0
model
design
delta
shape1
shape2
block_max.

The a0 and b0 values are the respective first and second beta shape parameters for the prior distribution needed for the Bayesian distribution-free tests, which are ultimately done by calling either the dfba_wilcoxon() function or the dfba_mann_whitney() function.

The model argument is one of the following strings:

"normal"
"weibull"
"cauchy"
"lognormal"
"chisquare"
"logistic"
"exponential"
"gumbel"
"pareto"

The design argument is either "independent" or "paired", and stipulates whether the two sets of scores are either independent or from a common block such as for the case of two scores for the same person (i.e., one in each condition).

The shape1 and shape2 arguments are values for the shape parameter for the respective first and second condition, and their meaning depends on the probability model. For model = "normal", these parameters are the standard deviations of the two distributions. For model = "weibull", the parameters are the Weibull shape parameters. For model = "cauchy", the parameters are the scale factors for the Cauchy distributions. For model = "lognormal", the shape parameters are the standard deviations for log(X). For model = "chisquare", the parameters are the degrees of freedom (df) for the two distributions. For model = "logistic", the parameters are the scale factors for the distributions. For model = "exponential", the parameters are the rate parameters for the distributions.
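The role of the shape argument under each of these seven models can be illustrated with base R's random generators. This is a hedged sketch, not the code used by dfba_sim_data(): the helper draw_scores() is hypothetical, and the location of 0, the Weibull scale of 1, the lognormal meanlog of 0, and the additive use of delta for condition 2 are all assumptions made for illustration.

```r
# Hypothetical helper (not part of DFBA): draw n scores for one condition.
# Assumptions: location 0, Weibull scale 1, lognormal meanlog 0.
draw_scores <- function(n, model, shape) {
  switch(model,
    normal      = rnorm(n, mean = 0, sd = shape),            # shape = standard deviation
    weibull     = rweibull(n, shape = shape, scale = 1),     # shape = Weibull shape
    cauchy      = rcauchy(n, location = 0, scale = shape),   # shape = scale factor
    lognormal   = rlnorm(n, meanlog = 0, sdlog = shape),     # shape = sd of log(X)
    chisquare   = rchisq(n, df = shape),                     # shape = degrees of freedom
    logistic    = rlogis(n, location = 0, scale = shape),    # shape = scale factor
    exponential = rexp(n, rate = shape),                     # shape = rate
    stop("model not covered by this sketch: ", model)
  )
}

set.seed(2)
C <- draw_scores(30, "weibull", shape = 0.8)         # condition 1
E <- draw_scores(30, "weibull", shape = 0.8) + 0.45  # condition 2; adding delta is an assumption
```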

For the Gumbel distribution, the E variate is equal to delta - shape2*log(log(1/U)) where U is a random value sampled from the uniform distribution on the interval [.00001, .99999], and the C variate is equal to -shape1*log(log(1/U)) where U is another score sampled from the uniform distribution. The shape1 and shape2 arguments for model = "gumbel" are the scale parameters for the distributions.

The Pareto model was developed to describe income distributions as studied by economists (Pareto, 1897). For the Pareto distribution, the cumulative distribution function is equal to 1-(x_m/x)^alpha where x is greater than x_m (Arnold, 1983). In the E condition, x_m = 1 + delta and in the C condition x_m = 1. The alpha parameter is 1.16 times the shape parameters shape1 and shape2. Since the default value for each shape parameter is 1, the default alpha value is 1.16. When alpha = 1.16, the Pareto distribution approximates an income distribution that represents the 80-20 law where 20% of the population receives 80% of the income (Hardy, 2010).
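Both constructions can be written as inverse-transform sampling from uniform draws. The sketch below follows the formulas stated above; the variable names are hypothetical and this is not the package's code. The Pareto draws come from solving U = 1-(x_m/x)^alpha for x, which gives x = x_m/(1-U)^(1/alpha). The last line uses the standard Pareto result that the top 20% hold a share 0.2^(1 - 1/alpha) of the total, which is approximately .80 when alpha = 1.16.

```r
set.seed(3)
n <- 1000; delta <- 0.45; shape1 <- 1; shape2 <- 1

# Gumbel: uniform draws restricted to [.00001, .99999], as described above
U_c <- runif(n, min = 0.00001, max = 0.99999)
U_e <- runif(n, min = 0.00001, max = 0.99999)
C_gumbel <- -shape1 * log(log(1 / U_c))
E_gumbel <- delta - shape2 * log(log(1 / U_e))

# Pareto: invert F(x) = 1 - (x_m / x)^alpha to get x = x_m / (1 - U)^(1 / alpha)
alpha_c <- 1.16 * shape1
alpha_e <- 1.16 * shape2
C_pareto <- 1 / (1 - runif(n))^(1 / alpha_c)            # x_m = 1
E_pareto <- (1 + delta) / (1 - runif(n))^(1 / alpha_e)  # x_m = 1 + delta

# Share of the total held by the top 20% under a Pareto law with alpha = 1.16
top20_share <- 0.2^(1 - 1 / 1.16)
top20_share   # approximately 0.80
```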

The block_max argument provides for incorporating block effects in the random sampling. The block effect B for a score is a random number drawn from a uniform distribution on the interval [0, block_max]. When design = "paired", the same random block effect is added to the score in the first condition, which is the random C value, and to the corresponding paired value for the E variate. Thus, the paired design eliminates the effect of block variation from the assessment of condition differences. When design = "independent", there are different block-effect contributions to the E and C variates, which makes condition differences harder to detect because it increases the variability of the difference between the two variates. The user can study how block variation affects the detection of an effect of size delta by adjusting the value of the block_max argument. The default for block_max is 0, but it can be set to any non-negative real number.
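The consequence of sharing or not sharing the block effect can be checked numerically. The sketch below is an illustration in base R with hypothetical variable names and normal scores chosen for convenience; it is not the sampling code used by the package. With a shared block effect the difference E - C is unaffected, whereas two separate uniform block effects add 2 * block_max^2 / 12 to the variance of the difference.

```r
set.seed(4)
n <- 5000; delta <- 0.45; block_max <- 2.3

C_raw <- rnorm(n)          # condition 1 scores before block effects
E_raw <- rnorm(n) + delta  # condition 2 scores before block effects

# Paired: one block effect per pair, added to both scores
B <- runif(n, min = 0, max = block_max)
d_paired <- (E_raw + B) - (C_raw + B)

# Independent: a separate block effect for each variate
B_c <- runif(n, min = 0, max = block_max)
B_e <- runif(n, min = 0, max = block_max)
d_indep <- (E_raw + B_e) - (C_raw + B_c)

var(d_paired)  # close to 2: the shared block effect cancels
var(d_indep)   # larger: block variation inflates the variance
```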

Value

A list containing the following components:

`nsims`	The number of Monte Carlo data sets; equal to the value of the `samples` argument
`model`	Probability model for the data
`design`	The design for the data; one of `"independent"` or `"paired"`
`effect_crit`	The criterion probability for considering a posterior probability for the hypothesis that `delta > 0` to be a detection; it is also `1 - p_crit` for a frequentist t-test
`deltav`	The offset between the variates; equal to the `delta` argument
`a0`	The first shape parameter for the beta prior distribution
`b0`	The second shape parameter for the beta prior distribution
`block_max`	The maximum size of a block effect; equal to `block_max` argument
`outputdf`	A data frame of possible sample sizes and the corresponding Bayesian and frequentist power values

References

Arnold, B. C. (1983). Pareto Distribution. Fairland, MD: International Cooperative Publishing House.

Chechile, R. A. (2017). A Bayesian analysis for the Wilcoxon signed-rank statistic. Communications in Statistics - Theory and Methods, https://doi.org/10.1080/03610926.2017.1388402

Chechile, R. A. (2020). A Bayesian analysis for the Mann-Whitney statistic. Communications in Statistics - Theory and Methods, https://doi.org/10.1080/03610926.2018.1549247

Fishman, G. S. (1996). Monte Carlo: Concepts, Algorithms and Applications. New York: Springer.

Hardy, M. (2010). Pareto's Law. Mathematical Intelligencer, 32, 38-43.

Johnson, N. L., Kotz, S., and Balakrishnan, N. (1995). Continuous Univariate Distributions, Vol. 1. New York: Wiley.

Pareto, V. (1897). Cours d'Economie Politique. Vol. 2, Lausanne: F. Rouge.

Examples


# Note: these examples have long runtimes due to Monte Carlo sampling;
# please feel free to run them in the console.

# Paired design: scores sampled from normal distributions (default shape
# parameters of 1) with no blocking effect


dfba_bayes_vs_t_power(n_min = 40,
                      delta = .45,
                      model = "normal",
                      design = "paired",
                      samples = 250,
                      hide_progress = TRUE)

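# Independent-groups design: Weibull data with the default shape parameters
# and no blocking effect
dfba_bayes_vs_t_power(n_min = 50,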
                      delta = .45,
                      model = "weibull",
                      design = "independent",
                      samples = 250,
                      hide_progress = TRUE)

# Paired design: Weibull data with shape parameters of .8 and block effects
# of up to 2.3
dfba_bayes_vs_t_power(n_min = 50,
                      delta = .45,
                      model = "weibull",
                      design = "paired",
                      shape1 = .8,
                      shape2 = .8,
                      samples = 250,
                      block_max = 2.3,
                      hide_progress = TRUE)

[Package DFBA version 0.1.0 Index]