dfba_sim_data {DFBA}  R Documentation 
Simulated Data Generator and Inferential Comparison
Description
This function is designed to be called by other DFBA programs that compare frequentist and Bayesian power. The function generates simulated data for two conditions, drawn from one of nine different probability models. The program also computes the frequentist p-value from a t-test on the generated data, and it computes the Bayesian posterior probability from a distribution-free analysis of the difference between the two conditions.
Usage
dfba_sim_data(
n = 20,
a0 = 1,
b0 = 1,
model,
design,
delta,
shape1 = 1,
shape2 = 1,
block_max = 0
)
Arguments
n 
Number of values per condition 
a0 
The first shape parameter for the prior beta distribution (default is 1). Must be positive and finite. 
b0 
The second shape parameter for the prior beta distribution (default is 1). Must be positive and finite. 
model 
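Theoretical probability model for the data. One of "normal", "weibull", "cauchy", "lognormal", "chisquare", "logistic", "exponential", "gumbel", or "pareto"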
design 
Indicates the data structure. One of "independent" or "paired"
delta 
Theoretical mean difference between conditions; the second condition minus the first condition 
shape1 
The shape parameter for condition 1 for the distribution indicated by the model argument (default is 1)
shape2 
The shape parameter for condition 2 for the distribution indicated by the model argument (default is 1)
block_max 
The maximum size for a block effect (default is 0) 
Details
Researchers need to make experimental-design decisions, such as the choice of sample size per condition and whether to use a within-block design or an independent-group design. These planning issues arise regardless of whether one uses a frequentist or a Bayesian approach to statistical inference. The DFBA package includes a number of functions to help users with these decisions.
The dfba_sim_data() program is used along with other functions to assess the relative power for detecting a condition difference of an amount delta between two conditions. Delta is an input for the dfba_sim_data() program, and it must be a nonnegative value. Specifically, the dfba_sim_data() program generates two sets of data that are randomly drawn from one of nine different theoretical models. The model argument stipulates the data-generating probability function and is one of the following strings:

"normal"

"weibull"

"cauchy"

"lognormal"

"chisquare"

"logistic"

"exponential"

"gumbel"

"pareto"
For each model there are n continuous scores randomly sampled for each condition, where n is a user-specified input value. The design argument is either "independent" or "paired", and stipulates whether the two sets of scores are independent or come from common blocks, such as two scores from the same person (i.e., one in each condition).
The shape1 and shape2 arguments are values for the shape parameter for the respective first and second condition, and their meaning depends on the probability model. For model = "normal", these parameters are the standard deviations of the two distributions. For model = "weibull", the parameters are the Weibull shape parameters. For model = "cauchy", the parameters are the scale factors for the Cauchy distributions. For model = "lognormal", the shape parameters are the standard deviations for log(X). For model = "chisquare", the parameters are the degrees of freedom (df) for the two distributions. For model = "logistic", the parameters are the scale factors for the distributions. For model = "exponential", the parameters are the rate parameters for the distributions.
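As a rough illustration of how shape1 and shape2 map onto base R generators, the sketch below draws a C and an E sample for seven of the models. The helper name sim_two_conditions is hypothetical, and adding delta to the E scores as a location offset is an assumption made for illustration; it is not taken from the DFBA source.

```r
# Hypothetical helper (not part of DFBA): draws n scores per condition.
# Assumption: delta enters as a location offset on the E (second) condition.
sim_two_conditions <- function(n, model, delta, shape1 = 1, shape2 = 1) {
  draw <- switch(model,
    normal      = function(s) rnorm(n, mean = 0, sd = s),
    weibull     = function(s) rweibull(n, shape = s, scale = 1),
    cauchy      = function(s) rcauchy(n, location = 0, scale = s),
    lognormal   = function(s) rlnorm(n, meanlog = 0, sdlog = s),
    chisquare   = function(s) rchisq(n, df = s),
    logistic    = function(s) rlogis(n, location = 0, scale = s),
    exponential = function(s) rexp(n, rate = s),
    stop("model not covered by this sketch: ", model)
  )
  list(C = draw(shape1), E = delta + draw(shape2))
}

set.seed(1)
sim <- sim_two_conditions(n = 50, model = "weibull", delta = 0.25,
                          shape1 = 0.8, shape2 = 0.8)
```

The Gumbel and Pareto models are left out here because base R has no built-in generators for them.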
For the Gumbel distribution, the E variate is equal to delta - shape2*log(log(1/U)) where U is a random value sampled from the uniform distribution on the interval [.00001, .99999], and the C variate is equal to -shape1*log(log(1/U)) where U is another score sampled from the uniform distribution. The shape1 and shape2 arguments for model = "gumbel" are the scale parameters for the distributions. The Pareto model is a distribution designed to account for income distributions as studied by economists (Pareto, 1897). For the Pareto distribution, the cumulative function is equal to 1 - (x_m/x)^alpha where x is greater than x_m (Arnold, 1983). In the E condition, x_m = 1 + delta and in the C condition x_m = 1. The alpha parameter is 1.16 times the shape parameters shape1 and shape2. Since the default value for each shape parameter is 1, the resulting alpha value of 1.16 is the default value. When alpha = 1.16, the Pareto distribution approximates an income distribution that represents the 80-20 law where 20% of the population receives 80% of the income (Hardy, 2010).
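The Gumbel and Pareto formulas can be sketched with inverse-transform sampling in base R. The function names below are hypothetical and this is not the DFBA source code. The Gumbel lines use E = delta - shape2*log(log(1/U)) and C = -shape1*log(log(1/U)), and the Pareto lines invert the cumulative function 1 - (x_m/x)^alpha.

```r
# Inverse-transform sketch for the Gumbel and Pareto models.
# Function names are hypothetical; this is not the DFBA source code.
runif_clipped <- function(n) runif(n, min = 0.00001, max = 0.99999)

gumbel_pair <- function(n, delta, shape1 = 1, shape2 = 1) {
  # Separate uniform draws are used for the two conditions
  C <- -shape1 * log(log(1 / runif_clipped(n)))
  E <- delta - shape2 * log(log(1 / runif_clipped(n)))
  list(C = C, E = E)
}

pareto_pair <- function(n, delta, shape1 = 1, shape2 = 1) {
  # alpha is 1.16 times the shape argument; x_m is 1 for C and 1 + delta for E.
  # Solving U = 1 - (x_m / x)^alpha for x gives x = x_m / (1 - U)^(1 / alpha).
  alpha_C <- 1.16 * shape1
  alpha_E <- 1.16 * shape2
  C <- 1 / (1 - runif_clipped(n))^(1 / alpha_C)
  E <- (1 + delta) / (1 - runif_clipped(n))^(1 / alpha_E)
  list(C = C, E = E)
}

set.seed(2)
g <- gumbel_pair(n = 50, delta = 0.4)
p <- pareto_pair(n = 50, delta = 0.4)
```

The 1.16 multiplier is close to log(5)/log(4), which is about 1.161, the Pareto index at which the 80-20 split holds exactly.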
The block_max argument provides for incorporating block effects in the random sampling. The block effect B for a score is a random number drawn from a uniform distribution on the interval [0, block_max]. When design = "paired", the same random block effect is added to the score in the first condition, which is the random C value, and it is also added to the corresponding paired value for the E variate. Thus, the pairing research design eliminates the effect of block variation for the assessment of condition differences. When design = "independent", there are different block-effect contributions to the E and C variates, which reduces the discrimination of condition differences because it increases the variability of the difference in the two variates. The user can study how block variation affects the discriminability of an effect of size delta by adjusting the value of the block_max argument. The default for block_max is 0, but it can be altered to any nonnegative real number.
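The cancellation of block effects under pairing can be sketched in base R with the normal model. This is a hypothetical illustration with made-up variable names, not the DFBA implementation; the elementwise difference in the independent case is used only to show the added variance.

```r
# Sketch of the block-effect mechanism using the normal model.
# Hypothetical illustration; variable names are not from the DFBA source.
set.seed(3)
n <- 2000
delta <- 0.4
block_max <- 1.5

C_raw <- rnorm(n, mean = 0, sd = 1)
E_raw <- rnorm(n, mean = delta, sd = 1)

# Paired design: one block effect per pair, added to both scores,
# so it cancels in the difference E - C.
B <- runif(n, min = 0, max = block_max)
diff_paired <- (E_raw + B) - (C_raw + B)

# Independent design: each score gets its own block effect,
# so the difference carries extra variance.
B_C <- runif(n, min = 0, max = block_max)
B_E <- runif(n, min = 0, max = block_max)
diff_indep <- (E_raw + B_E) - (C_raw + B_C)

var(diff_paired)  # theoretical value is 2, the variance of E_raw - C_raw
var(diff_indep)   # theoretical value adds 2 * block_max^2 / 12
```

Each uniform block effect has variance block_max^2/12, so two independent block effects add twice that amount to the variance of the difference.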
The output from calling the dfba_sim_data() function includes two statistics that are based on the n scores generated in the two conditions. One statistic is the frequentist p-value for rejecting the null hypothesis that delta <= 0 from a parametric t-test. The p-value is the upper-tail probability of the sample t-statistic for the paired t-test when design = "paired", or for the two-group t-test when design = "independent". The second output statistic is the Bayesian posterior probability for one of two possible nonparametric tests. If design = "paired", the dfba_sim_data() function calls the dfba_wilcoxon() function to ascertain the posterior probability that phi_w > .5. If design = "independent", the dfba_sim_data() function calls the dfba_mann_whitney() function to estimate the posterior probability that omega_E > .5. The arguments a0 and b0 for the dfba_sim_data() function are passed along to either the dfba_wilcoxon() function or the dfba_mann_whitney() function. The default values are a0 = b0 = 1.
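The frequentist half of the output can be approximated with base R's t.test(). The helper upper_tail_p below is hypothetical, and the use of the pooled-variance form (var.equal = TRUE) for the two-group test is an assumption, since the form used by DFBA is not stated here.

```r
# Sketch of the upper-tail p-value using base R's t.test().
# Hypothetical helper; var.equal = TRUE for the two-group test is an assumption.
upper_tail_p <- function(E, C, design = c("paired", "independent")) {
  design <- match.arg(design)
  if (design == "paired") {
    t.test(E, C, paired = TRUE, alternative = "greater")$p.value
  } else {
    t.test(E, C, var.equal = TRUE, alternative = "greater")$p.value
  }
}

set.seed(4)
C <- rnorm(50, mean = 0, sd = 1)
E <- rnorm(50, mean = 0.4, sd = 1)
p_paired <- upper_tail_p(E, C, design = "paired")
p_indep  <- upper_tail_p(E, C, design = "independent")
```

The Bayesian half of the output comes from dfba_wilcoxon() or dfba_mann_whitney() and is not reproduced in this sketch.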
Value
A list containing the following components:
pvalue 
The upper-tail probability of the sample t-statistic for the test of the null hypothesis that delta <= 0 
prH1 
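Bayesian posterior probability either for the hypothesis that phi_w > .5 from the nonparametric Wilcoxon test when design = "paired" or for the hypothesis that omega_E > .5 from the Mann-Whitney test when design = "independent" 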
C 
Vector of length n of simulated values for condition 1 
E 
Vector of length n of simulated values for condition 2 
design 
The data structure indicated by the design argument 
Note
Random sampling for both the Gumbel and the Pareto distributions is generated by the dfba_sim_data() function using the inverse transform method (Fishman, 1996).
References
Arnold, B. C. (1983). Pareto Distribution. Fairland, MD: International Cooperative Publishing House.
Chechile, R. A. (2017). A Bayesian analysis for the Wilcoxon signed-rank statistic. Communications in Statistics - Theory and Methods, https://doi.org/10.1080/03610926.2017.1388402.
Chechile, R. A. (2020). A Bayesian analysis for the Mann-Whitney statistic. Communications in Statistics - Theory and Methods, https://doi.org/10.1080/03610926.2018.1549247.
Fishman, G. S. (1996). Monte Carlo: Concepts, Algorithms and Applications. New York: Springer.
Hardy, M. (2010). Pareto's Law. Mathematical Intelligencer, 32, 38-43.
Johnson, N. L., Kotz, S., and Balakrishnan, N. (1995). Continuous Univariate Distributions, Vol. 1, New York: Wiley.
Pareto, V. (1897). Cours d'Economie Politique. Vol. 2, Lausanne: F. Rouge.
See Also
Distributions for details on the parameters of the normal, Weibull, Cauchy, lognormal, chi-squared, logistic, and exponential distributions.
Examples
# Example of two paired normal distributions where the s.d. of the two
# conditions are 1 and 4.
dfba_sim_data(n = 50,
model = "normal",
design = "paired",
delta = .4,
shape1 = 1,
shape2 = 4)
# Example of two independent Weibull variates with their shape parameters =.8
# and with a .25 offset
dfba_sim_data(n = 80,
model = "weibull",
design = "independent",
delta = .25,
shape1 = .8,
shape2 = .8)
# Example of two independent Weibull variates with their shape
# parameters = .8 and with a .25 offset along with some block differences
# with the max block effect being 1.5
dfba_sim_data(n = 80,
model = "weibull",
design = "independent",
delta = .25,
shape1 = .8,
shape2 = .8,
block_max = 1.5)
# Example of two paired Cauchy variates with a .4 offset
dfba_sim_data(n = 50,
model = "cauchy",
design = "paired",
delta = .4)
# Example of two paired Cauchy variates with a .4 offset where the Bayesian
# analysis uses the Jeffreys prior
dfba_sim_data(n = 50,
a0 = .5,
b0 = .5,
model = "cauchy",
design = "paired",
delta = .4)