dfba_sim_data {DFBA} | R Documentation |
Simulated Data Generator and Inferential Comparison
Description
This function is designed to be called by other DFBA programs that compare frequentist and Bayesian power. The function generates simulated data for two conditions from one of nine different probability models. The program also computes the frequentist p-value from a t-test on the generated data, and it computes the Bayesian posterior probability from a distribution-free analysis of the difference between the two conditions.
Usage
dfba_sim_data(
n = 20,
a0 = 1,
b0 = 1,
model,
design,
delta,
shape1 = 1,
shape2 = 1,
block_max = 0
)
Arguments
n |
Number of values per condition (default is 20) |
a0 |
The first shape parameter for the prior beta distribution (default is 1). Must be positive and finite. |
b0 |
The second shape parameter for the prior beta distribution (default is 1). Must be positive and finite. |
model |
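Theoretical probability model for the data. One of "normal", "weibull", "cauchy", "lognormal", "chisquare", "logistic", "exponential", "gumbel", or "pareto" |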
design |
Indicates the data structure. One of "independent" or "paired" |
delta |
Theoretical mean difference between conditions; the second condition minus the first condition |
shape1 |
The shape parameter for condition 1 for the distribution indicated by the model argument (default is 1) |
shape2 |
The shape parameter for condition 2 for the distribution indicated by the model argument (default is 1) |
block_max |
The maximum size for a block effect (default is 0) |
Details
Researchers need to make experimental-design decisions such as the choice of the sample size per condition and the decision to use a within-block design or an independent-group design. These planning issues arise regardless of whether one uses a frequentist or a Bayesian approach to statistical inference. In the DFBA package, there are a number of functions to help users with these decisions.
The dfba_sim_data() program is used along with other functions to assess the relative power for detecting a difference of size delta between two conditions. Delta is an input for the dfba_sim_data() program, and it must be a nonnegative value. Specifically, the dfba_sim_data() program generates two sets of data that are randomly drawn from one of nine different theoretical models. The model argument stipulates the data-generating probability function and must be one of the following strings:
- "normal"
- "weibull"
- "cauchy"
- "lognormal"
- "chisquare"
- "logistic"
- "exponential"
- "gumbel"
- "pareto"
For each model there are n continuous scores randomly sampled for each condition, where n is a user-specified input value. The design argument is either "independent" or "paired", and stipulates whether the two sets of scores are independent or come from common blocks, such as the case of two scores for the same person (i.e., one in each condition).
The shape1 and shape2 arguments are values for the shape parameter for the respective first and second condition, and their meaning depends on the probability model. For model = "normal", these parameters are the standard deviations of the two distributions. For model = "weibull", the parameters are the Weibull shape parameters. For model = "cauchy", the parameters are the scale factors for the Cauchy distributions. For model = "lognormal", the shape parameters are the standard deviations for log(X). For model = "chisquare", the parameters are the degrees of freedom (df) for the two distributions. For model = "logistic", the parameters are the scale factors for the distributions. For model = "exponential", the parameters are the rate parameters for the distributions.
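The parameter mapping described above can be sketched with base R generators. This is an illustration rather than the package's internal code: the helper name sim_two_conditions() is hypothetical, and the way delta enters each model (as a location shift added to condition 2), the zero location values, and the unit Weibull scale are all assumptions.

```r
# Illustrative sketch only; not the DFBA implementation.
# Assumptions: delta is added as a shift to condition 2, locations are 0,
# and the Weibull scale is 1.
sim_two_conditions <- function(n, model, delta, shape1 = 1, shape2 = 1) {
  draw <- switch(model,
    normal      = function(s) rnorm(n, mean = 0, sd = s),        # s = standard deviation
    weibull     = function(s) rweibull(n, shape = s, scale = 1), # s = Weibull shape
    cauchy      = function(s) rcauchy(n, location = 0, scale = s),
    lognormal   = function(s) rlnorm(n, meanlog = 0, sdlog = s), # s = sd of log(X)
    chisquare   = function(s) rchisq(n, df = s),                 # s = degrees of freedom
    logistic    = function(s) rlogis(n, location = 0, scale = s),
    exponential = function(s) rexp(n, rate = s),                 # s = rate
    stop("model not covered by this sketch: ", model)
  )
  list(C = draw(shape1), E = delta + draw(shape2))
}

set.seed(1)
sim <- sim_two_conditions(n = 50, model = "normal", delta = 0.4,
                          shape1 = 1, shape2 = 4)
str(sim)
```

The Gumbel and Pareto models are left out here because base R has no generators for them.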
For the Gumbel distribution, the E variate (the second condition) is equal to delta - shape2*log(log(1/U)), where U is a random value sampled from the uniform distribution on the interval [.00001, .99999], and the C variate (the first condition) is equal to -shape1*log(log(1/U)), where U is another score sampled from the uniform distribution. The shape1 and shape2 arguments for model = "gumbel" are the scale parameters for the distributions.
The Pareto model is a distribution designed to account for income distributions as studied by economists (Pareto, 1897). For the Pareto distribution, the cumulative distribution function is equal to 1-(x_m/x)^alpha, where x is greater than x_m (Arnold, 1983). In the E condition, x_m = 1 + delta, and in the C condition, x_m = 1. The alpha parameter is 1.16 times the shape parameter, which is shape1 for the C condition and shape2 for the E condition. Since the default value for each shape parameter is 1, the default alpha value is 1.16. When alpha = 1.16, the Pareto distribution approximates an income distribution that follows the 80-20 law, where 20% of the population receives 80% of the income (Hardy, 2010).
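The Gumbel and Pareto formulas above can be written as a short inverse-transform sketch. The helper names r_gumbel_sketch() and r_pareto_sketch() are hypothetical, and drawing the Pareto uniform from the same truncated interval as the Gumbel one is an assumption; the Pareto expression comes from solving U = 1-(x_m/x)^alpha for x.

```r
# Gumbel variate: location - scale * log(log(1 / U)), as in the text above.
r_gumbel_sketch <- function(n, location, scale) {
  u <- runif(n, min = 0.00001, max = 0.99999)
  location - scale * log(log(1 / u))
}

# Pareto variate: solving u = 1 - (x_m / x)^alpha for x gives
# x = x_m / (1 - u)^(1 / alpha), with alpha = 1.16 * shape.
r_pareto_sketch <- function(n, x_m, shape) {
  alpha <- 1.16 * shape
  # Assumption: same truncated uniform interval as for the Gumbel case.
  u <- runif(n, min = 0.00001, max = 0.99999)
  x_m / (1 - u)^(1 / alpha)
}

set.seed(123)
delta <- 0.4
C_gumbel <- r_gumbel_sketch(20, location = 0, scale = 1)          # shape1 = 1
E_gumbel <- r_gumbel_sketch(20, location = delta, scale = 1)      # shape2 = 1
C_pareto <- r_pareto_sketch(20, x_m = 1, shape = 1)               # x_m = 1
E_pareto <- r_pareto_sketch(20, x_m = 1 + delta, shape = 1)       # x_m = 1 + delta

# Income share held by the top 20% under a Pareto law with alpha = 1.16;
# the value is close to 0.80, which is the 80-20 law.
top20_share <- 0.2^(1 - 1 / 1.16)
top20_share
```

Every Pareto score exceeds its x_m by construction, so the E scores are bounded below by 1 + delta and the C scores by 1.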
The block_max argument provides for incorporating block effects in the random sampling. The block effect B for a score is a random number drawn from a uniform distribution on the interval [0, block_max]. When design = "paired", the same random block effect is added to the score in the first condition, which is the random C value, and to the corresponding paired value for the E variate. Thus, the paired research design eliminates the effect of block variation for the assessment of condition differences. When design = "independent", there are different block-effect contributions to the E and C variates, which reduces the discrimination of condition differences because it increases the variability of the difference between the two variates. The user can study how block variation affects the discriminability of an effect of size delta by adjusting the value of the block_max argument. The default for block_max is 0, but it can be set to any non-negative real number.
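The effect of block variation under the two designs can be illustrated with a small sketch. The normal model, the sample size, and the variable names are illustrative assumptions; only the rule for adding the uniform block effect follows the description above.

```r
set.seed(42)
n <- 1000
delta <- 0.25
block_max <- 1.5

# Base scores before any block effect (normal model chosen only for illustration).
C0 <- rnorm(n, mean = 0, sd = 1)
E0 <- rnorm(n, mean = delta, sd = 1)

# Paired design: one block effect per pair, added to both members of the pair.
B <- runif(n, min = 0, max = block_max)
C_paired <- C0 + B
E_paired <- E0 + B

# Independent design: separate block effects for the two groups.
C_indep <- C0 + runif(n, min = 0, max = block_max)
E_indep <- E0 + runif(n, min = 0, max = block_max)

# The shared block effect cancels in the paired differences.
all.equal(E_paired - C_paired, E0 - C0)

# In the independent design the block effects add variance to each group.
c(var_without_blocks = var(E0), var_with_blocks = var(E_indep))
```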
The output from calling the dfba_sim_data() function includes two statistics that are based on the n scores generated in the two conditions. One statistic is the frequentist p-value for rejecting the null hypothesis that delta <= 0 from a parametric t-test. The p-value is the upper tail probability of the sample t-statistic for the paired t-test when design = "paired", or it is the upper tail probability of the sample t-statistic for the two-group t-test when design = "independent". The second output statistic is the Bayesian posterior probability for one of two possible nonparametric tests. If design = "paired", the dfba_sim_data() function calls the dfba_wilcoxon() function to ascertain the posterior probability that phi_w > .5. If design = "independent", the dfba_sim_data() function calls the dfba_mann_whitney() function to estimate the posterior probability that omega_E > .5. The arguments a0 and b0 for the dfba_sim_data() function are passed along to either the dfba_wilcoxon() function or the dfba_mann_whitney() function. The default values are a0 = b0 = 1.
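The frequentist half of this output can be approximated with base R's t.test(). This is a sketch under assumptions: the simulated scores are made up for illustration, and using R's default Welch test for the independent design is a guess (var.equal = TRUE would give the pooled-variance version). The Bayesian half depends on dfba_wilcoxon() and dfba_mann_whitney() and is not reproduced here.

```r
set.seed(7)
C <- rnorm(30, mean = 0, sd = 1)    # condition 1 scores (illustrative)
E <- rnorm(30, mean = 0.4, sd = 1)  # condition 2 scores (illustrative)

# Paired design: upper-tail p-value from the paired t-test on E - C.
p_paired <- t.test(E, C, paired = TRUE, alternative = "greater")$p.value

# Independent design: upper-tail p-value from the two-group t-test.
# Assumption: R's default Welch test; the package may pool variances instead.
p_indep <- t.test(E, C, paired = FALSE, alternative = "greater")$p.value

# The paired p-value is the upper tail of the t distribution at the sample t.
d <- E - C
t_stat <- mean(d) / (sd(d) / sqrt(length(d)))
p_manual <- pt(t_stat, df = length(d) - 1, lower.tail = FALSE)
all.equal(p_paired, p_manual)
```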
Value
A list containing the following components:
pvalue |
The upper tail probability of the sample t-statistic for the test of the null hypothesis that delta <= 0 |
prH1 |
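Bayesian posterior probability either for the hypothesis that phi_w > .5 from the nonparametric Wilcoxon test when design = "paired" or for the hypothesis that omega_E > .5 from the Mann-Whitney test when design = "independent" |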
C |
Vector of length n of simulated values for condition 1 |
E |
Vector of length n of simulated values for condition 2 |
design |
The data structure indicated by the design argument |
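One way the pvalue and prH1 components might be combined into a rough power comparison is sketched below. The cutoffs of .05 and .95, the number of replications, and the variable names are illustrative assumptions, and the code runs only if the DFBA package is installed.

```r
# Sketch: estimate how often each criterion flags an effect of size delta.
# The cutoffs (.05 and .95) and reps = 200 are illustrative choices.
if (requireNamespace("DFBA", quietly = TRUE)) {
  set.seed(2024)
  reps <- 200
  sims <- replicate(reps, {
    out <- DFBA::dfba_sim_data(n = 40, model = "normal", design = "paired",
                               delta = 0.4, shape1 = 1, shape2 = 1)
    c(pvalue = as.numeric(out$pvalue), prH1 = as.numeric(out$prH1))
  })
  # Proportion of simulated data sets in which each criterion is met.
  power_estimates <- c(frequentist = mean(sims["pvalue", ] < 0.05),
                       bayesian    = mean(sims["prH1", ] > 0.95))
  print(power_estimates)
}
```

Larger values of reps give more stable estimates at the cost of run time.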
Note
Random samples for both the Gumbel and the Pareto distributions are generated by the dfba_sim_data() function using the inverse transform method (Fishman, 1996).
References
Arnold, B. C. (1983). Pareto Distribution. Fairland, MD: International Cooperative Publishing House.
Chechile, R. A. (2017). A Bayesian analysis for the Wilcoxon signed-rank statistic. Communications in Statistics - Theory and Methods, https://doi.org/10.1080/03610926.2017.1388402.
Chechile, R. A. (2020). A Bayesian analysis for the Mann-Whitney statistic. Communications in Statistics - Theory and Methods, https://doi.org/10.1080/03610926.2018.1549247.
Fishman, G. S. (1996). Monte Carlo: Concepts, Algorithms and Applications. New York: Springer.
Hardy, M. (2010). Pareto's Law. Mathematical Intelligencer, 32, 38-43.
Johnson, N. L., Kotz, S., and Balakrishnan, N. (1995). Continuous Univariate Distributions, Vol. 1. New York: Wiley.
Pareto, V. (1897). Cours d'Economie Politique. Vol. 2, Lausanne: F. Rouge.
See Also
Distributions for details on the parameters of the normal, Weibull, Cauchy, lognormal, chi-squared, logistic, and exponential distributions.
Examples
# Example of two paired normal distributions where the s.d. of the two
# conditions are 1 and 4.
dfba_sim_data(n = 50,
              model = "normal",
              design = "paired",
              delta = .4,
              shape1 = 1,
              shape2 = 4)
# Example of two independent Weibull variates with their shape parameters =.8
# and with a .25 offset
dfba_sim_data(n = 80,
              model = "weibull",
              design = "independent",
              delta = .25,
              shape1 = .8,
              shape2 = .8)
# Example of two independent Weibull variates with their shape
# parameters = .8 and with a .25 offset along with some block differences
# with the max block effect being 1.5
dfba_sim_data(n = 80,
              model = "weibull",
              design = "independent",
              delta = .25,
              shape1 = .8,
              shape2 = .8,
              block_max = 1.5)
# Example of two paired Cauchy variates with a .4 offset
dfba_sim_data(n = 50,
              model = "cauchy",
              design = "paired",
              delta = .4)
# Example of two paired Cauchy variates with a .4 offset where the Bayesian
# analysis uses the Jeffreys prior
dfba_sim_data(n = 50,
              a0 = .5,
              b0 = .5,
              model = "cauchy",
              design = "paired",
              delta = .4)