ciTableProp {EnvStats}  R Documentation 
Create a table of confidence intervals for the probability of "success" for a binomial distribution, or for the difference between two proportions, following Bacchetti (2010), by varying the estimated proportion or the difference between the two estimated proportions given the sample size(s).
ciTableProp(n1 = 10, p1.hat = c(0.1, 0.2, 0.3), n2 = n1,
    p2.hat.minus.p1.hat = c(0.2, 0.1, 0), sample.type = "two.sample",
    ci.type = "two-sided", conf.level = 0.95, digits = 2, ci.method = "score",
    correct = TRUE, tol = 10^-(digits + 1))
n1 
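positive integer greater than 1 specifying the sample size when sample.type="one.sample", or the sample size for group 1 when sample.type="two.sample". The default value is n1=10.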
p1.hat 
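numeric vector of values between 0 and 1 indicating the estimated proportion
(sample.type="one.sample") or the estimated proportion for group 1
(sample.type="two.sample"). The default value is c(0.1, 0.2, 0.3).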
n2 
positive integer greater than 1 specifying the sample size for group 2 when
sample.type="two.sample". The default value is n2=n1.
p2.hat.minus.p1.hat 
numeric vector indicating the assumed difference between the two sample proportions
when sample.type="two.sample". The default value is c(0.2, 0.1, 0).
sample.type 
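character string specifying whether to create confidence intervals for the difference
between two proportions (sample.type="two.sample", the default) or confidence
intervals for a single proportion (sample.type="one.sample").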
ci.type 
character string indicating what kind of confidence interval to compute. The
possible values are "two-sided" (the default), "lower", and "upper".
conf.level 
a scalar between 0 and 1 indicating the confidence level of the confidence interval.
The default value is conf.level=0.95.
digits 
positive integer indicating how many decimal places to display in the table. The
default value is digits=2.
ci.method 
character string indicating the method to use to construct the confidence interval.
The default value is ci.method="score".
correct 
logical scalar indicating whether to use the correction for continuity when
ci.method="score". The default value is correct=TRUE.
tol 
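numeric scalar indicating how close the values of the adjusted elements of
p2.hat.minus.p1.hat have to be in order to be considered the same value. The
default value is tol=10^-(digits + 1).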
One-Sample Case (sample.type="one.sample")
For the one-sample case, the function ciTableProp calls the R function prop.test when ci.method="score", and calls the R function binom.test when ci.method="exact". To ensure that the user-supplied values of p1.hat are valid for the given user-supplied value of n1, values for the argument x to the function prop.test or binom.test are computed using the formula
x <- unique(round((p1.hat * n1), 0))
and the argument p1.hat is then adjusted using the formula
p1.hat <- x/n1
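The adjustment above can be sketched in base R. This is only an illustration under stated assumptions: the input values are made up, the name p1.hat.adj is hypothetical, and the calls to prop.test and binom.test show the two interval types named above rather than the internal code of ciTableProp.

```r
# Hypothetical inputs chosen for illustration (not from the help file)
n1 <- 10
p1.hat <- c(0.12, 0.14, 0.37)

# Convert the requested proportions to attainable success counts.
# 0.12 and 0.14 both round to 1 success out of 10, so unique() drops the duplicate.
x <- unique(round(p1.hat * n1, 0))

# Adjusted proportions that can actually be observed with n1 trials
p1.hat.adj <- x / n1

# Score-based interval (prop.test) and exact interval (binom.test)
# for the first attainable count
score.ci <- prop.test(x[1], n1, conf.level = 0.95, correct = TRUE)$conf.int
exact.ci <- binom.test(x[1], n1, conf.level = 0.95)$conf.int
```

Here three requested proportions collapse to two attainable ones (0.1 and 0.4), which is why the one-sample table can have fewer columns than the length of p1.hat.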
Two-Sample Case (sample.type="two.sample")
For the two-sample case, the function ciTableProp calls the R function prop.test. To ensure that the user-supplied values of p1.hat are valid for the given user-supplied value of n1, the values for the first component of the argument x to the function prop.test are computed using the formula
x1 <- unique(round((p1.hat * n1), 0))
and the argument p1.hat is then adjusted using the formula
p1.hat <- x1/n1
Next, the estimated proportions from group 2 are computed by adding together all possible combinations of the elements of p1.hat and p2.hat.minus.p1.hat. These estimated proportions from group 2 are then adjusted using the formulas:
x2.rep <- round((p2.hat.rep * n2), 0)
p2.hat.rep <- x2.rep/n2
If any of these adjusted proportions from group 2 are ≤ 0 or ≥ 1, the function terminates with a message indicating that impossible values have been supplied.
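A minimal base R sketch of these two-sample steps follows, using the sample sizes from the groundwater example in the EXAMPLES section. The names p1.adj and p2.adj, the use of outer() to form all combinations, and the ordering of the groups in the call to prop.test are assumptions made for illustration; the actual internals of ciTableProp may differ.

```r
n1 <- 36
n2 <- 36
p1.hat <- c(0.2, 0.4, 0.6)
p2.hat.minus.p1.hat <- c(0.3, 0.15, 0)

# Group 1: attainable counts and adjusted proportions
x1 <- unique(round(p1.hat * n1, 0))
p1.adj <- x1 / n1

# Group 2: every combination of adjusted p1.hat (rows) and requested
# difference (columns), then rounded to attainable counts
p2.hat.rep <- outer(p1.adj, p2.hat.minus.p1.hat, "+")
x2.rep <- round(p2.hat.rep * n2, 0)
p2.adj <- x2.rep / n2

# Range check described in the text
if (any(p2.adj <= 0 | p2.adj >= 1)) stop("impossible values supplied")

# Interval for (group 2 - group 1) in the first row, second column.
# prop.test reports the interval for the first group minus the second,
# so group 2 is listed first (an assumption about the ordering).
ci <- prop.test(x = c(x2.rep[1, 2], x1[1]), n = c(n2, n1),
                conf.level = 0.95, correct = TRUE)$conf.int
round(ci, 2)
```

Under these assumptions the rounded interval is -0.09 to 0.37, which agrees with the entry for P1.hat=0.19 and Diff=0.14 in the third example.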
In cases where the sample sizes are small, there may be instances where the user-supplied values of p1.hat and/or p2.hat.minus.p1.hat are not attainable. The argument tol is used to determine whether to return the table in conventional form or whether it is necessary to modify the table to include twice as many columns (see EXAMPLES section below).
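The effect of small sample sizes can be illustrated with a short base R sketch. Both helper functions below are hypothetical and not part of EnvStats. The rule used in same.within.column (a column is "conventional" when its attained differences agree to within tol) is an assumption inferred from the examples in this help file, not a statement of how ciTableProp is implemented.

```r
# Hypothetical helper: differences actually attained after rounding to counts
attained.diffs <- function(n1, n2, p1.hat, diffs) {
  p1.adj <- unique(round(p1.hat * n1, 0)) / n1
  p2.rep <- outer(p1.adj, diffs, "+")        # rows: p1.hat, columns: requested diffs
  p2.adj <- round(p2.rep * n2, 0) / n2
  out <- p2.adj - p1.adj                     # row i is reduced by p1.adj[i]
  dimnames(out) <- list(paste0("P1.hat=", p1.adj), paste0("Requested=", diffs))
  out
}

digits <- 2
tol <- 10^-(digits + 1)

# Assumed rule: within each column, all attained differences agree within tol
same.within.column <- function(d) {
  all(apply(d, 2, function(col) diff(range(col))) < tol)
}

small <- attained.diffs(5, 5, c(0.3, 0.6, 0.12), c(0.2, 0.1, 0))
large <- attained.diffs(36, 36, c(0.2, 0.4, 0.6), c(0.3, 0.15, 0))

same.within.column(small)  # FALSE: a requested 0.1 becomes 0.0 or 0.2 by row
same.within.column(large)  # TRUE: each column has a single attained difference
```

With n1 = n2 = 5, a requested difference of 0.1 cannot be attained, so different rows end up with different actual differences and a single column header no longer describes the whole column. With n1 = n2 = 36, each column shares one attained difference (for example 0.14 rather than the requested 0.15), so the conventional layout still works.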
A data frame with elements that are character strings indicating the confidence intervals.
When sample.type="two.sample", a data frame with the rows varying the estimated proportion for group 1 (i.e., the values of p1.hat) and the columns varying the estimated difference between the proportions from group 2 and group 1 (i.e., the values of p2.hat.minus.p1.hat). In cases where the sample sizes are small, it may not be possible to obtain certain differences for given values of p1.hat, in which case the returned data frame contains twice as many columns, indicating the actual difference in one column and the computed confidence interval next to it (see EXAMPLES section below).
When sample.type="one.sample", a 1-row data frame with the columns varying the estimated proportion (i.e., the values of p1.hat).
Bacchetti (2010) presents strong arguments against the current convention in scientific research for computing sample size that is based on formulas that use a fixed Type I error (usually 5%) and a fixed minimal power (often 80%) without regard to costs. He notes that a key input to these formulas is a measure of variability (usually a standard deviation) that is difficult to measure accurately "unless there is so much preliminary data that the study isn't really needed." Also, study designers often avoid defining what a scientifically meaningful difference is by presenting sample size results in terms of the effect size (i.e., the difference of interest divided by the elusive standard deviation). Bacchetti (2010) encourages study designers to use simple tables in a sensitivity analysis to see what results of a study may look like for low, moderate, and high rates of variability and large, intermediate, and no underlying differences in the populations or processes being studied.
Steven P. Millard (EnvStats@ProbStatInfo.com)
Bacchetti, P. (2010). Current sample size conventions: Flaws, Harms, and Alternatives. BMC Medicine 8, 17–23.
Also see the references in the help files for prop.test and binom.test.
prop.test, binom.test, ciTableMean, ciBinomHalfWidth, ciBinomN, plotCiBinomDesign.
# Reproduce Table 1 in Bacchetti (2010). This involves planning a study with
# n1 = n2 = 935 subjects per group, where Group 1 is the control group and
# Group 2 is the treatment group. The outcome in the study is proportion of
# subjects with serious outcomes or death. A negative value for the difference
# in proportions between groups (Group 2 proportion - Group 1 proportion)
# indicates the treatment group has a better outcome. In this table, the
# proportion of subjects in Group 1 with serious outcomes or death is set
# to 3%, 6.5%, and 12%, and the difference in proportions between the two
# groups is set to -2.8 percentage points, -1.4 percentage points, and 0.
ciTableProp(n1 = 935, p1.hat = c(0.03, 0.065, 0.12), n2 = 935,
    p2.hat.minus.p1.hat = c(-0.028, -0.014, 0), digits = 3)
#              Diff=-0.028      Diff=-0.014      Diff=0
#P1.hat=0.030 [-0.040, -0.015] [-0.029, 0.001] [-0.015, 0.015]
#P1.hat=0.065 [-0.049, -0.007] [-0.036, 0.008] [-0.022, 0.022]
#P1.hat=0.120 [-0.057,  0.001] [-0.044, 0.016] [-0.029, 0.029]
#==========
# Show how the returned data frame has to be modified for cases of small
# sample sizes where not all user-supplied differences are possible.
ciTableProp(n1 = 5, n2 = 5, p1.hat = c(0.3, 0.6, 0.12),
    p2.hat.minus.p1.hat = c(0.2, 0.1, 0))
#           Diff            CI Diff            CI Diff            CI
#P1.hat=0.4  0.2 [-0.61, 1.00]  0.0 [-0.61, 0.61]    0 [-0.61, 0.61]
#P1.hat=0.6  0.2 [-0.55, 0.95]  0.2 [-0.55, 0.95]    0 [-0.61, 0.61]
#P1.hat=0.2  0.2 [-0.55, 0.95]  0.2 [-0.55, 0.95]    0 [-0.50, 0.50]
#==========
# Suppose we are planning a study to compare the proportion of nondetects at
# a background and downgradient well, and we can use ciTableProp to look at how
# the confidence interval for the difference between the two proportions using
# say 36 quarterly samples at each well varies with the observed estimated
# proportions. Here we'll let the argument "p1.hat" denote the proportion of
# nondetects observed at the downgradient well and set this equal to
# 20%, 40% and 60%. The argument "p2.hat.minus.p1.hat" represents the proportion
# of nondetects at the background well minus the proportion of nondetects at the
# downgradient well.
ciTableProp(n1 = 36, p1.hat = c(0.2, 0.4, 0.6), n2 = 36,
    p2.hat.minus.p1.hat = c(0.3, 0.15, 0))
#            Diff=0.31     Diff=0.14     Diff=0
#P1.hat=0.19 [ 0.07, 0.54] [-0.09, 0.37] [-0.18, 0.18]
#P1.hat=0.39 [ 0.06, 0.55] [-0.12, 0.39] [-0.23, 0.23]
#P1.hat=0.61 [ 0.09, 0.52] [-0.10, 0.38] [-0.23, 0.23]
# We see that even if the observed difference in the proportion of nondetects
# is about 15 percentage points, all of the confidence intervals for the
# difference between the proportions of nondetects at the two wells contain 0,
# so if a difference of 15 percentage points is important to substantiate, we
# may need to increase our sample sizes.