props_neat {neatStats} | R Documentation
Difference of Two Proportions
Description
Comparison of paired and unpaired proportions. For unpaired proportions: Pearson's chi-squared test or unconditional exact test, including a confidence interval (CI) for the proportion difference, and the corresponding independent multinomial contingency table Bayes factor (BF). (Cohen's h and its CI are also calculated.) For paired proportions: the classical (asymptotic) McNemar test (optionally with the mid-P value as well), including a CI for the proportion difference.
Usage
props_neat(
  var1 = NULL,
  var2 = NULL,
  case1 = NULL,
  case2 = NULL,
  control1 = NULL,
  control2 = NULL,
  prop1 = NULL,
  prop2 = NULL,
  n1 = NULL,
  n2 = NULL,
  pair = FALSE,
  greater = NULL,
  ci = NULL,
  bf_added = FALSE,
  round_to = 3,
  exact = FALSE,
  inverse = FALSE,
  yates = FALSE,
  midp = FALSE,
  h_added = FALSE,
  for_table = FALSE,
  hush = FALSE
)
Arguments
var1: First variable containing classifications, in 'group 1', for the first proportion (see Examples). If given (strictly necessary for paired proportions), the proportions will be defined using var1 and var2 (see Details).

var2: Second variable containing classifications, in 'group 2', for the second proportion, analogously to var1.

case1: Number of 'cases' (as opposed to 'controls'; e.g. positive outcomes vs. negative outcomes) in 'group 1'. As counterpart, either control numbers or sample sizes need to be given (see Details).

case2: Number of 'cases' in 'group 2'.

control1: Number of 'controls' in 'group 1'. As counterpart, case numbers need to be given (see Details).

control2: Number of 'controls' in 'group 2'.

prop1: Proportion in 'group 1'. As counterpart, sample sizes need to be given (see Details).

prop2: Proportion in 'group 2'.

n1: Number; sample size of 'group 1'.

n2: Number; sample size of 'group 2'.

pair: Logical. Set TRUE for paired proportions (McNemar test), or FALSE (default) for unpaired proportions.

greater: NULL or string; optionally specifies a one-sided test: "1" if the proportion in 'group 1' is expected to be greater, or "2" if the proportion in 'group 2' is expected to be greater. If NULL (default), the test is two-sided.

ci: Numeric; confidence level for the returned CIs (proportion difference and Cohen's h).

bf_added: Logical. If TRUE, the Bayes factor is calculated and displayed (unpaired proportions only).

round_to: Number of significant fractional digits to which the printed statistics are rounded (default: 3).

exact: Logical; if TRUE, Barnard's unconditional exact test is calculated instead of Pearson's chi-squared test (unpaired proportions only; see Note).

inverse: Logical; if TRUE, displays the inverse (reversed) proportions.

yates: Logical; if TRUE, Yates' continuity correction is applied to the chi-squared test.

midp: Logical; if TRUE, the mid-P value of the McNemar test is displayed as well (paired proportions only).

h_added: Logical. If TRUE, Cohen's h and its CI are calculated and displayed.

for_table: Logical. If TRUE, omits the confidence level display from the printed text (convenient for tables).

hush: Logical. If TRUE, prevents printing any details to the console.
Details
The proportions for the two groups can be given using any of the following combinations: (a) two vectors (var1 and var2), (b) cases and controls, (c) cases and sample sizes, or (d) proportions and sample sizes. Whenever multiple combinations are specified, only the first given parameters (in the order listed in the function and in the previous sentence) will be taken into account.
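The four input combinations above can be illustrated in base R (a sketch independent of the package; the variable names mirror the function's parameters):

```r
# the same 'group 1' proportion derived from each input combination
var1 = c(rep('case', 490), rep('control', 10))  # (a) classification vector
case1 = 490
control1 = 10                                   # (b) cases and controls
n1 = 500                                        # (c) cases and sample size
prop1 = 0.98                                    # (d) proportion directly

p_a = mean(var1 == 'case')
p_b = case1 / (case1 + control1)
p_c = case1 / n1
c(p_a, p_b, p_c, prop1)  # all equal: 0.98
```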
The Bayes factor (BF), in case of unpaired samples, is always calculated with the default r-scale of 0.707. A BF supporting the null hypothesis is denoted as BF01, while one supporting the alternative hypothesis is denoted as BF10. When the BF is smaller than 1 (i.e., it supports the null hypothesis), the reciprocal is calculated (hence, BF10 = BF, but BF01 = 1/BF). When the BF is greater than or equal to 10000, the scientific (exponential) form is reported for readability. (The original full BF number is available in the returned named vector as bf.)
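The reciprocal-and-notation convention described above can be sketched in base R (an illustration of the reporting logic, not the package's internal code):

```r
# assuming 'bf' is the raw BF10 value, as returned in the named vector
report_bf = function(bf) {
    if (bf < 1) {
        label = 'BF01'
        bf = 1 / bf  # reciprocal when the BF supports the null hypothesis
    } else {
        label = 'BF10'
    }
    # scientific (exponential) form for very large values
    num = if (bf >= 10000) format(bf, scientific = TRUE, digits = 3) else round(bf, 2)
    paste0(label, ' = ', num)
}
report_bf(0.25)    # "BF01 = 4"
report_bf(250000)  # "BF10 = 2.5e+05"
```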
Value
Prints exact test statistics (including the proportion difference with its CI, and the BF) in APA style. Furthermore, when assigned, returns a named vector with the following elements: z (Z statistic), p (p value), prop_diff (raw proportion difference), h (Cohen's h), and bf (Bayes factor).
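The h element is Cohen's h, the difference between arcsine-transformed proportions; a minimal base-R sketch of the formula (not the package's internal code):

```r
# Cohen's h for two proportions: h = 2*asin(sqrt(p1)) - 2*asin(sqrt(p2))
cohens_h = function(p1, p2) {
    2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))
}
cohens_h(0.98, 0.80)  # about 0.64
cohens_h(0.5, 0.5)    # 0 (no difference)
```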
Note
Barnard's unconditional exact test is calculated via Exact::exact.test (method "z-pooled").
The CI for the proportion difference in case of the exact test is calculated based on the p value, as described by Altman and Bland (2011). In case of extremely large or extremely small p values, this can be biased and misleading.
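Altman and Bland's approximation recovers the test statistic from the p value and derives the standard error from it; below is a base-R sketch for a 95% CI (a hypothetical helper for illustration, not the package's implementation):

```r
# approximate 95% CI from an estimate and its two-sided p value
# (Altman & Bland, 2011)
ci_from_p = function(est, p) {
    z = -0.862 + sqrt(0.743 - 2.404 * log(p))  # recover the z statistic from p
    se = abs(est) / z                          # implied standard error
    c(lower = est - 1.96 * se, upper = est + 1.96 * se)
}
ci_from_p(est = 0.18, p = 0.006)
```

The bias mentioned above arises because this z-recovery formula is itself an approximation, which degrades for extreme p values.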
The Bayes factor is calculated via BayesFactor::contingencyTableBF, with sampleType = "indepMulti", as appropriate when both sample sizes (n1 and n2) are known in advance (as is normally the case). (For details, see contingencyTableBF, or e.g. 'Chapter 17 Bayesian statistics' in Navarro, 2019.)
References
Altman, D. G., & Bland, J. M. (2011). How to obtain the confidence interval from a P value. BMJ, 343, d2090. doi:10.1136/bmj.d2090
Barnard, G. A. (1947). Significance tests for 2x2 tables. Biometrika, 34(1/2), 123-138. doi:10.1093/biomet/34.1-2.123
Fagerland, M. W., Lydersen, S., & Laake, P. (2013). The McNemar test for binary matched-pairs data: Mid-p and asymptotic are better than exact conditional. BMC Medical Research Methodology, 13(1), 91. doi:10.1186/1471-2288-13-91
Lydersen, S., Fagerland, M. W., & Laake, P. (2009). Recommended tests for association in 2x2 tables. Statistics in Medicine, 28(7), 1159-1175. doi:10.1002/sim.3531
Navarro, D. (2019). Learning statistics with R. https://learningstatisticswithr.com/
Pembury Smith, M. Q. R., & Ruxton, G. D. (2020). Effective use of the McNemar test. Behavioral Ecology and Sociobiology, 74(11), 133. doi:10.1007/s00265-020-02916-y
Suissa, S., & Shuster, J. J. (1985). Exact unconditional sample sizes for the 2 x 2 binomial trial. Journal of the Royal Statistical Society: Series A (General), 148(4), 317-327. doi:10.2307/2981892
Examples
# example data
set.seed(1)
outcomes_A = sample(c(rep('x', 490), rep('y', 10)))
outcomes_B = sample(c(rep('x', 400), rep('y', 100)))

# paired proportion test (McNemar test)
props_neat(var1 = outcomes_A,
           var2 = outcomes_B,
           pair = TRUE)

# unpaired chi-squared test for the same data
# (two independent samples assumed), with Yates correction applied
# cf. http://www.sthda.com/english/wiki/two-proportions-z-test-in-r
props_neat(
    var1 = outcomes_A,
    var2 = outcomes_B,
    pair = FALSE,
    yates = TRUE
)

# the same data given differently for an unpaired test
# (no Yates correction)
props_neat(
    case1 = 490,
    case2 = 400,
    control1 = 10,
    control2 = 100
)

# again given differently
props_neat(
    case1 = 490,
    case2 = 400,
    n1 = 500,
    n2 = 500
)

# other example data
outcomes_A2 = c(rep(1, 707), rep(0, 212), rep(1, 256), rep(0, 144))
outcomes_B2 = c(rep(1, 707), rep(0, 212), rep(0, 256), rep(1, 144))

# paired test
# cf. https://www.medcalc.org/manual/mcnemartest2.php
props_neat(var1 = outcomes_A2,
           var2 = outcomes_B2,
           pair = TRUE)

# show the reverse proportions (otherwise the same)
props_neat(
    var1 = outcomes_A2,
    var2 = outcomes_B2,
    pair = TRUE,
    inverse = TRUE
)

# two different sample sizes
out_chi = props_neat(
    case1 = 40,
    case2 = 70,
    n1 = 150,
    n2 = 170
)

# exact test for the same data
out_exact = props_neat(
    case1 = 40,
    case2 = 70,
    n1 = 150,
    n2 = 170,
    exact = TRUE
)

# the two p values are just a tiny bit different
print(out_chi) # p 0.00638942
print(out_exact) # p 0.006481884

# one-sided test
props_neat(
    case1 = 40,
    case2 = 70,
    n1 = 150,
    n2 = 170,
    greater = '2'
)