R: Independent Samples Test (Mann Whitney U)

dfba_mann_whitney {DFBA}

R Documentation

Independent Samples Test (Mann Whitney U)

Description

Given two independent vectors E and C, the function computes the sample Mann-Whitney U statistics U_E and U_C and provides a Bayesian analysis for the population parameter omega_E, which is the population ratio of U_E/(U_E+U_C).

Usage

dfba_mann_whitney(
  E,
  C,
  a0 = 1,
  b0 = 1,
  prob_interval = 0.95,
  samples = 30000,
  method = NULL,
  hide_progress = FALSE
)

Arguments

`E`	Data for independent sample 1 ("Experimental")
`C`	Data for independent sample 2 ("Control")
`a0`	The first shape parameter for the prior beta distribution for `omega_E` (default is 1). Must be positive and finite.
`b0`	The second shape parameter for the prior beta distribution for `omega_E` (default is 1). Must be positive and finite.
`prob_interval`	Desired probability value for the interval estimate for `omega_E` (default is 0.95)
`samples`	The number of Monte Carlo samples for `omega_E` when `method = "small"` (default is 30000)
`method`	(Optional) Either "small" or "large". The "small" algorithm is a discrete Monte Carlo solution intended for cases where n, the number of observations per group, is typically less than 20. The "large" algorithm is based on a beta approximation model for the posterior distribution of the omega_E parameter; this approximation is reasonable when n > 19. The user can stipulate `method` regardless of n. When the `method` argument is omitted, the program selects the appropriate procedure.
`hide_progress`	(Optional) If `TRUE`, hide percent progress while Monte Carlo sampling is running when `method = "small"` (default is `FALSE`).

Details

The Mann-Whitney U test is the frequentist nonparametric counterpart to the independent-groups t-test. The sample U_E statistic is the number of times that the E variate is larger than the C variate, whereas U_C is the converse number.
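The counting definition of U_E and U_C above can be sketched in base R. This is only an illustration and not necessarily how the package computes the statistics: the vectors `E_demo` and `C_demo` are hypothetical data with no ties, and how ties are handled is not covered here.

```r
# Hypothetical toy data (not from the package); no tied values.
E_demo <- c(5, 7, 9)
C_demo <- c(4, 6, 8)

# outer() compares every E observation with every C observation.
U_E <- sum(outer(E_demo, C_demo, ">"))  # pairs where E exceeds C
U_C <- sum(outer(E_demo, C_demo, "<"))  # pairs where C exceeds E

# Sample analogue of omega_E: the proportion U_E / (U_E + U_C).
omega_hat <- U_E / (U_E + U_C)

# Without ties, U_E + U_C equals the number of E-C pairs.
n_pairs <- length(E_demo) * length(C_demo)
```

Because only the ordering of the observations enters the counts, replacing 9 with 9000 in `E_demo` would leave U_E and U_C unchanged, which is the robustness to outliers that a rank-based statistic provides.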

This test uses only rank information, so it is robust with respect to outliers, and it does not depend on the assumption of a normal model for the variates. The Bayesian version for the Mann-Whitney is focused on the population parameter omega_E, which is the population ratio U_E/(U_E+U_C).

While the frequentist test effectively assumes the sharp null hypothesis that omega_E is .5, the Bayesian analysis has a prior and posterior distribution for omega_E on the [0, 1] interval. The prior is a beta distribution with shape parameters a0 and b0. The default is the flat prior (a0 = b0 = 1), but this prior can be altered by the user.

The prob_interval input is the probability content of the interval estimates for omega_E. There are two cases, depending on the sample sizes of the E and C variates. When the sample sizes are small, a discrete approximation method is used. In this case, the Bayesian analysis considers 200 discrete values for omega_E from .0025 to .9975 in steps of .005. For each discrete value, a prior and a posterior probability are obtained. The posterior probabilities are based on Monte Carlo sampling to approximate the likelihood of obtaining the observed U_E and U_C values for each candidate value of omega_E. Because this likelihood does not depend on the true distributions of the E and C variates in the population, the software is free to pick a convenient model: for each candidate omega_E, it constructs two exponential variates that have the same omega_E value. The samples argument specifies the number of Monte Carlo samples used for each candidate value of omega_E.
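The small-sample procedure described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the package's actual implementation: the function name `discrete_posterior_sketch` is hypothetical, the prior probability of each candidate value is assumed to be the beta(a0, b0) mass of its surrounding cell of width .005, and the exponential rates are one convenient choice that yields P(E > C) = omega.

```r
# Hypothetical helper (not part of DFBA): discrete posterior for omega_E.
discrete_posterior_sketch <- function(U_E_obs, n_E, n_C,
                                      a0 = 1, b0 = 1, samples = 500) {
  edges <- seq(0, 1, length.out = 201)            # 200 cells of width .005
  omega_grid <- (edges[-1] + edges[-201]) / 2     # .0025, .0075, ..., .9975
  # Assumption: prior probability of a candidate = beta mass of its cell.
  prior <- diff(pbeta(edges, a0, b0))

  likelihood <- vapply(omega_grid, function(omega) {
    # For exponentials with rates 1 (E) and rate_C (C),
    # P(E > C) = rate_C / (1 + rate_C), which equals omega here.
    rate_C <- omega / (1 - omega)
    hits <- replicate(samples, {
      E_sim <- rexp(n_E, rate = 1)
      C_sim <- rexp(n_C, rate = rate_C)
      sum(outer(E_sim, C_sim, ">")) == U_E_obs
    })
    mean(hits)  # Monte Carlo estimate of P(U_E = U_E_obs | omega)
  }, numeric(1))

  post <- prior * likelihood
  # Note: if no simulated sample ever matches U_E_obs, sum(post) is 0.
  list(omega_E = omega_grid, omegapost = post / sum(post))
}

set.seed(1)
res <- discrete_posterior_sketch(U_E_obs = 6, n_E = 3, n_C = 3, samples = 200)
post_mean <- sum(res$omega_E * res$omegapost)
```

The cost grows with 200 times `samples` simulated data sets, which is why the small-sample method is slow relative to a closed-form beta approximation.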

For large sample sizes of the E and C variates, the Bayesian posterior distribution is closely approximated by a beta distribution whose shape parameters are a function of the sample U_E and U_C statistics. The large-sample beta approximation was developed from extensive previous empirical studies designed to approximate the quantiles of the discrete approach with the corresponding quantiles for a particular beta distribution. The large-n solution also uses Lagrange polynomials for interpolation. The large-n approximation is reasonably accurate when n > 19 for each condition. When the method input is omitted, the function selects the appropriate procedure (i.e., either the discrete case for a small sample size or the large-n approach). Nonetheless, the user can stipulate which method they desire regardless of sample size by inputting either method = "small" or method = "large". The large-n solution is rapid compared to the small-sample solution, so care should be exercised before choosing method = "small" for large sample sizes.

Technical details of the analysis are explained in the Chechile (2020) Communications in Statistics paper cited below.

Value

A list containing the following components:

`Emean`	Mean of the independent sample 1 ("Experimental") data
`Cmean`	Mean of the independent sample 2 ("Control") data
`n_E`	Number of observations of the independent sample 1 ("Experimental") data
`n_C`	Number of observations of the independent sample 2 ("Control") data
`U_E`	Total number of comparisons for which observations from independent sample 1 ("Experimental") data exceed observations from independent sample 2 ("Control") data
`U_C`	Total number of comparisons for which observations from independent sample 2 ("Control") data exceed observations from independent sample 1 ("Experimental") data
`prob_interval`	User-defined width of `omega_E` interval estimate (default is 0.95)
`a0`	First shape parameter for the prior beta distribution
`b0`	Second shape parameter for the prior beta distribution
`a_post`	First shape parameter for the posterior beta distribution
`b_post`	Second shape parameter for the posterior beta distribution
`samples`	The number of desired Monte Carlo samples (default is 30000)
`method`	A character string indicating the calculation method used
`omega_E`	A vector of values representing candidate values for `omega_E` when `method = "small"`
`omegapost`	A vector of values representing discrete posterior probabilities for candidate values of `omega_E`
`priorvector`	A vector of values representing prior discrete probabilities of candidate values of `omega_E` when `method = "small"`
`priorprH1`	Prior probability of the alternative model that omega_E exceeds 0.5
`prH1`	Posterior probability of the alternative model that omega_E exceeds 0.5
`BF10`	Bayes Factor describing the relative increase in the posterior odds for the alternative model that `omega_E` exceeds 0.5 over the null model of `omega_E` less than or equal to 0.5
`omegabar`	Posterior mean estimate for `omega_E`
`eti_lower`	Lower limit of the equal-tail probability interval for `omega_E` with probability width indicated by `prob_interval`
`eti_upper`	Upper limit of the equal-tail probability interval for `omega_E` with probability width indicated by `prob_interval`
`hdi_lower`	Lower limit of the highest-density probability interval for `omega_E` with probability width indicated by `prob_interval` when `method = "small"`
`hdi_upper`	Upper limit of the highest-density probability interval for `omega_E` with probability width indicated by `prob_interval` when `method = "small"`
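To show how several of these components relate to one another when the posterior is a beta distribution, here is a minimal sketch in base R. The posterior shape parameters are hypothetical values chosen for illustration, and the formulas are the standard beta-distribution and Bayes-factor definitions rather than code taken from the package.

```r
a0 <- 1; b0 <- 1              # flat prior, as in the defaults
a_post <- 12; b_post <- 5     # hypothetical posterior shape parameters
prob_interval <- 0.95

# Prior and posterior probabilities that omega_E exceeds 0.5
priorprH1 <- 1 - pbeta(0.5, a0, b0)
prH1      <- 1 - pbeta(0.5, a_post, b_post)

# Bayes factor: posterior odds for H1 divided by prior odds for H1
BF10 <- (prH1 / (1 - prH1)) / (priorprH1 / (1 - priorprH1))

# Posterior mean of a beta(a_post, b_post) distribution
omegabar <- a_post / (a_post + b_post)

# Equal-tail interval: cut (1 - prob_interval) / 2 from each tail
alpha_half <- (1 - prob_interval) / 2
eti <- qbeta(c(alpha_half, 1 - alpha_half), a_post, b_post)
```

With a symmetric prior such as the flat or Jeffreys prior, the prior odds are 1, so the Bayes factor reduces to the posterior odds.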

References

Chechile, R.A. (2020). Bayesian Statistics for Experimental Scientists: A General Introduction Using Distribution-Free Methods. Cambridge: MIT Press.

Chechile, R.A. (2020). A Bayesian analysis for the Mann-Whitney statistic. Communications in Statistics – Theory and Methods 49(3): 670-696. https://doi.org/10.1080/03610926.2018.1549247.

Examples


# Note: examples with method = "small" have long runtimes due to Monte Carlo
# sampling; please feel free to run them in the console.

# Examples with large n per group
# The data for each condition are presorted only for the user's convenience
# when checking the U statistics by hand

groupA <- c(43, 45, 47, 50, 54, 58, 60, 63, 69, 84, 85, 91, 99, 127, 130,
            147, 165, 175, 193, 228, 252, 276)
groupB <- c(0, 1, 2, 3, 5, 14, 15, 23, 23, 25, 27, 32, 57, 105, 115, 158,
            161, 181, 203, 290)

dfba_mann_whitney(E = groupA,
                  C = groupB)

# The following uses a Jeffreys prior instead of a default flat prior:
dfba_mann_whitney(E = groupA,
                  C = groupB,
                  a0 = .5,
                  b0 = .5)

# The following also uses a Jeffreys prior but the analysis reverses the
# variates:
dfba_mann_whitney(E = groupB,
                  C = groupA,
                  a0 = .5,
                  b0 = .5)

# Note that BF10 from the above analysis is 1/BF10 from the original order
# of the variates.

# The next analysis constructs 99% interval estimates with the Jeffreys
# prior.

AB <- dfba_mann_whitney(E = groupA,
                        C = groupB,
                        a0 = .5,
                        b0 = .5,
                        prob_interval = .99)

AB

# Plot with prior and posterior curves
plot(AB)

# Plot with posterior curve only
plot(AB,
     plot.prior = FALSE)

# Example with small n per group

groupC <- c(96.49, 96.78, 97.26, 98.85, 99.75, 100.14, 101.15, 101.39,
            102.58, 107.22, 107.70, 113.26)
groupD <- c(101.16, 102.09, 103.14, 104.70, 105.27, 108.22, 108.32, 108.51,
            109.88, 110.32, 110.55, 113.42)


dfba_mann_whitney(E = groupC,
                  C = groupD,
                  samples = 250,
                  hide_progress = TRUE)

[Package DFBA version 0.1.0 Index]