dfba_mann_whitney {DFBA}    R Documentation

Independent Samples Test (Mann Whitney U)

Description

Given two independent vectors E and C, the function computes the sample Mann-Whitney U statistics U_E and U_C and provides a Bayesian analysis for the population parameter omega_E, which is the population ratio of U_E/(U_E+U_C).

Usage

dfba_mann_whitney(
  E,
  C,
  a0 = 1,
  b0 = 1,
  prob_interval = 0.95,
  samples = 30000,
  method = NULL,
  hide_progress = FALSE
)

Arguments

E

Data for independent sample 1 ("Experimental")

C

Data for independent sample 2 ("Control")

a0

The first shape parameter for the prior beta distribution for omega_E (default is 1). Must be positive and finite.

b0

The second shape parameter for the prior beta distribution for omega_E (default is 1). Must be positive and finite.

prob_interval

Desired probability value for the interval estimate for omega_E (default is 0.95)

samples

The number of Monte Carlo samples for omega_E when method = "small" (default is 30000)

method

(Optional) The method option is either "small" or "large". The "small" algorithm is based on a discrete Monte Carlo solution and is intended for cases where n is less than 20. The "large" algorithm is based on a beta approximation model for the posterior distribution of the omega_E parameter. This approximation is reasonable when n > 19. The user can stipulate either method regardless of n. When the method argument is omitted, the program selects the appropriate procedure.

hide_progress

(Optional) If TRUE, hide percent progress while Monte Carlo sampling is running when method = "small" (default is FALSE).

Details

The Mann-Whitney U test is the frequentist nonparametric counterpart to the independent-groups t-test. The sample U_E statistic is the number of times that the E variate is larger than the C variate, whereas U_C is the converse number.
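As an illustration of these definitions, the two sample statistics can be counted directly in base R. This is a minimal sketch rather than the package's internal code; the vectors E and C below are made-up data, and ties (which add to neither count here) are assumed absent.

```r
# Hypothetical data for illustration only
E <- c(5, 7, 9)
C <- c(4, 6, 8)

# Compare every E value with every C value (a 3 x 3 logical matrix)
U_E <- sum(outer(E, C, ">"))  # pairs in which E exceeds C
U_C <- sum(outer(E, C, "<"))  # pairs in which C exceeds E

# Sample analogue of omega_E
U_E / (U_E + U_C)
```

With no ties, U_E + U_C equals the product of the two sample sizes, which is a quick check when computing the statistics by hand.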

This test uses only rank information, so it is robust with respect to outliers, and it does not depend on the assumption of a normal model for the variates. The Bayesian version for the Mann-Whitney is focused on the population parameter omega_E, which is the population ratio U_E/(U_E+U_C).

While the frequentist test effectively assumes the sharp null hypothesis that omega_E is .5, the Bayesian analysis has a prior and posterior distribution for omega_E on the [0, 1] interval. The prior is a beta distribution with shape parameters a0 and b0. The default is the flat prior (a0 = b0 = 1), but this prior can be altered by the user.
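To see how the choice of a0 and b0 shifts prior belief, the prior probability that omega_E exceeds .5 can be read off the beta distribution with base R's pbeta(). This is a small sketch; the helper name prior_prob_above_half is hypothetical and not part of DFBA.

```r
# Hypothetical helper: prior probability that omega_E > 0.5
# under a beta(a0, b0) prior
prior_prob_above_half <- function(a0, b0) {
  pbeta(0.5, a0, b0, lower.tail = FALSE)
}

prior_prob_above_half(1, 1)      # flat prior (the default)
prior_prob_above_half(0.5, 0.5)  # Jeffreys prior
prior_prob_above_half(2, 1)      # a prior that favors omega_E > 0.5
```

Any symmetric prior (a0 = b0) places equal prior mass on either side of .5, so the flat and Jeffreys priors differ in shape but not in this probability.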

The prob_interval input is the value for probability interval estimates for omega_E. There are two cases depending on the sample size for the E and C variates. When the sample sizes are small, a discrete approximation method is used. In this case, the Bayesian analysis considers 200 discrete values for omega_E from .0025 to .9975 in steps of .005. For each discrete value, a prior and a posterior probability are obtained. The posterior probabilities are based on Monte Carlo sampling to approximate the likelihood of obtaining the observed U_E and U_C values for each candidate value for omega_E. For each candidate value for omega_E, the likelihood for the observed sample U statistics does not depend on the true distributions of the E and C variates in the population. For each candidate omega_E, the software constructs two exponential variates that have the same omega_E value. The samples argument specifies the number of Monte Carlo samples used for each candidate value of omega_E.
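The discrete approach described above can be sketched in a few lines of base R. This is an illustrative approximation under stated assumptions, not the package's implementation: the helper sim_likelihood, the observed value U_E_obs = 20, the group sizes of 5, and the flat discrete prior are all hypothetical choices made for the example.

```r
# 200 candidate values for omega_E, from .0025 to .9975 in steps of .005
omega_grid <- seq(0.0025, 0.9975, by = 0.005)

# Hypothetical helper: estimate the likelihood of an observed U_E for one
# candidate omega by simulating exponential variates with P(E > C) = omega.
# For exponentials with rates r_E and r_C, P(E > C) = r_C / (r_E + r_C),
# so r_E = 1 - omega and r_C = omega give the required value.
sim_likelihood <- function(omega, U_E_obs, n_E, n_C, samples = 1000) {
  hits <- 0
  for (i in seq_len(samples)) {
    E <- rexp(n_E, rate = 1 - omega)
    C <- rexp(n_C, rate = omega)
    if (sum(outer(E, C, ">")) == U_E_obs) hits <- hits + 1
  }
  hits / samples
}

set.seed(1)
lik <- vapply(omega_grid, sim_likelihood, numeric(1),
              U_E_obs = 20, n_E = 5, n_C = 5, samples = 200)

# Combine with a flat discrete prior and normalize to get the posterior
prior <- rep(1 / length(omega_grid), length(omega_grid))
post <- prior * lik / sum(prior * lik)

# Candidate value with the highest posterior probability
omega_grid[which.max(post)]
```

Because every candidate value needs its own batch of simulated data sets, the cost grows with both samples and the group sizes, which is why the small-sample method is much slower than the beta approximation.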

For large sample sizes of the E and C variates, the Bayesian posterior distribution is closely approximated by a beta distribution where the shape parameters are a function of the sample U_E and U_C statistics. The large-sample beta approximation was developed from extensive previous empirical studies designed to approximate the quantiles of the discrete approach with the corresponding quantiles for a particular beta distribution. The large-n solution also uses Lagrange polynomials for interpolation. The large-n approximation is reasonably accurate when n > 19 for each condition. When the method input is omitted, the function selects the appropriate procedure (i.e., either the discrete case for a small sample size or the large-n approach). Nonetheless, the user can stipulate which method they desire regardless of sample size by inputting either method = "small" or method = "large". The large-n solution is rapid compared to the small-sample solution, so care should be exercised when choosing method = "small" for large sample sizes.

Technical details of the analysis are explained in the Chechile (2020) Communications in Statistics paper cited below.

Value

A list containing the following components:

Emean

Mean of the independent sample 1 ("Experimental") data

Cmean

Mean of the independent sample 2 ("Control") data

n_E

Number of observations of the independent sample 1 ("Experimental") data

n_C

Number of observations of the independent sample 2 ("Control") data

U_E

Total number of comparisons for which observations from independent sample 1 ("Experimental") data exceed observations from independent sample 2 ("Control") data

U_C

Total number of comparisons for which observations from independent sample 2 ("Control") data exceed observations from independent sample 1 ("Experimental") data

prob_interval

User-defined width of omega_E interval estimate (default is 0.95)

a0

First shape parameter for the prior beta distribution

b0

Second shape parameter for the prior beta distribution

a_post

First shape parameter for the posterior beta distribution

b_post

Second shape parameter for the posterior beta distribution

samples

The number of desired Monte Carlo samples (default is 30000)

method

A character string indicating the calculation method used

omega_E

A vector of values representing candidate values for omega_E when method = "small"

omegapost

A vector of values representing discrete probabilities for candidate values of omega_E

priorvector

A vector of values representing prior discrete probabilities of candidate values of omega_E when method = "small"

priorprH1

Prior probability of the alternative model that omega_E exceeds 0.5

prH1

Posterior probability of the alternative model that omega_E exceeds 0.5

BF10

Bayes Factor describing the relative increase in the posterior odds for the alternative model that omega_E exceeds 0.5 over the null model of omega_E less than or equal to 0.5

omegabar

Posterior mean estimate for omega_E

eti_lower

Lower limit of the equal-tail probability interval for omega_E with probability width indicated by prob_interval

eti_upper

Upper limit of the equal-tail probability interval for omega_E with probability width indicated by prob_interval

hdi_lower

Lower limit of the highest-density probability interval for omega_E with probability width indicated by prob_interval when method = "small"

hdi_upper

Upper limit of the highest-density probability interval for omega_E with probability width indicated by prob_interval when method = "small"
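When the posterior is a beta distribution, several of the components listed above can be related to a_post and b_post with base R. The sketch below uses hypothetical shape parameters rather than output from a real run, and it assumes the Bayes factor is the ratio of posterior odds to prior odds, which is consistent with the BF10 description above; the package's own computation may differ in detail.

```r
# Hypothetical prior and posterior shape parameters (not from a real run)
a0 <- 1
b0 <- 1
a_post <- 12
b_post <- 6
prob_interval <- 0.95

# Prior and posterior probabilities that omega_E exceeds 0.5
priorprH1 <- pbeta(0.5, a0, b0, lower.tail = FALSE)
prH1 <- pbeta(0.5, a_post, b_post, lower.tail = FALSE)

# Bayes factor as the ratio of posterior odds to prior odds (assumption)
BF10 <- (prH1 / (1 - prH1)) / (priorprH1 / (1 - priorprH1))

# Posterior mean of a beta(a_post, b_post) distribution
omegabar <- a_post / (a_post + b_post)

# Equal-tail interval: cut the same probability from each tail
tail_prob <- (1 - prob_interval) / 2
eti <- qbeta(c(tail_prob, 1 - tail_prob), a_post, b_post)
eti_lower <- eti[1]
eti_upper <- eti[2]
```

With a flat prior the prior odds are 1, so BF10 reduces to the posterior odds in favor of omega_E exceeding 0.5.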

References

Chechile, R.A. (2020). Bayesian Statistics for Experimental Scientists: A General Introduction Using Distribution-Free Methods. Cambridge: MIT Press.

Chechile, R.A. (2020). A Bayesian analysis for the Mann-Whitney statistic. Communications in Statistics – Theory and Methods 49(3): 670-696. https://doi.org/10.1080/03610926.2018.1549247.

Examples


# Note: examples with method = "small" have long runtimes due to Monte Carlo
# sampling; please feel free to run them in the console.

# Examples with large n per group
# The data for each condition are presorted only for the user's convenience
# when checking the U statistics by hand

groupA <- c(43, 45, 47, 50, 54, 58, 60, 63, 69, 84, 85, 91, 99, 127, 130,
            147, 165, 175, 193, 228, 252, 276)
groupB <- c(0, 1, 2, 3, 5, 14, 15, 23, 23, 25, 27, 32, 57, 105, 115, 158,
            161, 181, 203, 290)

dfba_mann_whitney(E = groupA,
                  C = groupB)

# The following uses a Jeffreys prior instead of a default flat prior:
dfba_mann_whitney(E = groupA,
                  C = groupB,
                  a0 = .5,
                  b0 = .5)

# The following also uses a Jeffreys prior but the analysis reverses the
# variates:
dfba_mann_whitney(E = groupB,
                  C = groupA,
                  a0 = .5,
                  b0 = .5)

# Note that BF10 from the above analysis is 1/BF10 from the original order
# of the variates.

# The next analysis constructs 99% interval estimates with the Jeffreys
# prior.

AB <- dfba_mann_whitney(E = groupA,
                        C = groupB,
                        a0 = .5,
                        b0 = .5,
                        prob_interval = .99)

AB

# Plot with prior and posterior curves
plot(AB)

# Plot with posterior curve only
plot(AB,
     plot.prior = FALSE)

# Example with small n per group

groupC <- c(96.49, 96.78, 97.26, 98.85, 99.75, 100.14, 101.15, 101.39,
            102.58, 107.22, 107.70, 113.26)
groupD <- c(101.16, 102.09, 103.14, 104.70, 105.27, 108.22, 108.32, 108.51,
            109.88, 110.32, 110.55, 113.42)


dfba_mann_whitney(E = groupC,
                  C = groupD,
                  samples = 250,
                  hide_progress = TRUE)




[Package DFBA version 0.1.0 Index]