dfba_mann_whitney {DFBA} R Documentation

## Independent Samples Test (Mann Whitney U)

### Description

Given two independent vectors E and C, the function computes the sample Mann-Whitney U statistics U_E and U_C and provides a Bayesian analysis for the population parameter omega_E, which is the population ratio of U_E/(U_E+U_C).

### Usage

dfba_mann_whitney(
E,
C,
a0 = 1,
b0 = 1,
prob_interval = 0.95,
samples = 30000,
method = NULL,
hide_progress = FALSE
)


### Arguments

| Argument | Description |
| --- | --- |
| E | Data for independent sample 1 ("Experimental") |
| C | Data for independent sample 2 ("Control") |
| a0 | The first shape parameter for the prior beta distribution for omega_E (default is 1). Must be positive and finite. |
| b0 | The second shape parameter for the prior beta distribution for omega_E (default is 1). Must be positive and finite. |
| prob_interval | Desired probability value for the interval estimate for omega_E (default is 0.95) |
| samples | The number of Monte Carlo samples for omega_E when method = "small" (default is 30000) |
| method | (Optional) Either "small" or "large". The "small" algorithm is based on a discrete Monte Carlo solution for cases where n is typically less than 20. The "large" algorithm is based on a beta approximation model for the posterior distribution of the omega_E parameter. This approximation is reasonable when n > 19. Regardless of n, the user can stipulate method. When the method argument is omitted, the program selects the appropriate procedure. |
| hide_progress | (Optional) If TRUE, hide percent progress while Monte Carlo sampling is running when method = "small" (default is FALSE). |

### Details

The Mann-Whitney U test is the frequentist nonparametric counterpart to the independent-groups t-test. The sample U_E statistic is the number of times that the E variate is larger than the C variate, whereas U_C is the converse number.
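The counting definition above can be sketched in a few lines of base R. The helper name u_counts and the toy data are hypothetical and are not part of DFBA; the sketch also assumes there are no ties, since tied pairs are simply not counted here.

```r
# Hypothetical helper (not a DFBA function): count pairwise comparisons.
u_counts <- function(E, C) {
  diffs <- outer(E, C, "-")          # every E value minus every C value
  c(U_E = sum(diffs > 0),            # pairs where E exceeds C
    U_C = sum(diffs < 0))            # pairs where C exceeds E
}

E <- c(3.1, 4.7, 5.2)                # toy data, for illustration only
C <- c(2.0, 4.0, 6.5)

u <- u_counts(E, C)
u                                    # U_E = 5, U_C = 4
u[["U_E"]] / sum(u)                  # sample ratio U_E / (U_E + U_C) = 5/9
```

The sample ratio in the last line is the sample analogue of omega_E; the function documented here goes further and reports a posterior distribution for the population value.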

This test uses only rank information, so it is robust with respect to outliers, and it does not depend on the assumption of a normal model for the variates. The Bayesian version for the Mann-Whitney is focused on the population parameter omega_E, which is the population ratio U_E/(U_E+U_C).

While the frequentist test effectively assumes the sharp null hypothesis that omega_E is .5, the Bayesian analysis has a prior and posterior distribution for omega_E on the [0, 1] interval. The prior is a beta distribution with shape parameters a0 and b0. The default is the flat prior (a0 = b0 = 1), but this prior can be altered by the user.
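As a rough illustration of how the choice of a0 and b0 shapes the prior, the following base R sketch computes the prior probability that omega_E exceeds 0.5 for three beta priors. The helper prior_pr_above_half is hypothetical and only wraps pbeta; it is not a DFBA function.

```r
# Hypothetical helper: prior probability that omega_E > 0.5 under Beta(a0, b0).
prior_pr_above_half <- function(a0, b0) {
  1 - pbeta(0.5, shape1 = a0, shape2 = b0)
}

flat      <- prior_pr_above_half(1, 1)       # flat prior: 0.5
jeffreys  <- prior_pr_above_half(0.5, 0.5)   # symmetric prior: 0.5
favours_E <- prior_pr_above_half(2, 1)       # Beta(2, 1): 0.75

c(flat = flat, jeffreys = jeffreys, favours_E = favours_E)
```

Any symmetric prior (a0 = b0) places half of its mass above 0.5, whereas a0 > b0 shifts prior belief toward the E variate tending to be larger.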

The prob_interval input is the probability value for the interval estimates for omega_E. There are two cases, depending on the sample sizes of the E and C variates. When the sample sizes are small, a discrete approximation method is used. In this case, the Bayesian analysis considers 200 discrete values for omega_E from .0025 to .9975 in steps of .005. For each discrete value, a prior and a posterior probability are obtained. The posterior probabilities are based on Monte Carlo sampling to approximate the likelihood of obtaining the observed U_E and U_C values for each candidate value of omega_E. For each candidate value, the likelihood of the observed sample U statistics does not depend on the true distributions of the E and C variates in the population, so the software constructs two exponential variates that have the same omega_E value. The samples argument specifies the number of Monte Carlo samples used for each candidate value of omega_E.
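The small-sample procedure can be pictured with the following simplified base R sketch. It is not the package's implementation: the helper sim_likelihood, the binning used for the discrete prior, the toy sample sizes, and the small number of Monte Carlo samples are all assumptions made for illustration. It relies on the fact that for exponential variates with rates 1 - omega (for E) and omega (for C), the probability that E exceeds C equals omega.

```r
set.seed(1)

# Candidate grid described above: 200 values from .0025 to .9975.
omega_grid <- seq(0.0025, 0.9975, by = 0.005)

# Discrete prior (assumed binning): beta mass in each 0.005-wide bin.
a0 <- 1
b0 <- 1
prior <- diff(pbeta(seq(0, 1, by = 0.005), a0, b0))

# Hypothetical helper: Monte Carlo estimate of P(U_E = U_E_obs | omega).
sim_likelihood <- function(omega, U_E_obs, n_E, n_C, samples) {
  hits <- replicate(samples, {
    E <- rexp(n_E, rate = 1 - omega)   # with these rates, P(E > C) = omega
    C <- rexp(n_C, rate = omega)
    sum(outer(E, C, ">")) == U_E_obs   # does the simulated U_E match?
  })
  mean(hits)
}

# Toy case: 5 observations per group, observed U_E = 20 out of 25 pairs.
lik <- vapply(omega_grid, sim_likelihood, numeric(1),
              U_E_obs = 20, n_E = 5, n_C = 5, samples = 300)

# Bayes' rule on the grid, then the posterior mean of omega_E.
post <- prior * lik / sum(prior * lik)
post_mean <- sum(omega_grid * post)
post_mean
```

Because every grid point requires its own batch of simulations, the run time grows with both samples and the group sizes, which is why the default of 30000 samples can be slow.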

For large sample sizes of the E and C variates, the Bayesian posterior distribution is closely approximated by a beta distribution whose shape parameters are a function of the sample U_E and U_C statistics. The large-sample beta approximation was developed from extensive previous empirical studies designed to approximate the quantiles of the discrete approach with the corresponding quantiles of a particular beta distribution. The large-n solution also uses Lagrange polynomials for interpolation. The large-n approximation is reasonably accurate when n > 19 for each condition. When the method input is omitted, the function selects the appropriate procedure (i.e., either the discrete case for a small sample size or the large-n approach). Nonetheless, the user can stipulate which method they desire, regardless of sample size, by inputting either method = "small" or method = "large". The large-n solution is rapid compared to the small-sample solution, so care should be exercised before choosing method = "small" when the sample sizes are large.
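The automatic choice between the two procedures can be pictured as below. This is only a sketch under the assumption that the n > 19 cutoff is applied to the smaller of the two groups; choose_method is a hypothetical helper, and the package's internal rule may differ.

```r
# Hypothetical helper (not a DFBA function): pick a method from group sizes,
# assuming the "n > 19 for each condition" guideline is applied literally.
choose_method <- function(n_E, n_C) {
  if (min(n_E, n_C) > 19) "large" else "small"
}

choose_method(22, 20)   # both groups exceed 19, so "large"
choose_method(12, 12)   # small groups, so "small"
choose_method(25, 15)   # one group is too small, so "small"
```

Under this reading, the 22 and 20 observations in the large-n example data would take the beta approximation, while the 12 observations per group in the small-n example would take the Monte Carlo route.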

Technical details of the analysis are explained in the Chechile (2020) Communications in Statistics paper cited below.

### Value

A list containing the following components:

| Component | Description |
| --- | --- |
| Emean | Mean of the independent sample 1 ("Experimental") data |
| Cmean | Mean of the independent sample 2 ("Control") data |
| n_E | Number of observations of the independent sample 1 ("Experimental") data |
| n_C | Number of observations of the independent sample 2 ("Control") data |
| U_E | Total number of comparisons for which observations from independent sample 1 ("Experimental") data exceed observations from independent sample 2 ("Control") data |
| U_C | Total number of comparisons for which observations from independent sample 2 ("Control") data exceed observations from independent sample 1 ("Experimental") data |
| prob_interval | User-defined width of omega_E interval estimate (default is 0.95) |
| a0 | First shape parameter for the prior beta distribution |
| b0 | Second shape parameter for the prior beta distribution |
| a_post | First shape parameter for the posterior beta distribution |
| b_post | Second shape parameter for the posterior beta distribution |
| samples | The number of desired Monte Carlo samples (default is 30000) |
| method | A character string indicating the calculation method used |
| omega_E | A vector of values representing candidate values for omega_E when method = "small" |
| omegapost | A vector of values representing discrete probabilities for candidate values of omega_E |
| priorvector | A vector of values representing prior discrete probabilities of candidate values of omega_E when method = "small" |
| priorprH1 | Prior probability of the alternative model that omega_E exceeds 0.5 |
| prH1 | Posterior probability of the alternative model that omega_E exceeds 0.5 |
| BF10 | Bayes factor describing the relative increase in the posterior odds for the alternative model that omega_E exceeds 0.5 over the null model of omega_E less than or equal to 0.5 |
| omegabar | Posterior mean estimate for omega_E |
| eti_lower | Lower limit of the equal-tail probability interval for omega_E with probability width indicated by prob_interval |
| eti_upper | Upper limit of the equal-tail probability interval for omega_E with probability width indicated by prob_interval |
| hdi_lower | Lower limit of the highest-density probability interval for omega_E with probability width indicated by prob_interval when method = "small" |
| hdi_upper | Upper limit of the highest-density probability interval for omega_E with probability width indicated by prob_interval when method = "small" |
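To show how several of these components relate to one another when the posterior is a beta distribution, here is a base R sketch. The a_post and b_post values are made up for illustration and do not come from any data set, and BF10 is computed on the assumption that it is the ratio of posterior odds to prior odds, which is how the description above reads.

```r
a0 <- 1                      # flat prior
b0 <- 1
a_post <- 16.5               # hypothetical posterior shape parameters
b_post <- 7.2
prob_interval <- 0.95

# Posterior mean of a beta distribution.
omegabar <- a_post / (a_post + b_post)

# Equal-tail interval: cut the same probability from each tail.
tail_prob <- (1 - prob_interval) / 2
eti <- qbeta(c(tail_prob, 1 - tail_prob), a_post, b_post)

# Prior and posterior probability that omega_E exceeds 0.5.
priorprH1 <- 1 - pbeta(0.5, a0, b0)
prH1      <- 1 - pbeta(0.5, a_post, b_post)

# Assumed definition: posterior odds divided by prior odds.
BF10 <- (prH1 / (1 - prH1)) / (priorprH1 / (1 - priorprH1))

c(omegabar = omegabar, eti_lower = eti[1], eti_upper = eti[2],
  prH1 = prH1, BF10 = BF10)
```

With a flat prior the prior odds are 1, so under this definition BF10 reduces to the posterior odds prH1 / (1 - prH1).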

### References

Chechile, R.A. (2020). Bayesian Statistics for Experimental Scientists: A General Introduction Using Distribution-Free Methods. Cambridge: MIT Press.

Chechile, R.A. (2020). A Bayesian analysis for the Mann-Whitney statistic. Communications in Statistics – Theory and Methods 49(3): 670-696. https://doi.org/10.1080/03610926.2018.1549247.

### Examples


# Note: examples with method = "small" have long runtimes due to Monte Carlo
# sampling; please feel free to run them in the console.

# Examples with large n per group
# The data for each condition are presorted only for the user's convenience
# when checking the U statistics by hand

groupA <- c(43, 45, 47, 50, 54, 58, 60, 63, 69, 84, 85, 91, 99, 127, 130,
147, 165, 175, 193, 228, 252, 276)
groupB <- c(0, 1, 2, 3, 5, 14, 15, 23, 23, 25, 27, 32, 57, 105, 115, 158,
161, 181, 203, 290)

dfba_mann_whitney(E = groupA,
C = groupB)

# The following uses a Jeffreys prior instead of a default flat prior:
dfba_mann_whitney(E = groupA,
C = groupB,
a0 = .5,
b0 = .5)

# The following also uses a Jeffreys prior but the analysis reverses the
# variates:
dfba_mann_whitney(E = groupB,
C = groupA,
a0 = .5,
b0 = .5)

# Note that BF10 from the above analysis is 1/BF10 from the original order
# of the variates.

# The next analysis constructs 99% interval estimates with the Jeffreys
# prior.

AB <- dfba_mann_whitney(E = groupA,
C = groupB,
a0 = .5,
b0 = .5,
prob_interval = .99)

AB

# Plot with prior and posterior curves
plot(AB)

# Plot with posterior curve only
plot(AB,
plot.prior = FALSE)

# Example with small n per group

groupC <- c(96.49, 96.78, 97.26, 98.85, 99.75, 100.14, 101.15, 101.39,
102.58, 107.22, 107.70, 113.26)
groupD <- c(101.16, 102.09, 103.14, 104.70, 105.27, 108.22, 108.32, 108.51,
109.88, 110.32, 110.55, 113.42)

dfba_mann_whitney(E = groupC,
C = groupD,
samples = 250,
hide_progress = TRUE)



[Package DFBA version 0.1.0 Index]