dfba_mann_whitney {DFBA} | R Documentation |

## Independent Samples Test (Mann Whitney U)

### Description

Given two independent vectors `E` and `C`, the function computes the sample Mann-Whitney `U` statistics `U_E` and `U_C` and provides a Bayesian analysis for the population parameter `omega_E`, which is the population ratio of `U_E/(U_E+U_C)`.

### Usage

```
dfba_mann_whitney(
  E,
  C,
  a0 = 1,
  b0 = 1,
  prob_interval = 0.95,
  samples = 30000,
  method = NULL,
  hide_progress = FALSE
)
```

### Arguments

`E` |
Data for independent sample 1 ("Experimental") |

`C` |
Data for independent sample 2 ("Control") |

`a0` |
The first shape parameter for the prior beta distribution for `omega_E` |

`b0` |
The second shape parameter for the prior beta distribution for `omega_E` |

`prob_interval` |
Desired probability value for the interval estimate for `omega_E` |

`samples` |
The number of Monte Carlo samples for |

`method` |
(Optional) The method option is either "small" or "large". The "small" algorithm is based on a discrete Monte Carlo solution for cases where n is typically less than 20. The "large" algorithm is based on a beta approximation model for the posterior distribution of the omega_E parameter. This approximation is reasonable when n > 19. Regardless of |

`hide_progress` |
(Optional) If |

### Details

The Mann-Whitney *U* test is the frequentist nonparametric counterpart to the independent-groups `t`-test. The sample `U_E` statistic is the number of pairwise comparisons in which an *E* value is larger than a *C* value, whereas `U_C` is the number of comparisons in which a *C* value is larger than an *E* value.
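
As a rough illustration of these definitions, the pairwise counts can be computed directly in base R. The vectors `E_toy` and `C_toy` are invented for this sketch, and tied values are simply not counted here, which may differ from how the package treats ties.

```r
# Toy data, invented for illustration only
E_toy <- c(3, 5, 8)
C_toy <- c(1, 4, 6)

# outer() builds the n_E x n_C matrix of all pairwise comparisons
U_E <- sum(outer(E_toy, C_toy, ">"))  # pairs where the E value is larger
U_C <- sum(outer(E_toy, C_toy, "<"))  # pairs where the C value is larger

# Sample analogue of omega_E
U_E / (U_E + U_C)
```

With these toy values, 6 of the 9 comparisons favor *E*, so the sample ratio is 2/3.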

This test uses only rank information, so it is robust with respect to outliers, and it does not depend on the assumption of a normal model for the variates. The Bayesian version for the Mann-Whitney is focused on the population parameter `omega_E`, which is the population ratio `U_E/(U_E+U_C)`.

While the frequentist test effectively assumes the sharp null hypothesis that `omega_E` is 0.5, the Bayesian analysis has a prior and posterior distribution for `omega_E` on the [0, 1] interval. The prior is a beta distribution with shape parameters `a0` and `b0`. The default is the flat prior (`a0 = b0 = 1`), but this prior can be altered by the user.
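
To see what a given choice of `a0` and `b0` implies before any data are observed, one can evaluate the prior probability that `omega_E` exceeds 0.5 with base R's `pbeta()`. The helper name `prior_prob_H1` is hypothetical and is not part of the package.

```r
# Hypothetical helper: prior probability that omega_E > 0.5
# under a beta(a0, b0) prior
prior_prob_H1 <- function(a0 = 1, b0 = 1) {
  pbeta(0.5, a0, b0, lower.tail = FALSE)
}

prior_prob_H1()      # flat prior: 0.5
prior_prob_H1(2, 1)  # prior that favors larger omega_E: 0.75
```

Any symmetric prior (`a0 = b0`) gives a prior probability of 0.5, so the two hypotheses start on equal footing.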

The `prob_interval` input is the probability value for the interval estimates of `omega_E`. There are two cases, depending on the sample sizes of the *E* and *C* variates. When the sample sizes are small, a discrete approximation method is used. In this case, the Bayesian analysis considers 200 discrete values for `omega_E` from 0.0025 to 0.9975 in steps of 0.005. For each discrete value, a prior and a posterior probability are obtained. The posterior probabilities are based on Monte Carlo sampling to approximate the likelihood of obtaining the observed `U_E` and `U_C` values for each candidate value of `omega_E`. For each candidate value, the likelihood of the observed sample *U* statistics does not depend on the true distributions of the *E* and *C* variates in the population, so for each candidate `omega_E` the software constructs two exponential variates that have the same `omega_E` value. The argument `samples` specifies the number of Monte Carlo samples used for each candidate value of `omega_E`.
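
The discrete procedure described above can be sketched in base R as follows. This is a simplified stand-in written for illustration, not the package's internal code. The toy data, the small value of `samples_toy`, the binning of the prior, and the choice of exponential rates are all assumptions. The rates rely on the fact that for independent exponential variates, P(E > C) = rate_C / (rate_E + rate_C).

```r
set.seed(1)

# Toy data, invented for illustration only
E_toy <- c(3, 5, 8)
C_toy <- c(1, 4, 6)
n_E <- length(E_toy)
n_C <- length(C_toy)
U_E_obs <- sum(outer(E_toy, C_toy, ">"))

# 200 candidate values from 0.0025 to 0.9975 in steps of 0.005
omega_grid <- seq(0.0025, 0.9975, by = 0.005)

# Assumed discretization: prior mass of each 0.005-wide bin under beta(a0, b0)
a0 <- 1
b0 <- 1
prior <- pbeta(omega_grid + 0.0025, a0, b0) - pbeta(omega_grid - 0.0025, a0, b0)

# Deliberately small number of Monte Carlo draws per candidate value
samples_toy <- 300

# Estimate the likelihood of the observed U_E for each candidate omega.
# Rates (1 - omega, omega) give exponential variates with P(E > C) = omega.
likelihood <- vapply(omega_grid, function(omega) {
  hits <- replicate(samples_toy, {
    e_sim <- rexp(n_E, rate = 1 - omega)
    c_sim <- rexp(n_C, rate = omega)
    sum(outer(e_sim, c_sim, ">")) == U_E_obs
  })
  mean(hits)
}, numeric(1))

# Discrete Bayes rule: posterior is proportional to prior times likelihood
posterior <- prior * likelihood
posterior <- posterior / sum(posterior)

# Posterior mean of omega_E on the grid
sum(omega_grid * posterior)
```

Because each of the 200 candidate values needs its own batch of simulated data sets, the run time grows in proportion to the number of Monte Carlo samples.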

For large sample sizes of the *E* and *C* variates, the Bayesian posterior distribution is closely approximated by a beta distribution whose shape parameters are a function of the sample `U_E` and `U_C` statistics. The large-sample beta approximation was developed from extensive previous empirical studies designed to approximate the quantiles of the discrete approach with the corresponding quantiles of a particular beta distribution. The large-*n* solution also uses Lagrange polynomials for interpolation. The large-*n* approximation is reasonably accurate when `n > 19` for each condition. When the `method` input is omitted, the function selects the appropriate procedure (*i.e.*, either the discrete case for a small sample size or the large-*n* approach). Nonetheless, the user can stipulate which method they desire, regardless of sample size, by inputting either `method="small"` or `method="large"`. The large-*n* solution is rapid compared to the small-sample solution, so care should be exercised when choosing `method="small"` for large sample sizes.
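
The rule of thumb for choosing between the two procedures can be written as a small helper. The function `choose_method` is hypothetical. It mirrors the `n > 19` guideline stated above, and the package's own selection logic may differ in detail.

```r
# Hypothetical helper: pick "large" only when both groups have n > 19
choose_method <- function(n_E, n_C) {
  if (n_E > 19 && n_C > 19) "large" else "small"
}

choose_method(22, 20)  # group sizes of groupA and groupB in the Examples
choose_method(12, 12)  # group sizes of groupC and groupD in the Examples
```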

Technical details of the analysis are explained in the Chechile (2020) Communications in Statistics paper cited below.

### Value

A list containing the following components:

`Emean` |
Mean of the independent sample 1 ("Experimental") data |

`Cmean` |
Mean of the independent sample 2 ("Control") data |

`n_E` |
Number of observations of the independent sample 1 ("Experimental") data |

`n_C` |
Number of observations of the independent sample 2 ("Control") data |

`U_E` |
Total number of comparisons for which observations from independent sample 1 ("Experimental") data exceed observations from independent sample 2 ("Control") data |

`U_C` |
Total number of comparisons for which observations from independent sample 2 ("Control") data exceed observations from independent sample 1 ("Experimental") data |

`prob_interval` |
User-defined width of |

`a0` |
First shape parameter for the prior beta distribution |

`b0` |
Second shape parameter for the prior beta distribution |

`a_post` |
First shape parameter for the posterior beta distribution |

`b_post` |
Second shape parameter for the posterior beta distribution |

`samples` |
The number of desired Monte Carlo samples (default is 30000) |

`method` |
A character string indicating the calculation method used |

`omega_E` |
A vector of values representing candidate values for `omega_E` |

`omegapost` |
A vector of values representing discrete probabilities for candidate values of `omega_E` |

`priorvector` |
A vector of values representing prior discrete probabilities of candidate values of `omega_E` |

`priorprH1` |
Prior probability of the alternative model that omega_E exceeds 0.5 |

`prH1` |
Posterior probability of the alternative model that omega_E exceeds 0.5 |

`BF10` |
Bayes Factor describing the relative increase in the posterior odds for the alternative model that |

`omegabar` |
Posterior mean estimate for `omega_E` |

`eti_lower` |
Lower limit of the equal-tail probability interval for `omega_E` |

`eti_upper` |
Upper limit of the equal-tail probability interval for `omega_E` |

`hdi_lower` |
Lower limit of the highest-density probability interval for `omega_E` |

`hdi_upper` |
Upper limit of the highest-density probability interval for `omega_E` |
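
When the posterior is summarized by a beta distribution with shape parameters `a_post` and `b_post`, several of the components above can be approximated with base R alone. The sketch below assumes the standard definition of a Bayes factor as posterior odds divided by prior odds. The helper `summarise_beta_posterior` and the shape values passed to it are hypothetical, and the function is not claimed to reproduce the package's output exactly.

```r
# Hypothetical helper: summaries of a beta(a_post, b_post) posterior
summarise_beta_posterior <- function(a_post, b_post, a0 = 1, b0 = 1,
                                     prob_interval = 0.95) {
  prior_pr_H1 <- pbeta(0.5, a0, b0, lower.tail = FALSE)
  post_pr_H1  <- pbeta(0.5, a_post, b_post, lower.tail = FALSE)
  tail_prob   <- (1 - prob_interval) / 2
  list(
    omegabar  = a_post / (a_post + b_post),   # posterior mean
    priorprH1 = prior_pr_H1,
    prH1      = post_pr_H1,
    # posterior odds for omega_E > 0.5 divided by the prior odds
    BF10      = (post_pr_H1 / (1 - post_pr_H1)) /
                (prior_pr_H1 / (1 - prior_pr_H1)),
    eti_lower = qbeta(tail_prob, a_post, b_post),
    eti_upper = qbeta(1 - tail_prob, a_post, b_post)
  )
}

# Made-up shape parameters, then the same pair with the roles reversed
fit     <- summarise_beta_posterior(a_post = 12, b_post = 5)
rev_fit <- summarise_beta_posterior(a_post = 5, b_post = 12)

# With a symmetric prior, reversing the variates inverts the Bayes factor
fit$BF10 * rev_fit$BF10
```

The highest-density interval is omitted from this sketch because it has no closed form and needs a numerical search.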

### References

Chechile, R.A. (2020). Bayesian Statistics for Experimental Scientists: A General Introduction Using Distribution-Free Methods. Cambridge: MIT Press.

Chechile, R.A. (2020). A Bayesian analysis for the Mann-Whitney statistic. Communications in Statistics – Theory and Methods 49(3): 670-696. https://doi.org/10.1080/03610926.2018.1549247.

### Examples

```
# Note: examples with method = "small" have long runtimes due to Monte Carlo
# sampling; please feel free to run them in the console.

# Examples with large n per group
# The data for each condition are presorted only for the user's convenience
# when checking the U statistics by hand
groupA <- c(43, 45, 47, 50, 54, 58, 60, 63, 69, 84, 85, 91, 99, 127, 130,
            147, 165, 175, 193, 228, 252, 276)
groupB <- c(0, 01, 02, 03, 05, 14, 15, 23, 23, 25, 27, 32, 57, 105, 115, 158,
            161, 181, 203, 290)

dfba_mann_whitney(E = groupA,
                  C = groupB)

# The following uses a Jeffreys prior instead of the default flat prior:
dfba_mann_whitney(E = groupA,
                  C = groupB,
                  a0 = .5,
                  b0 = .5)

# The following also uses a Jeffreys prior but the analysis reverses the
# variates:
dfba_mann_whitney(E = groupB,
                  C = groupA,
                  a0 = .5,
                  b0 = .5)

# Note that BF10 from the above analysis is 1/BF10 from the original order
# of the variates.

# The next analysis constructs 99% interval estimates with the Jeffreys
# prior.
AB <- dfba_mann_whitney(E = groupA,
                        C = groupB,
                        a0 = .5,
                        b0 = .5,
                        prob_interval = .99)
AB

# Plot with prior and posterior curves
plot(AB)

# Plot with posterior curve only
plot(AB,
     plot.prior = FALSE)

# Example with small n per group
groupC <- c(96.49, 96.78, 97.26, 98.85, 99.75, 100.14, 101.15, 101.39,
            102.58, 107.22, 107.70, 113.26)
groupD <- c(101.16, 102.09, 103.14, 104.70, 105.27, 108.22, 108.32, 108.51,
            109.88, 110.32, 110.55, 113.42)

dfba_mann_whitney(E = groupC,
                  C = groupD,
                  samples = 250,
                  hide_progress = TRUE)
```

Package *DFBA*, version 0.1.0