dfba_bivariate_concordance {DFBA} | R Documentation |

## Bayesian Distribution-Free Correlation and Concordance

### Description

Given bivariate data, computes the sample number of concordant changes `nc`
between the two variates and the number of discordant changes `nd`.
Provides the frequentist `tau_A` correlation coefficient `(nc-nd)/(nc+nd)`,
and provides a Bayesian analysis of the population concordance parameter
`phi`: the limit of the proportion of concordant changes between the variates.
For goodness-of-fit applications, provides a concordance measure that
corrects for the number of fitting parameters.

### Usage

```
dfba_bivariate_concordance(
  x,
  y,
  a0 = 1,
  b0 = 1,
  prob_interval = 0.95,
  fitting.parameters = NULL
)
```

### Arguments

`x` |
Vector of x variable values |

`y` |
Vector of y variable values |

`a0` |
First shape parameter for the prior beta distribution (default is 1) |

`b0` |
Second shape parameter for the prior beta distribution (default is 1) |

`prob_interval` |
Desired probability for interval estimates (default is .95) |

`fitting.parameters` |
(Optional) If either x or y values are generated by a predictive model, the number of free parameters in the model (default is NULL) |

### Details

The product-moment correlation depends on Gaussian assumptions about the
residuals in a regression analysis. It is not robust because it is strongly
influenced by any extreme outlier scores for either of the two variates. A
rank-based analysis can avoid both of these limitations. The
`dfba_bivariate_concordance()` function is focused on a nonparametric
concordance metric for characterizing the association between the two
bivariate measures.

To illustrate the nonparametric concepts of concordance and discordance,
consider a specific example where there are five paired scores with
`x = {3.8, 4.7, 4.7, 4.7, 11.8}` and `y = {5.9, -4.1, 7.3, 7.3, 38.9}`.
The ranks for the `x` variate are `1, 3, 3, 3, 5` and the corresponding
ranks for `y` are `2, 1, 3.5, 3.5, 5`, so the five points in terms of
their ranks are `P_1 = (1, 2)`, `P_2 = (3, 1)`, `P_3 = (3, 3.5)`,
`P_4 = (3, 3.5)`, and `P_5 = (5, 5)`. The relationship between any two
of these points `P_i` and `P_j` is either (1) concordant if the
sign of `R_{xi} - R_{xj}` is the same as the sign of `R_{yi} - R_{yj}`,
(2) discordant if the signs of `R_{xi} - R_{xj}` and `R_{yi} - R_{yj}`
differ, or (3) null if either `R_{xi} = R_{xj}` or `R_{yi} = R_{yj}`.
For the above example, there are ten possible comparisons among the five
points; six are concordant, one is discordant, and three comparisons are
lost due to ties. In general, given `n` bivariate scores there are
`n(n-1)/2` total possible comparisons. Ties in the `x` variate cause a
loss of `T_x` comparisons, and ties in the `y` variate cause a loss of
`T_y` comparisons. Comparisons that are tied in both `x` and `y` are
denoted `T_{xy}`. The total number of possible comparisons,
accounting for ties, is therefore:

`n(n-1)/2 - T_x - T_y + T_{xy},`

where `T_{xy}` is added to avoid double-counting of lost comparisons.
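
The pairwise classification for the five-point example can be checked with a few lines of base R. This is only an illustrative sketch, not the package's internal code; the variable names (`sx`, `sy`, `n_c`, `T_x`, and so on) are chosen here for the example.

```r
# Five paired scores from the example
x <- c(3.8, 4.7, 4.7, 4.7, 11.8)
y <- c(5.9, -4.1, 7.3, 7.3, 38.9)

# Mid-ranks: tied scores share the average of their ranks
rx <- rank(x)   # 1 3 3 3 5
ry <- rank(y)   # 2 1 3.5 3.5 5

# Enumerate all n(n-1)/2 unordered pairs of points
idx <- combn(length(x), 2)
sx <- sign(rx[idx[1, ]] - rx[idx[2, ]])   # sign of the rank difference in x
sy <- sign(ry[idx[1, ]] - ry[idx[2, ]])   # sign of the rank difference in y

n_c  <- sum(sx * sy > 0)         # concordant: same nonzero sign
n_d  <- sum(sx * sy < 0)         # discordant: opposite nonzero signs
T_x  <- sum(sx == 0)             # comparisons lost to ties in x
T_y  <- sum(sy == 0)             # comparisons lost to ties in y
T_xy <- sum(sx == 0 & sy == 0)   # comparisons tied in both variates

c(n_c = n_c, n_d = n_d, T_x = T_x, T_y = T_y, T_xy = T_xy)

# Usable comparisons after accounting for ties; equals n_c + n_d
choose(length(x), 2) - T_x - T_y + T_xy
```

The tie-adjusted total always equals `n_c + n_d`, because every pair that is not lost to a tie is either concordant or discordant.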

In the above example, there are three lost comparisons due to ties in `x`,
one lost comparison due to a tie in `y`, and one comparison lost to a tie
in both the `x` and `y` variates. Thus, there are `[(5*4)/2]-3-1+1=7`
comparisons for the above example. The `\tau_A` correlation is defined as
`(n_c-n_d)/(n_c+n_d)`, which is a value on the `[-1,1]` interval. It is
important to note, however, that the original developer of the frequentist
`\tau` correlation used a different coefficient that has come to be called
`\tau_B`, which is given as
`(n_c-n_d)/([(n*(n-1)/2)-T_x][(n*(n-1)/2)-T_y])^{.5}`. `\tau_B` does not
properly correct for tied scores, which is unfortunate because `\tau_B`
is the value returned by the `stats` function
`cor(..., method = "kendall")`. If there are no ties, then
`T_x = T_y = T_{xy} = 0` and `\tau_A = \tau_B`. But if there are ties,
then the proper coefficient is given by `\tau_A`. The
`dfba_bivariate_concordance()` function provides the proper correction for
tied scores and outputs a sample estimate for `\tau_A`.
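
The difference between the two coefficients can be seen numerically for the five-point example. The base-R sketch below plugs the counts stated above into both formulas and compares the `\tau_B` value with the result of `cor(..., method = "kendall")`; it is an illustration only, and the variable names are chosen for this example.

```r
x <- c(3.8, 4.7, 4.7, 4.7, 11.8)
y <- c(5.9, -4.1, 7.3, 7.3, 38.9)

n   <- length(x)
n_c <- 6; n_d <- 1   # concordant and discordant counts for this example
T_x <- 3; T_y <- 1   # comparisons lost to ties in x and in y

# tau_A uses only the comparisons that survive the ties
tau_A <- (n_c - n_d) / (n_c + n_d)

# tau_B rescales by the geometric mean of the untied counts in x and in y
n0    <- n * (n - 1) / 2
tau_B <- (n_c - n_d) / sqrt((n0 - T_x) * (n0 - T_y))

tau_A                                             # 5/7
tau_B                                             # 5/sqrt(63)
all.equal(tau_B, cor(x, y, method = "kendall"))   # stats::cor agrees with tau_B
```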

The focus for the Bayesian analysis is on the population proportion
of concordance, which is the limit of the ratio `n_c/(n_c+n_d)`. This
proportion is a value on the `[0,1]` interval, and it is called `\phi`
(Phi). `\phi` is also connected to the population `\tau_A` because
`\tau_A=(2\phi -1)`. Moreover, Chechile (2020) showed that the
likelihood function for observing `n_c` concordant changes and `n_d`
discordant changes is a censored Bernoulli process, so the likelihood is
proportional to `(\phi^{n_c})((1-\phi)^{n_d})`. In Bayesian statistics, the
likelihood function is only specified as a proportional function because,
unlike in frequentist statistics, the likelihoods of unobserved and more
extreme events are not computed. This idea is the *likelihood principle*,
and its violation can lead to paradoxes (Lindley & Phillips, 1976). Also, the
likelihood need only be a proportional function because the proportionality
constant appears in both the numerator and denominator of Bayes' theorem, so
it cancels out. If the prior for `\phi` is a beta distribution, then it
follows that the posterior is also a beta distribution (*i.e.*, the beta
is a natural Bayesian conjugate function for Bernoulli processes). The
default prior for the `dfba_bivariate_concordance()` function is the flat
prior (*i.e.*, `a0 = 1` and `b0 = 1`).
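
Because the likelihood is proportional to `(\phi^{n_c})((1-\phi)^{n_d})`, a beta prior with shape parameters `a0` and `b0` updates to a beta posterior with shape parameters `a0 + n_c` and `b0 + n_d`. The base-R sketch below applies this update to the counts from the five-point example (`n_c = 6`, `n_d = 1`) under the default flat prior. Obtaining the median and equal-tail limits from `qbeta()` is an assumption about how such summaries could be computed, not a transcription of the package source.

```r
a0 <- 1; b0 <- 1        # flat beta prior
n_c <- 6; n_d <- 1      # counts from the five-point example
prob_interval <- 0.95

# Conjugate update: the posterior for phi is Beta(a_post, b_post)
a_post <- a0 + n_c
b_post <- b0 + n_d

# Posterior median and equal-tail interval for phi
post_median <- qbeta(0.5, a_post, b_post)
tail_prob   <- (1 - prob_interval) / 2
eti         <- qbeta(c(tail_prob, 1 - tail_prob), a_post, b_post)

# The same summaries on the tau_A scale, using tau_A = 2 * phi - 1
tau_median <- 2 * post_median - 1
tau_eti    <- 2 * eti - 1

c(a_post = a_post, b_post = b_post, post_median = post_median)
eti
tau_eti
```

Since `\tau_A=(2\phi -1)` is a monotone transformation, the quantiles of `\phi` map directly to quantiles of `\tau_A`.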

In the special case where the user has a model for predicting a variate in
terms of known quantities and where there are free-fitting parameters, the
`dfba_bivariate_concordance()` function's concordance parameter is a
goodness-of-fit measure for the scientific model. Thus, the bivariate pair
consists of the observed value of a variate along with the corresponding
predicted score from the model. The concordance proportion must be adjusted
in these goodness-of-fit applications to take into account the number of free
parameters that were used in the prediction model. Chechile and Barch (2021)
argued that the fitting parameters increase the number of concordant changes.
Consequently, the value for `n_c` is downward-adjusted as a function of the
number of free parameters. The Chechile-Barch adjusted `n_c` value for a case
where there are `m` free fitting parameters is `n_c-(n*m)+[m*(m+1)/2]`. As an
example, suppose that there are `n = 20` scores, and the prediction equation
has `m = 2` free parameters that result in creating a prediction for each
observed score (*i.e.*, there are 20 paired values of observed score `x`
and predicted score `y`), and further suppose that this model results in
`n_c = 170` and `n_d = 20`. The value of `n_d` is kept at 20, but
the number of concordant changes is reduced to `170-(20*2)+(2*3/2) = 133`.
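
The arithmetic of the adjustment in this example can be reproduced directly in base R. In the sketch below, the helper name `adjust_nc()` is hypothetical, and the last two quantities assume that the corrected concordance proportion and corrected `\tau_A` are formed from the adjusted `n_c` and the unchanged `n_d` in the same way as their uncorrected counterparts.

```r
# Hypothetical helper implementing the adjustment n_c - (n*m) + [m*(m+1)/2]
adjust_nc <- function(n_c, n, m) {
  n_c - n * m + m * (m + 1) / 2
}

n   <- 20    # number of paired observed and predicted scores
m   <- 2     # number of free fitting parameters
n_c <- 170   # unadjusted concordant changes
n_d <- 20    # discordant changes (left unchanged)

nc_star <- adjust_nc(n_c, n, m)   # 170 - 40 + 3 = 133

# Assumed forms of the corrected sample quantities
phi_star <- nc_star / (nc_star + n_d)
tau_star <- (nc_star - n_d) / (nc_star + n_d)

c(nc_star = nc_star, phi_star = phi_star, tau_star = tau_star)
```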

### Value

A list containing the following components:

`tau` |
Nonparametric Tau-A correlation |

`sample_p` |
Sample concordance proportion |

`nc` |
Number of concordant comparisons |

`nd` |
Number of discordant comparisons |

`a_post` |
The first shape parameter for the posterior beta distribution for the concordance proportion |

`b_post` |
The second shape parameter for the posterior beta distribution for the concordance proportion |

`a0` |
The first shape parameter for the prior beta distribution for the concordance proportion |

`b0` |
The second shape parameter for the prior beta distribution for the concordance proportion |

`prob_interval` |
The probability within the interval estimates for the phi parameter |

`post_median` |
Median of posterior distribution on phi |

`eti_lower` |
Lower limit of the equal-tail interval with probability specified by `prob_interval` |

`eti_upper` |
Upper limit of the equal-tail interval with probability specified by `prob_interval` |

`tau_star` |
Corrected tau_A to account for the number of free fitting parameters in goodness-of-fit applications |

`nc_star` |
The corrected number of concordant comparisons for a goodness-of-fit application when there is an integer value for `fitting.parameters` |

`nd_star` |
The number of discordant comparisons when there is an integer value for `fitting.parameters` |

`sample_p_star` |
Corrected proportion of concordant comparisons to account for free fitting parameters in goodness-of-fit applications |

`a_post_star` |
Corrected value for the first shape parameter for the posterior for the concordance proportion when there are free fitting parameters in goodness-of-fit applications |

`b_post_star` |
The second shape parameter for the posterior distribution for the concordance proportion when there is a goodness-of-fit application |

`post_median_star` |
The posterior median for the concordance proportion when there is a goodness-of-fit application |

`eti_lower_star` |
Lower limit for the interval estimate when there is a goodness-of-fit application |

`eti_upper_star` |
Upper limit for the interval estimate when there is a goodness-of-fit application |

### References

Chechile, R.A. (2020). Bayesian Statistics for Experimental Scientists: A General Introduction Using Distribution-Free Statistics. Cambridge: MIT Press.

Chechile, R.A., & Barch, D.H. (2021). A distribution-free, Bayesian goodness-of-fit method for assessing similar scientific prediction equations. Journal of Mathematical Psychology. https://doi.org/10.1016/j.jmp.2021.102638

Lindley, D. V., & Phillips, L. D. (1976). Inference for a Bernoulli process (a Bayesian view). The American Statistician, 30, 112-119.

### Examples

```
x <- c(47, 39, 47, 42, 44, 46, 39, 37, 29, 42, 54, 33, 44, 31, 28, 49, 32, 37, 46, 55, 31)
y <- c(36, 40, 49, 45, 30, 38, 39, 44, 27, 48, 49, 51, 27, 36, 30, 44, 42, 41, 35, 49, 33)
dfba_bivariate_concordance(x, y)

## A goodness-of-fit example for a hypothetical case of fitting data in a
## yobs vector with a prediction model
p <- seq(.05, .95, .05)
ypred <- 17.332 - (50.261 * p) + (48.308 * p^2)
# Note: the coefficients in the ypred equation were found first via a
# polynomial regression
yobs <- c(19.805, 10.105, 9.396, 8.219, 6.110, 4.543, 5.864, 4.861, 6.136,
          5.789, 5.443, 5.548, 4.746, 6.484, 6.185, 6.202, 9.804, 9.332,
          14.408)
dfba_bivariate_concordance(x = yobs,
                           y = ypred,
                           fitting.parameters = 3)
```

*DFBA* version 0.1.0