dfba_bivariate_concordance {DFBA}    R Documentation

Bayesian Distribution-Free Correlation and Concordance

Description

Given bivariate data, computes the sample number of concordant changes nc between the two variates and the number of discordant changes nd. Provides the frequentist tau_A correlation coefficient (nc-nd)/(nc+nd), and provides a Bayesian analysis of the population concordance parameter phi: the limit of the proportion of concordant changes between the variates. For goodness-of-fit applications, provides a concordance measure that corrects for the number of fitting parameters.

Usage

dfba_bivariate_concordance(
  x,
  y,
  a0 = 1,
  b0 = 1,
  prob_interval = 0.95,
  fitting.parameters = NULL
)

Arguments

x

Vector of x variable values

y

Vector of y variable values

a0

First shape parameter for the prior beta distribution (default is 1)

b0

Second shape parameter for the prior beta distribution (default is 1)

prob_interval

Desired width for interval estimates (default is .95)

fitting.parameters

(Optional) If either x or y values are generated by a predictive model, the number of free parameters in the model (default is NULL)

Details

The product-moment correlation depends on Gaussian assumptions about the residuals in a regression analysis. It is not robust because it is strongly influenced by any extreme outlier scores for either of the two variates. A rank-based analysis can avoid both of these limitations. The dfba_bivariate_concordance() function is focused on a nonparametric concordance metric for characterizing the association between the two bivariate measures.

To illustrate the nonparametric concepts of concordance and discordance, consider a specific example where there are five paired scores with

$$x = \{3.8, 4.7, 4.7, 4.7, 11.8\}$$

and

$$y = \{5.9, -4.1, 7.3, 7.3, 38.9\}.$$

The ranks for the $x$ variate are $1, 3, 3, 3, 5$ and the corresponding ranks for $y$ are $2, 1, 3.5, 3.5, 5$, so the five points in terms of their ranks are $P_1 = (1, 2)$, $P_2 = (3, 1)$, $P_3 = (3, 3.5)$, $P_4 = (3, 3.5)$ and $P_5 = (5, 5)$. The relationship between any two of these points $P_i$ and $P_j$ is either (1) concordant if the sign of $R_{xi} - R_{xj}$ is the same as the sign of $R_{yi} - R_{yj}$, (2) discordant if signs are different between $R_{xi} - R_{xj}$ and $R_{yi} - R_{yj}$, or (3) null if either $R_{xi} = R_{xj}$ or if $R_{yi} = R_{yj}$. For the above example, there are ten possible comparisons among the five points; six are concordant, one is discordant, and there are three comparisons lost due to ties. In general, given $n$ bivariate scores there are $n(n-1)/2$ total possible comparisons. When there are ties in the $x$ variate, there is a loss of $T_x$ comparisons; when there are ties in the $y$ variate, there are $T_y$ lost comparisons. Ties in both $x$ and $y$ are denoted $T_{xy}$. The total number of possible comparisons, accounting for ties, is therefore:

$$n(n-1)/2 - T_x - T_y + T_{xy},$$

where $T_{xy}$ is added to avoid double-counting of lost comparisons.
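The counting rule can be checked by hand with a short base R sketch. It is only an illustration of the definitions above, not the package's internal code, and all variable names are chosen here for the example. Because ranking preserves order, the signs of the raw differences equal the signs of the rank differences, so the raw scores are compared directly.

```r
# Five paired scores from the worked example
x <- c(3.8, 4.7, 4.7, 4.7, 11.8)
y <- c(5.9, -4.1, 7.3, 7.3, 38.9)

# All n(n-1)/2 index pairs (i < j), one pair per column
pairs <- combn(length(x), 2)
dx <- sign(x[pairs[1, ]] - x[pairs[2, ]])
dy <- sign(y[pairs[1, ]] - y[pairs[2, ]])

n_c  <- sum(dx * dy > 0)         # concordant comparisons
n_d  <- sum(dx * dy < 0)         # discordant comparisons
t_x  <- sum(dx == 0)             # comparisons lost to ties in x
t_y  <- sum(dy == 0)             # comparisons lost to ties in y
t_xy <- sum(dx == 0 & dy == 0)   # comparisons tied in both variates

# Comparisons that remain after accounting for ties
usable <- ncol(pairs) - t_x - t_y + t_xy
```

For these data the sketch gives six concordant and one discordant comparison, and the tie-adjusted total equals their sum.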

In the above example, there are three lost comparisons due to ties in $x$, one lost comparison due to a tie in $y$, and one comparison lost to a tie in both the $x$ and $y$ variates. Thus, there are $[(5 \cdot 4)/2] - 3 - 1 + 1 = 7$ comparisons for the above example. The $\tau_A$ correlation is defined as $(n_c - n_d)/(n_c + n_d)$, which is a value on the $[-1, 1]$ interval. It is important to note, however, that the original developer of the frequentist $\tau$ correlation used a different coefficient that has come to be called $\tau_B$, which is given as $(n_c - n_d)/\{[n(n-1)/2 - T_x][n(n-1)/2 - T_y]\}^{0.5}$. $\tau_B$ does not properly correct for tied scores, which is unfortunate because $\tau_B$ is the value returned by the stats function cor(..., method = "kendall"). If there are no ties, then $T_x = T_y = T_{xy} = 0$ and $\tau_A = \tau_B$. But if there are ties, then the proper coefficient is given by $\tau_A$. The dfba_bivariate_concordance() function provides the proper correction for tied scores and outputs a sample estimate for $\tau_A$.
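To make the difference between the two coefficients concrete, the following base R sketch evaluates both formulas on the five-point example. It takes the counts stated in the text (six concordant, one discordant, three comparisons tied in x, one tied in y) as given, and compares the tau_B formula with the value from stats::cor(). The variable names are illustrative and are not part of DFBA.

```r
# Counts for the five-point example, as stated in the text
n   <- 5
n_c <- 6
n_d <- 1
t_x <- 3
t_y <- 1

# tau_A uses only the comparisons that are not lost to ties
tau_a <- (n_c - n_d) / (n_c + n_d)

# tau_B rescales by the geometric mean of the untied counts in x and in y
n0    <- n * (n - 1) / 2
tau_b <- (n_c - n_d) / sqrt((n0 - t_x) * (n0 - t_y))

# Value returned by the stats function on the raw scores
x <- c(3.8, 4.7, 4.7, 4.7, 11.8)
y <- c(5.9, -4.1, 7.3, 7.3, 38.9)
tau_kendall <- cor(x, y, method = "kendall")

round(c(tau_a = tau_a, tau_b = tau_b, tau_kendall = tau_kendall), 3)
```

Here tau_A is 5/7 (about 0.714), while the tau_B formula and cor() both give 5/sqrt(63) (about 0.630), so the two coefficients diverge as soon as ties are present.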

The focus for the Bayesian analysis is on the population proportion of concordance, which is the limit of the ratio $n_c/(n_c + n_d)$. This proportion is a value on the $[0, 1]$ interval, and it is called $\phi$ (phi). $\phi$ is also connected to the population $\tau_A$ because $\tau_A = 2\phi - 1$. Moreover, Chechile (2020) showed that the likelihood function for observing $n_c$ concordant changes and $n_d$ discordant changes is a censored Bernoulli process, so the likelihood is proportional to $\phi^{n_c}(1 - \phi)^{n_d}$. In Bayesian statistics, the likelihood function is only specified as a proportional function because, unlike in frequentist statistics, the likelihoods of unobserved and more extreme events are not computed. This idea is the likelihood principle, and its violation can lead to paradoxes (Lindley & Phillips, 1976). Also, the likelihood need only be a proportional function because the proportionality constant appears in both the numerator and denominator of Bayes' theorem, so it cancels out. If the prior for $\phi$ is a beta distribution, then it follows that the posterior is also a beta distribution (i.e., the beta is a natural Bayesian conjugate function for Bernoulli processes). The default prior for the dfba_bivariate_concordance() function is the flat prior (i.e., a0 = 1 and b0 = 1).
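The conjugate update can be sketched in a few lines of base R. This assumes the standard beta-Bernoulli result, in which the posterior shape parameters are the prior shapes plus the concordant and discordant counts. The variable names mirror the components listed under Value, but the code is an illustration and is not taken from the package source. The counts are those of the five-point example.

```r
# Flat prior (the function's defaults) and counts from the five-point example
a0  <- 1
b0  <- 1
n_c <- 6
n_d <- 1

# Conjugate update: beta prior with a likelihood proportional to
# phi^n_c * (1 - phi)^n_d gives a beta posterior
a_post <- a0 + n_c
b_post <- b0 + n_d

# Posterior median and equal-tail interval for phi
prob_interval <- 0.95
tail_area     <- (1 - prob_interval) / 2
post_median   <- qbeta(0.5, a_post, b_post)
eti_lower     <- qbeta(tail_area, a_post, b_post)
eti_upper     <- qbeta(1 - tail_area, a_post, b_post)

# The same summaries on the tau_A scale, using tau_A = 2 * phi - 1
tau_summary <- 2 * c(eti_lower, post_median, eti_upper) - 1
```

With a flat prior the posterior here is a beta distribution with shape parameters 7 and 2. An informative prior would simply shift both shape parameters by the prior values.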

In the special case where the user has a model for predicting a variate in terms of known quantities and where there are free-fitting parameters, the dfba_bivariate_concordance() function's concordance parameter is a goodness-of-fit measure for the scientific model. Thus, the bivariate pair are the observed value of a variate along with the corresponding predicted score from the model. The concordance proportion must be adjusted in these goodness-of-fit applications to take into account the number of free parameters that were used in the prediction model. Chechile and Barch (2021) argued that the fitting parameters increase the number of concordant changes. Consequently, the value for $n_c$ is downward-adjusted as a function of the number of free parameters. The Chechile-Barch adjusted $n_c$ value for a case where there are $m$ free fitting parameters is $n_c - nm + m(m+1)/2$. As an example, suppose that there are $n = 20$ scores, and the prediction equation has $m = 2$ free parameters that result in creating a prediction for each observed score (i.e., there are 20 paired values of observed score x and predicted score y), and further suppose that this model results in $n_c = 170$ and $n_d = 20$. The value of $n_d$ is kept at 20, but the number of concordant changes is reduced to $170 - (20 \cdot 2) + (2 \cdot 3/2) = 133$.
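The adjustment is easy to reproduce numerically. In the base R sketch below, adjust_nc() is a hypothetical helper written for this illustration and is not a DFBA function. The final line assumes that the corrected correlation is formed from the adjusted count in the same way as tau_A, which is an assumption about how tau_star is computed.

```r
# Hypothetical helper: Chechile-Barch downward adjustment of n_c
# for n paired scores and m free fitting parameters
adjust_nc <- function(n_c, n, m) {
  n_c - n * m + m * (m + 1) / 2
}

# Values from the example: n = 20 scores, m = 2 free parameters
nc_star <- adjust_nc(n_c = 170, n = 20, m = 2)
nd_star <- 20  # the discordant count is left unchanged

# Assumed form of the corrected correlation: same ratio as tau_A
tau_star <- (nc_star - nd_star) / (nc_star + nd_star)
```

For the example this reproduces the adjusted count of 133, and the assumed corrected correlation is 113/153 (about 0.739), compared with an unadjusted value of 150/190 (about 0.789).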

Value

A list containing the following components:

tau

Nonparametric Tau-A correlation

sample_p

Sample concordance proportion

nc

Number of concordant comparisons

nd

Number of discordant comparisons

a_post

The first shape parameter for the posterior beta distribution for the concordance proportion

b_post

The second shape parameter for the posterior beta distribution for the concordance proportion

a0

The first shape parameter for the prior beta distribution for the concordance proportion

b0

The second shape parameter for the prior beta distribution for the concordance proportion

prob_interval

The probability within the interval estimates for the phi parameter

post_median

Median of posterior distribution on phi

eti_lower

Lower limit of the equal-tail interval with width specified by prob_interval

eti_upper

Upper limit of the equal-tail interval with width specified by prob_interval

tau_star

Corrected tau_A to account for the number of free fitting parameters in goodness-of-fit applications

nc_star

The corrected number of concordant comparisons for a goodness-of-fit application when there is an integer value for fitting.parameters

nd_star

The number of discordant comparisons when there is an integer value for fitting.parameters

sample_p_star

Corrected proportion of concordant comparisons to account for free fitting parameters in goodness-of-fit applications

a_post_star

Corrected value for the first shape parameter for the posterior for the concordance proportion when there are free fitting parameters for goodness-of-fit applications

b_post_star

The second shape parameter for the posterior distribution for the concordance proportion when there is a goodness-of-fit application

post_median_star

The posterior median for the concordance proportion when there is a goodness-of-fit application

eti_lower_star

Lower limit for the interval estimate when there is a goodness-of-fit application

eti_upper_star

Upper limit for the interval estimate when there is a goodness-of-fit application

References

Chechile, R.A. (2020). Bayesian Statistics for Experimental Scientists: A General Introduction Using Distribution-Free Statistics. Cambridge: MIT Press.

Chechile, R.A., & Barch, D.H. (2021). A distribution-free, Bayesian goodness-of-fit method for assessing similar scientific prediction equations. Journal of Mathematical Psychology. https://doi.org/10.1016/j.jmp.2021.102638

Lindley, D. V., & Phillips, L. D. (1976). Inference for a Bernoulli process (a Bayesian view). The American Statistician, 30, 112-119.

Examples



x <- c(47, 39, 47, 42, 44, 46, 39, 37, 29, 42, 54, 33, 44, 31, 28, 49, 32, 37, 46, 55, 31)
y <- c(36, 40, 49, 45, 30, 38, 39, 44, 27, 48, 49, 51, 27, 36, 30, 44, 42, 41, 35, 49, 33)

dfba_bivariate_concordance(x, y)

## A goodness-of-fit example for a hypothetical case of fitting data in a
## yobs vector with a prediction model

p <- seq(0.05, 0.95, 0.05)
ypred <- 17.332 - (50.261 * p) + (48.308 * p^2)

# Note the coefficients in the ypred equation were found first via a
# polynomial regression

yobs <- c(19.805, 10.105, 9.396, 8.219, 6.110, 4.543, 5.864, 4.861, 6.136,
          5.789,  5.443, 5.548, 4.746, 6.484, 6.185, 6.202, 9.804, 9.332,
          14.408)

dfba_bivariate_concordance(x = yobs,
         y = ypred,
         fitting.parameters = 3)


[Package DFBA version 0.1.0 Index]