dfba_bivariate_concordance {DFBA} | R Documentation
Bayesian Distribution-Free Correlation and Concordance
Description
Given bivariate data, computes the sample number of concordant changes $n_c$ between the two variates and the number of discordant changes $n_d$. Provides the frequentist $\tau_A$ correlation coefficient $(n_c - n_d)/(n_c + n_d)$, and provides a Bayesian analysis of the population concordance parameter $\phi$: the limit of the proportion of concordant changes between the variates.
For goodness-of-fit applications, provides a concordance measure that
corrects for the number of fitting parameters.
Usage
dfba_bivariate_concordance(
x,
y,
a0 = 1,
b0 = 1,
prob_interval = 0.95,
fitting.parameters = NULL
)
Arguments
x | Vector of x variable values
y | Vector of y variable values
a0 | First shape parameter for the prior beta distribution (default is 1)
b0 | Second shape parameter for the prior beta distribution (default is 1)
prob_interval | Desired width for interval estimates (default is .95)
fitting.parameters | (Optional) If either x or y values are generated by a predictive model, the number of free parameters in the model (default is NULL)
Details
The product-moment correlation depends on Gaussian assumptions about the
residuals in a regression analysis. It is not robust because it is strongly
influenced by any extreme outlier scores for either of the two variates. A
rank-based analysis can avoid both of these limitations. The dfba_bivariate_concordance()
function is focused on a nonparametric concordance metric for characterizing
the association between the two bivariate measures.
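A minimal base-R sketch of the robustness point above, using hypothetical data: when y is an exact linear function of x, moving the largest x score far out changes the Pearson correlation substantially, while the rank-based $\tau_A$ does not change because no rank changes. The helper tau_a() is a hypothetical name written for illustration; it is not part of DFBA.

```r
# Hypothetical helper (not a DFBA function): tau_A = (nc - nd) / (nc + nd),
# counting each unordered pair once and ignoring pairs tied on either variate.
tau_a <- function(x, y) {
  n <- length(x)
  upper <- upper.tri(matrix(0, n, n))   # selects pairs (i, j) with i < j
  prod_sign <- (sign(outer(x, x, "-")) * sign(outer(y, y, "-")))[upper]
  nc <- sum(prod_sign > 0)              # concordant pairs
  nd <- sum(prod_sign < 0)              # discordant pairs
  (nc - nd) / (nc + nd)
}

x <- 1:10
y <- 1:10
x_outlier <- x
x_outlier[10] <- 1000   # extreme score, but still the largest, so ranks are unchanged

r_clean     <- cor(x, y)            # Pearson product-moment correlation
r_outlier   <- cor(x_outlier, y)
tau_clean   <- tau_a(x, y)
tau_outlier <- tau_a(x_outlier, y)

print(c(r_clean = r_clean, r_outlier = r_outlier,
        tau_clean = tau_clean, tau_outlier = tau_outlier))
```

Here the Pearson coefficient drops from 1 to roughly 0.53, while $\tau_A$ stays at 1 for both versions of the data.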
To illustrate the nonparametric concepts of concordance and discordance, consider a specific example with five paired scores $(x_i, y_i)$. Each score is replaced by its rank within its own variate, so the five points are expressed in terms of their ranks as $P_i = (R_{x_i}, R_{y_i})$, $i = 1, \ldots, 5$. The relationship between any two of these points, $P_i$ and $P_j$, is either (1) concordant if the sign of $R_{x_j} - R_{x_i}$ is the same as the sign of $R_{y_j} - R_{y_i}$, (2) discordant if the signs of $R_{x_j} - R_{x_i}$ and $R_{y_j} - R_{y_i}$ differ, or (3) null if either $R_{x_j} = R_{x_i}$ or $R_{y_j} = R_{y_i}$. For the above example, there are ten possible comparisons among the five points; six are concordant, one is discordant, and three comparisons are lost due to ties. In general, given $n$ bivariate scores there are $n(n-1)/2$ total possible comparisons. When there are ties in the $x$ variate, $t_x$ comparisons are lost; when there are ties in the $y$ variate, $t_y$ comparisons are lost. Comparisons tied in both $x$ and $y$ are counted by $t_{xy}$. The total number of possible comparisons, accounting for ties, is therefore:

$$n_t = \frac{n(n-1)}{2} - t_x - t_y + t_{xy},$$

where $t_{xy}$ is added to avoid double-counting of lost comparisons.
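The counting rules above can be checked with a short base-R sketch. The five-point data below are hypothetical (the original example's scores are not reproduced here); they were chosen so the tie pattern yields the counts described in the text: six concordant pairs, one discordant pair, three comparisons lost to a three-way tie in y, one lost to a tied pair in x, and one pair tied in both. The function name count_comparisons() is illustrative and not part of DFBA.

```r
# Hypothetical helper: classify every pair (P_i, P_j), i < j, by the signs
# of the rank differences, and tally the tie counts t_x, t_y and t_xy.
count_comparisons <- function(x, y) {
  rx <- rank(x)   # average ranks, so tied scores share a rank
  ry <- rank(y)
  n <- length(x)
  out <- c(nc = 0, nd = 0, tx = 0, ty = 0, txy = 0)
  for (i in seq_len(n - 1)) {
    for (j in seq.int(i + 1, n)) {
      sx <- sign(rx[j] - rx[i])
      sy <- sign(ry[j] - ry[i])
      if (sx == 0) out["tx"] <- out["tx"] + 1              # lost to a tie in x
      if (sy == 0) out["ty"] <- out["ty"] + 1              # lost to a tie in y
      if (sx == 0 && sy == 0) out["txy"] <- out["txy"] + 1 # tied in both
      if (sx != 0 && sy != 0) {
        if (sx == sy) {
          out["nc"] <- out["nc"] + 1                       # concordant
        } else {
          out["nd"] <- out["nd"] + 1                       # discordant
        }
      }
    }
  }
  out
}

x <- c(1, 2, 3, 3, 5)   # hypothetical: one tied pair in x
y <- c(2, 1, 3, 3, 3)   # hypothetical: three tied values in y
counts <- count_comparisons(x, y)
n <- length(x)
n_t <- n * (n - 1) / 2 - counts[["tx"]] - counts[["ty"]] + counts[["txy"]]
print(counts)
print(n_t)   # equals nc + nd
```

Because every non-null comparison is either concordant or discordant, $n_t$ always equals $n_c + n_d$, which gives a quick consistency check on the tie counts.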
In the above example, there are three lost comparisons due to ties in one variate, one lost comparison due to a tie in the other variate, and one comparison lost to a tie in both the $x$ and $y$ variates. Thus, there are $n_t = 10 - 3 - 1 + 1 = 7$ comparisons for the above example. The $\tau_A$ correlation is defined as $\tau_A = (n_c - n_d)/n_t$, which is a value on the $[-1, 1]$ interval. However, it is important to note that the original developer of the frequentist $\tau$ correlation used a different coefficient that has come to be called $\tau_B$, which is given as

$$\tau_B = \frac{n_c - n_d}{\sqrt{\left(\frac{n(n-1)}{2} - t_x\right)\left(\frac{n(n-1)}{2} - t_y\right)}}.$$

$\tau_B$ does not properly correct for tied scores, which is unfortunate because $\tau_B$ is the value returned by the stats function cor(..., method = "kendall"). If there are no ties, then $n_t = n(n-1)/2$ and $\tau_A = \tau_B$. But if there are ties, then the proper coefficient is $\tau_A$. The dfba_bivariate_concordance() function provides the proper correction for tied scores and outputs a sample estimate for $\tau_A$.
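The difference between the two coefficients is easy to see numerically. This base-R sketch uses hypothetical five-point data with ties in both variates, computes $\tau_A = (n_c - n_d)/(n_c + n_d)$ from the pair counts and $\tau_B$ from the tie-adjusted denominator $\sqrt{(n(n-1)/2 - t_x)(n(n-1)/2 - t_y)}$, and compares $\tau_B$ with cor(x, y, method = "kendall") from the stats package.

```r
x <- c(1, 2, 3, 3, 5)   # hypothetical data with ties in both variates
y <- c(2, 1, 3, 3, 3)
n <- length(x)
upper <- upper.tri(matrix(0, n, n))   # each unordered pair once
sx <- sign(outer(x, x, "-"))[upper]
sy <- sign(outer(y, y, "-"))[upper]

nc <- sum(sx * sy > 0)   # concordant pairs
nd <- sum(sx * sy < 0)   # discordant pairs
tx <- sum(sx == 0)       # pairs tied on x
ty <- sum(sy == 0)       # pairs tied on y
n0 <- n * (n - 1) / 2    # all possible pairs

tau_A <- (nc - nd) / (nc + nd)                    # (nc - nd) / n_t
tau_B <- (nc - nd) / sqrt((n0 - tx) * (n0 - ty))  # Kendall's tau-b
tau_R <- cor(x, y, method = "kendall")            # stats::cor

print(c(tau_A = tau_A, tau_B = tau_B, cor_kendall = tau_R))
```

With these data $\tau_A = 5/7 \approx 0.714$, while $\tau_B$ and the stats result both equal $5/\sqrt{63} \approx 0.630$. With no ties, $t_x = t_y = 0$ and both denominators reduce to $n(n-1)/2$, so the two coefficients agree.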
The focus for the Bayesian analysis is on the population proportion of concordance, which is the limit of the ratio $n_c/(n_c + n_d)$. This proportion is a value on the $[0, 1]$ interval, and it is called $\phi$ (Phi). $\phi$ is also connected to the population $\tau_A$ because $\tau_A = 2\phi - 1$. Moreover, Chechile (2020) showed that the likelihood function for observing $n_c$ concordant changes and $n_d$ discordant changes is a censored Bernoulli process, so the likelihood is proportional to $\phi^{n_c}(1 - \phi)^{n_d}$. In Bayesian statistics, the likelihood function is only specified as a proportional function because, unlike in frequentist statistics, the likelihood of unobserved and more extreme events is not computed. This idea is the likelihood principle, and its violation can lead to paradoxes (Lindley & Phillips, 1976). Also, the likelihood need only be a proportional function because the proportionality constant appears in both the numerator and denominator of Bayes' theorem, so it cancels out. If the prior for $\phi$ is a beta distribution, then it follows that the posterior is also a beta distribution (i.e., the beta is a natural Bayesian conjugate prior for Bernoulli processes): a Beta($a_0$, $b_0$) prior yields a Beta($a_0 + n_c$, $b_0 + n_d$) posterior. The default prior for the dfba_bivariate_concordance() function is the flat prior (i.e., a0 = 1 and b0 = 1).
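A minimal sketch of the conjugate update described above, assuming hypothetical counts $n_c = 6$ and $n_d = 1$ and the default flat prior. The posterior is Beta($a_0 + n_c$, $b_0 + n_d$), and its median and equal-tail interval come from qbeta(). These quantities are meant to mirror what the a_post, b_post, post_median, eti_lower and eti_upper outputs describe; that DFBA computes them in exactly this way is an assumption, not a description of its source. The final grid check shows numerically that an arbitrary constant in the likelihood disappears once the posterior is normalized.

```r
nc <- 6; nd <- 1          # hypothetical concordance counts
a0 <- 1; b0 <- 1          # flat Beta(1, 1) prior (the documented default)
prob_interval <- 0.95

sample_p <- nc / (nc + nd)   # sample concordance proportion
tau_A <- 2 * sample_p - 1    # same as (nc - nd) / (nc + nd)

a_post <- a0 + nc            # conjugate beta update
b_post <- b0 + nd
post_median <- qbeta(0.5, a_post, b_post)
alpha_half <- (1 - prob_interval) / 2
eti <- qbeta(c(alpha_half, 1 - alpha_half), a_post, b_post)

# Grid check: prior x likelihood (times an arbitrary constant), normalized
# numerically, matches the Beta(a_post, b_post) density.
phi <- seq(0.001, 0.999, length.out = 999)
h <- phi[2] - phi[1]
unnorm <- dbeta(phi, a0, b0) * phi^nc * (1 - phi)^nd * 123.4
grid_post <- unnorm / (sum(unnorm) * h)
max_err <- max(abs(grid_post - dbeta(phi, a_post, b_post)))

print(c(sample_p = sample_p, tau_A = tau_A, post_median = post_median,
        eti_lower = eti[1], eti_upper = eti[2], max_err = max_err))
```

Changing the constant 123.4 to any other positive value leaves grid_post unchanged, which is the cancellation argument in numerical form.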
In the special case where the user has a model for predicting a variate in terms of known quantities, and where the model has free fitting parameters, the concordance parameter of the dfba_bivariate_concordance() function is a goodness-of-fit measure for the scientific model. In this case, each bivariate pair consists of the observed value of a variate and the corresponding predicted score from the model. The concordance proportion must be adjusted in these goodness-of-fit applications to take into account the number of free parameters that were used in the prediction model. Chechile and Barch (2021) argued that the fitting parameters increase the number of concordant changes. Consequently, the value of $n_c$ is adjusted downward as a function of the number of free parameters. The Chechile-Barch adjustment for a case with $m$ free fitting parameters replaces $n_c$ with a reduced count $n_c^*$ and leaves $n_d$ unchanged. As an example, suppose that there are $n = 20$ scores and the prediction equation has $m$ free parameters that produce a prediction for each observed score (i.e., there are 20 paired values of observed score x and predicted score y), and further suppose that this model results in $n_d = 20$ discordant changes. The value of $n_d$ is kept at 20, but the number of concordant changes is reduced to $n_c^*$, and the starred outputs listed under Value are based on $n_c^*$ and $n_d$.
Value
A list containing the following components:
tau | Nonparametric tau_A correlation
sample_p | Sample concordance proportion
nc | Number of concordant comparisons
nd | Number of discordant comparisons
a_post | First shape parameter of the posterior beta distribution for the concordance proportion
b_post | Second shape parameter of the posterior beta distribution for the concordance proportion
a0 | First shape parameter of the prior beta distribution for the concordance proportion
b0 | Second shape parameter of the prior beta distribution for the concordance proportion
prob_interval | The probability within the interval estimates for the phi parameter
post_median | Median of the posterior distribution on phi
eti_lower | Lower limit of the equal-tail interval with width specified by prob_interval
eti_upper | Upper limit of the equal-tail interval with width specified by prob_interval
tau_star | tau_A corrected for the number of free fitting parameters in goodness-of-fit applications
nc_star | The corrected number of concordant comparisons for a goodness-of-fit application when there is an integer value for fitting.parameters
nd_star | The number of discordant comparisons when there is an integer value for fitting.parameters
sample_p_star | Proportion of concordant comparisons corrected for the free fitting parameters in goodness-of-fit applications
a_post_star | Corrected first shape parameter of the posterior beta distribution for the concordance proportion when there are free fitting parameters in goodness-of-fit applications
b_post_star | Second shape parameter of the posterior beta distribution for the concordance proportion in a goodness-of-fit application
post_median_star | Posterior median for the concordance proportion in a goodness-of-fit application
eti_lower_star | Lower limit of the interval estimate in a goodness-of-fit application
eti_upper_star | Upper limit of the interval estimate in a goodness-of-fit application
References
Chechile, R. A. (2020). Bayesian Statistics for Experimental Scientists: A General Introduction Using Distribution-Free Statistics. Cambridge: MIT Press.
Chechile, R. A., & Barch, D. H. (2021). A distribution-free, Bayesian goodness-of-fit method for assessing similar scientific prediction equations. Journal of Mathematical Psychology. https://doi.org/10.1016/j.jmp.2021.102638
Lindley, D. V., & Phillips, L. D. (1976). Inference for a Bernoulli process (a Bayesian view). The American Statistician, 30, 112-119.
Examples
x <- c(47, 39, 47, 42, 44, 46, 39, 37, 29, 42, 54, 33, 44, 31, 28, 49, 32, 37, 46, 55, 31)
y <- c(36, 40, 49, 45, 30, 38, 39, 44, 27, 48, 49, 51, 27, 36, 30, 44, 42, 41, 35, 49, 33)
dfba_bivariate_concordance(x, y)
## A goodness-of-fit example for a hypothetical case of fitting data in a
## yobs vector with prediction model
p <- seq(0.05, 0.95, 0.05)
ypred <- 17.332 - (50.261 * p) + (48.308 * p^2)
# Note the coefficients in the ypred equation were found first via a
# polynomial regression
yobs <- c(19.805, 10.105, 9.396, 8.219, 6.110, 4.543, 5.864, 4.861, 6.136,
          5.789, 5.443, 5.548, 4.746, 6.484, 6.185, 6.202, 9.804, 9.332,
          14.408)
dfba_bivariate_concordance(x = yobs,
y = ypred,
fitting.parameters = 3)
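The comment in the example says the ypred coefficients were found via a polynomial regression, but the exact procedure is not stated. One plausible way to obtain such coefficients, assuming an ordinary least-squares quadratic fit of yobs on p, is lm() from the stats package; the number of estimated coefficients is then the value passed as fitting.parameters. This sketch does not claim to reproduce the quoted coefficients exactly.

```r
p <- seq(0.05, 0.95, 0.05)
yobs <- c(19.805, 10.105, 9.396, 8.219, 6.110, 4.543, 5.864, 4.861, 6.136,
          5.789, 5.443, 5.548, 4.746, 6.484, 6.185, 6.202, 9.804, 9.332,
          14.408)

# Assumed approach: quadratic least-squares fit, yobs ~ b0 + b1 * p + b2 * p^2
fit <- lm(yobs ~ p + I(p^2))
coefs <- coef(fit)
ypred_fit <- unname(fitted(fit))   # predicted score for each observation
m <- length(coefs)                 # number of free parameters in the model

print(coefs)
# The observed/predicted pairs could then be analysed with the documented call:
# dfba_bivariate_concordance(x = yobs, y = ypred_fit, fitting.parameters = m)
```

Counting the free parameters from the fitted model, rather than typing the number by hand, keeps fitting.parameters in step with the prediction equation if the model is changed.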