dfba_gamma {DFBA}    R Documentation
Goodman-Kruskal Gamma
Description
Given bivariate data in the form of either a rank-ordered table or a matrix, returns the number of concordant and discordant changes between the variates, the Goodman-Kruskal gamma statistic, and a Bayesian analysis of the population concordance proportion parameter phi.
Usage
dfba_gamma(x, a0 = 1, b0 = 1, prob_interval = 0.95)
Arguments
x: Cross-tabulated matrix or table in which cell [I, J] gives the frequency of observations for which the rank of measure 1 is I and the rank of measure 2 is J
a0: First shape parameter for the prior beta distribution (default is 1)
b0: Second shape parameter for the prior beta distribution (default is 1)
prob_interval: Desired probability for the interval estimates (default is 0.95)
Details
For bivariate data in which both measures are restricted to an ordinal scale, such as when the two variates are ranks over a limited set of integers, an ordered contingency table is often a convenient data representation. In this case, the element in cell [I, J] of the matrix is the frequency of occasions on which one variate has rank I and the corresponding rank for the other variate is J. This situation is a special case of the more general case of two continuous bivariate measures. For the special case of a rank-order matrix of frequencies, there is a distribution-free concordance correlation in common use: Goodman and Kruskal's gamma, G (Siegel & Castellan, 1988).
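As a minimal sketch with hypothetical data, base R's table() can build such a rank-order matrix from two paired rank vectors: row I and column J count the observations whose ranks are I and J. Wrapping each vector in factor() with explicit levels keeps ranks that never occur as empty rows or columns. Since the Usage accepts a table for x, a table built this way should be a suitable input, although that is not exercised here.

```r
# Hypothetical paired ranks for five observations
x <- c(1, 1, 2, 3, 3)   # ranks on measure 1
y <- c(1, 2, 2, 3, 3)   # ranks on measure 2

# Cell [I, J] counts observations with rank I on measure 1 and rank J on measure 2;
# explicit levels keep unobserved ranks as empty rows/columns
N <- table(factor(x, levels = 1:3), factor(y, levels = 1:3))
N
```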
Chechile (2020) showed that Goodman and Kruskal's gamma is equivalent to the more general \tau_A nonparametric correlation coefficient. Historically, gamma was considered a different metric from \tau because the version of \tau in standard use was typically \tau_B, which is a flawed metric because it does not properly correct for ties. Note: cor(..., method = "kendall") returns the \tau_B correlation, which is incorrect when there are ties. The correct \tau_A is computed by the dfba_bivariate_concordance() function.
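The effect of ties can be seen on a tiny hypothetical data set. The sketch below counts concordant and discordant pairs directly (tied pairs count as neither) and forms \tau_A as (n_c - n_d)/(n_c + n_d), following the gamma formula given in this section; the helper count_pairs() is hypothetical and not part of DFBA. Base R's cor(method = "kendall") divides by a tie-adjusted denominator instead, so it reports a smaller value for the same data.

```r
# Hypothetical data with one tie on x (pair 1-2) and one tie on y (pair 2-3)
x <- c(1, 1, 2, 3)
y <- c(1, 2, 2, 3)

# Hypothetical helper: count concordant (nc) and discordant (nd) pairs;
# a pair tied on either variate contributes to neither count
count_pairs <- function(x, y) {
  sx <- sign(outer(x, x, "-"))    # sign of x[a] - x[b] for every pair (a, b)
  sy <- sign(outer(y, y, "-"))
  s  <- (sx * sy)[upper.tri(sx)]  # each unordered pair once
  c(nc = sum(s > 0), nd = sum(s < 0))
}

counts <- count_pairs(x, y)       # nc = 4, nd = 0
tau_A  <- unname((counts["nc"] - counts["nd"]) / (counts["nc"] + counts["nd"]))
tau_B  <- cor(x, y, method = "kendall")

tau_A   # 1
tau_B   # 0.8, pulled down by the tie adjustment
```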
The gamma statistic is G = (n_c - n_d)/(n_c + n_d), where n_c is the number of occasions on which the variates change in a concordant way and n_d is the number of occasions on which they change in a discordant way.
For an ordered matrix, n_c is the sum over all cells [I, J] of the terms n_{ij}N^{+}_{ij}, where n_{ij} is the frequency in cell [I, J] and N^{+}_{ij} is the sum of the frequencies in the cells whose row is greater than I and whose column is greater than J. Likewise, n_d is the sum over all cells [I, J] of the terms n_{ij}N^{-}_{ij}, where N^{-}_{ij} is the sum of the frequencies in the cells whose row is greater than I and whose column is less than J. In symbols, n_c = \sum_{i,j} n_{ij}N^{+}_{ij} with N^{+}_{ij} = \sum_{k>i}\sum_{l>j} n_{kl}, and n_d = \sum_{i,j} n_{ij}N^{-}_{ij} with N^{-}_{ij} = \sum_{k>i}\sum_{l<j} n_{kl}.
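The cell-sum formulas translate almost line for line into R. The sketch below applies them to the example matrix N from the Examples section; table_pair_counts() is a hypothetical helper written for illustration, not a DFBA function. Per the description above, its counts should agree with the nc and nd that dfba_gamma() reports, but that comparison is an expectation rather than something shown here.

```r
# Example frequency matrix from the Examples section (rows = rank on measure 1)
N <- matrix(c(38, 4, 5, 0, 6, 40, 1, 2, 4, 8, 20, 30),
            ncol = 4, byrow = TRUE)

# Hypothetical helper: nc and nd from the cell sums N^+_{ij} and N^-_{ij}
table_pair_counts <- function(N) {
  N <- as.matrix(N)
  r <- row(N)   # row index of every cell
  k <- col(N)   # column index of every cell
  nc <- 0
  nd <- 0
  for (i in seq_len(nrow(N))) {
    for (j in seq_len(ncol(N))) {
      n_plus  <- sum(N[r > i & k > j])  # N^+_{ij}: later rows, later columns
      n_minus <- sum(N[r > i & k < j])  # N^-_{ij}: later rows, earlier columns
      nc <- nc + N[i, j] * n_plus
      nd <- nd + N[i, j] * n_minus
    }
  }
  c(nc = nc, nd = nd)
}

counts <- table_pair_counts(N)   # nc = 6588, nd = 566
G <- unname((counts["nc"] - counts["nd"]) / (counts["nc"] + counts["nd"]))
G                                # 6022 / 7154, about 0.842
```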
The n_c and n_d values computed from the cell sums N^{+}_{ij} and N^{-}_{ij} are equal, respectively, to the n_c and n_d values obtained when the bivariate measures are entered as paired vectors into the dfba_bivariate_concordance() function.
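This equivalence can be checked directly by expanding the frequency table into one (row rank, column rank) pair per observation and counting pairs one by one. The sketch below does this for the example matrix N from the Examples section, using a hypothetical helper count_pairs() rather than dfba_bivariate_concordance() itself; for this matrix the cell-sum formulas give n_c = 6588 and n_d = 566, and the pairwise count should reproduce those values.

```r
# Example frequency matrix from the Examples section
N <- matrix(c(38, 4, 5, 0, 6, 40, 1, 2, 4, 8, 20, 30),
            ncol = 4, byrow = TRUE)

# Expand the table: cell [I, J] with frequency n contributes n copies of (I, J)
x <- rep(as.vector(row(N)), times = as.vector(N))
y <- rep(as.vector(col(N)), times = as.vector(N))

# Hypothetical helper: count concordant/discordant pairs over all observations
count_pairs <- function(x, y) {
  sx <- sign(outer(x, x, "-"))    # sign of x[a] - x[b] for every pair (a, b)
  sy <- sign(outer(y, y, "-"))
  s  <- (sx * sy)[upper.tri(sx)]  # each unordered pair once; 0 means a tie
  c(nc = sum(s > 0), nd = sum(s < 0))
}

res <- count_pairs(x, y)   # nc = 6588, nd = 566 over 158 observations
res
```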
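As with the dfba_bivariate_concordance() function, the Bayesian analysis focuses on the population concordance proportion phi (\phi), and the population value of gamma is 2\phi - 1 (just as the sample G equals 2p - 1 for the sample concordance proportion p = n_c/(n_c + n_d)). The likelihood function is proportional to \phi^{n_c}(1-\phi)^{n_d}. The prior distribution is a beta distribution with shape parameters a0 and b0, and the posterior distribution is the conjugate beta with shape parameters a = a0 + n_c and b = b0 + n_d.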
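The conjugate update can be sketched with base R's beta quantile function. The example below plugs in n_c = 6588 and n_d = 566 (the counts for the Examples matrix) with the default uniform prior. The variable names mirror the Value components, but computing the equal-tail interval as the (1 - prob_interval)/2 and (1 + prob_interval)/2 posterior quantiles is an assumption about how such an interval is formed, not a description of dfba_gamma() internals.

```r
# Counts for the Examples matrix, obtained from the cell-sum formulas
nc <- 6588
nd <- 566

a0 <- 1                  # default prior shapes: uniform beta(1, 1)
b0 <- 1
prob_interval <- 0.95

a_post <- a0 + nc        # conjugate update: a = a0 + n_c
b_post <- b0 + nd        # b = b0 + n_d

sample_p    <- nc / (nc + nd)               # sample concordance proportion
post_median <- qbeta(0.5, a_post, b_post)   # posterior median of phi

# Equal-tail interval: cut (1 - prob_interval)/2 of posterior mass from each tail
alpha_half <- (1 - prob_interval) / 2
eti <- qbeta(c(alpha_half, 1 - alpha_half), a_post, b_post)
eti_lower <- eti[1]
eti_upper <- eti[2]

# Sample gamma recovered from the sample proportion: G = 2p - 1
G <- 2 * sample_p - 1
```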
Value
A list containing the following components:
gamma: Sample Goodman-Kruskal gamma statistic; equivalent to the sample rank correlation coefficient \tau_A
a0: First shape parameter for the prior beta distribution
b0: Second shape parameter for the prior beta distribution
sample_p: Sample estimate of the concordance proportion
nc: Number of concordant comparisons between the paired measures
nd: Number of discordant comparisons between the paired measures
a_post: First shape parameter for the posterior beta distribution of the phi parameter
b_post: Second shape parameter for the posterior beta distribution of the phi parameter
post_median: Median of the posterior distribution of the phi concordance parameter
prob_interval: The probability of the interval estimate for the phi parameter
eti_lower: Lower limit of the posterior equal-tail interval for the phi parameter where the width of the interval is specified by the
eti_upper: Upper limit of the posterior equal-tail interval for the phi parameter where the width of the interval is specified by the
References
Chechile, R. A. (2020). Bayesian Statistics for Experimental Scientists: A General Introduction Using Distribution-Free Methods. Cambridge: MIT Press.
Siegel, S., & Castellan, N. J. (1988). Nonparametric Statistics for the Behavioral Sciences. New York: McGraw-Hill.
See Also
dfba_bivariate_concordance, for a more extensive discussion of the \tau_A statistic and the flawed \tau_B correlation.
Examples
# Example with matrix input
N <- matrix(c(38, 4, 5, 0, 6, 40, 1, 2, 4, 8, 20, 30),
ncol = 4,
byrow = TRUE)
colnames(N) <- c('C1', 'C2', 'C3', 'C4')
rownames(N) <- c('R1', 'R2', 'R3')
dfba_gamma(N)
# Sample problem with table input
NTable <- as.table(N)
dfba_gamma(NTable)