R: Goodman-Kruskal Gamma

dfba_gamma {DFBA}

R Documentation

Goodman-Kruskal Gamma

Description

Given bivariate data in the form of either a rank-ordered table or a matrix, returns the number of concordant and discordant changes between the variates, the Goodman-Kruskal gamma statistic, and a Bayesian analysis of the population concordance proportion parameter phi.

Usage

dfba_gamma(x, a0 = 1, b0 = 1, prob_interval = 0.95)

Arguments

`x`	Cross-tabulated matrix or table where cell [I, J] represents the frequency of observations where the rank of measure 1 is I and the rank of measure 2 is J.
`a0`	First shape parameter for the prior beta distribution (default is 1)
`b0`	Second shape parameter for the prior beta distribution (default is 1)
`prob_interval`	Desired probability for interval estimates (default is 0.95)

Details

For bivariate data where two measures are restricted to an ordinal scale, such as when the two variates are ranked data over a limited set of integers, an ordered contingency table is often a convenient data representation. In such a case, the element in the [I, J] cell of the matrix is the frequency of occasions where one variate has a rank value of I and the corresponding rank for the other variate is J. This situation is a special case of the more general case where there are two continuous bivariate measures. For the special case of a rank-order matrix with frequencies, there is a distribution-free concordance correlation in common usage: Goodman and Kruskal's gamma G (Siegel & Castellan, 1988).

Chechile (2020) showed that Goodman and Kruskal's gamma is equivalent to the more general \tau_A nonparametric correlation coefficient. Historically, gamma was considered a different metric from \tau because the version of \tau in standard use was typically \tau_B, which is a flawed metric because it does not properly correct for ties. Note: cor(..., method = "kendall") returns the \tau_B correlation, which is incorrect when there are ties. The correct \tau_A is computed by the dfba_bivariate_concordance() function.
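
The difference between the two coefficients can be checked in base R without the DFBA package. The following is a minimal sketch, not the package's implementation: it expands the frequency matrix used in the Examples section into paired rank vectors, counts concordant and discordant pairs directly, and compares gamma, (n_c-n_d)/(n_c+n_d), with the value returned by cor(..., method = "kendall"). The names r1, r2, s, gamma_from_pairs, and tau_b are introduced here for illustration only.

```r
# Frequency matrix from the Examples section.
N <- matrix(c(38, 4, 5, 0, 6, 40, 1, 2, 4, 8, 20, 30),
            ncol = 4, byrow = TRUE)

# Expand each cell [I, J] into N[I, J] paired observations (rank I, rank J).
r1 <- rep(as.vector(row(N)), times = as.vector(N))
r2 <- rep(as.vector(col(N)), times = as.vector(N))

# Sign of the joint change for every ordered pair of observations:
# +1 concordant, -1 discordant, 0 tied on at least one measure.
s <- sign(outer(r1, r1, "-")) * sign(outer(r2, r2, "-"))
nc <- sum(s > 0) / 2   # each unordered pair appears twice in s
nd <- sum(s < 0) / 2

gamma_from_pairs <- (nc - nd) / (nc + nd)   # tau_A in the text's notation
tau_b <- cor(r1, r2, method = "kendall")    # tau_B as computed by base R
```

Because this table contains many tied pairs, tau_b is smaller in magnitude than gamma_from_pairs; the two values coincide only when no pair is tied on either measure.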

The gamma statistic is equal to (n_c-n_d)/(n_c+n_d), where n_c is the number of occasions when the variates change in a concordant way and n_d is the number of occasions when the variates change in a discordant fashion. The value of n_c for an order matrix is the sum of terms for each [I, J] that are equal to n_{ij}N^{+}_{ij}, where n_{ij} is the frequency for cell [I, J] and N^{+}_{ij} is the sum of the frequencies in the matrix where the row value is greater than I and the column value is greater than J. The value of n_d is the sum of terms for each [I, J] that are equal to n_{ij}N^{-}_{ij}, where N^{-}_{ij} is the sum of the frequencies in the matrix where the row value is greater than I and the column value is less than J. The n_c and n_d values computed in this fashion are, respectively, equal to the n_c and n_d values found when the bivariate measures are entered as paired vectors into the dfba_bivariate_concordance() function.
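
The sums described above can be sketched in a few lines of base R. This is an illustrative sketch only and is not claimed to match the internals of dfba_gamma(); count_concordance is a hypothetical helper name introduced here, and the matrix is the one used in the Examples section.

```r
# Hypothetical helper (not part of DFBA): accumulates n_ij * N+_ij and
# n_ij * N-_ij over all cells of an ordered frequency matrix.
count_concordance <- function(x) {
  x <- as.matrix(x)
  row_idx <- seq_len(nrow(x))
  col_idx <- seq_len(ncol(x))
  nc <- 0
  nd <- 0
  for (i in row_idx) {
    for (j in col_idx) {
      # N+ : frequencies with row > i and column > j
      n_plus <- sum(x[row_idx > i, col_idx > j])
      # N- : frequencies with row > i and column < j
      n_minus <- sum(x[row_idx > i, col_idx < j])
      nc <- nc + x[i, j] * n_plus
      nd <- nd + x[i, j] * n_minus
    }
  }
  list(nc = nc, nd = nd, gamma = (nc - nd) / (nc + nd))
}

N <- matrix(c(38, 4, 5, 0, 6, 40, 1, 2, 4, 8, 20, 30),
            ncol = 4, byrow = TRUE)
res <- count_concordance(N)
res$nc   # 6588
res$nd   # 566
```

Cells in the last row contribute nothing to either sum because no row lies below them, so the empty selections sum to zero.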

As with the dfba_bivariate_concordance() function, the Bayesian analysis focuses on the population concordance proportion phi (\phi), which is related to gamma by G=2\phi-1. The likelihood function is proportional to \phi^{n_c}(1-\phi)^{n_d}. The prior distribution is a beta distribution with shape parameters a0 and b0, and the posterior distribution is the conjugate beta with shape parameters a = a0 + n_c and b = b0 + n_d.
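
The conjugate update can be reproduced with base R's qbeta(). The sketch below is an assumption-labeled illustration rather than the package's code: nc and nd are the counts obtained by applying the sums above to the 3 x 4 matrix in the Examples section, and the variable names simply mirror the components listed under Value.

```r
# Counts for the Examples matrix (worked out from the n_c and n_d sums).
nc <- 6588
nd <- 566

# Default uniform prior and interval probability, as in the Usage section.
a0 <- 1
b0 <- 1
prob_interval <- 0.95

# Conjugate beta posterior for phi.
a_post <- a0 + nc
b_post <- b0 + nd

# Posterior median and equal-tail interval for phi.
tail_prob <- (1 - prob_interval) / 2
post_median <- qbeta(0.5, a_post, b_post)
eti_lower <- qbeta(tail_prob, a_post, b_post)
eti_upper <- qbeta(1 - tail_prob, a_post, b_post)

# Sample estimates: concordance proportion and gamma.
sample_p <- nc / (nc + nd)
gamma <- 2 * sample_p - 1   # identical to (nc - nd) / (nc + nd)
```

Since G=2\phi-1 is a monotone increasing transformation, the limits 2 * eti_lower - 1 and 2 * eti_upper - 1 give a corresponding equal-tail interval on the gamma scale; this last step is an inference from the stated relation, not a documented output of dfba_gamma().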

Value

A list containing the following components:

`gamma`	Sample Goodman-Kruskal gamma statistic; equivalent to the sample rank correlation coefficient tau_A
`a0`	First shape parameter for prior beta
`b0`	Second shape parameter for prior beta
`sample_p`	Sample estimate of the concordance proportion `nc/(nc+nd)`
`nc`	Number of concordant comparisons between the paired measures
`nd`	Number of discordant comparisons between the paired measures
`a_post`	First shape parameter for the posterior beta distribution for the phi parameter
`b_post`	Second shape parameter for the posterior beta distribution for the phi parameter
`post_median`	Median of the posterior distribution for the phi concordance parameter
`prob_interval`	The probability contained within the interval estimate for the phi parameter
`eti_lower`	Lower limit of the posterior equal-tail interval for the phi parameter where the width of the interval is specified by the `prob_interval` input
`eti_upper`	Upper limit of the posterior equal-tail interval for the phi parameter where the width of the interval is specified by the `prob_interval` input

References

Chechile, R.A. (2020). Bayesian Statistics for Experimental Scientists: A General Introduction Using Distribution-Free Methods. Cambridge: MIT Press.

Siegel, S., & Castellan, N. J. (1988). Nonparametric Statistics for the Behavioral Sciences. New York: McGraw-Hill.

Examples

# Example with matrix input
N <- matrix(c(38, 4, 5, 0, 6, 40, 1, 2, 4, 8, 20, 30),
            ncol = 4,
            byrow = TRUE)
colnames(N) <- c('C1', 'C2', 'C3', 'C4')
rownames(N) <- c('R1', 'R2', 'R3')
dfba_gamma(N)

# Example with table input
NTable <- as.table(N)
dfba_gamma(NTable)

[Package DFBA version 0.1.0 Index]