fun.chisq.test {FunChisq}    R Documentation

Model-Free Functional Chi-Squared and Exact Tests

Description

Asymptotic chi-squared, normalized chi-squared or exact tests on contingency tables to determine model-free functional dependency of the column variable on the row variable.

Usage

fun.chisq.test(
  x,
  method = c("fchisq", "nfchisq", "adapted",
             "exact", "exact.qp", "exact.dp", "exact.dqp",
             "default", "normalized", "simulate.p.value"),
  alternative = c("non-constant", "all"),
  log.p = FALSE,
  index.kind = c("conditional", "unconditional"),
  simulate.nruns = 2000,
  exact.mode.bound = TRUE
)

Arguments

x

a matrix representing a contingency table. The row variable represents the independent variable or all unique combinations of multiple independent variables. The column variable is the dependent variable.

method

a character string to specify the method to compute the functional chi-squared test statistic and its p-value. The options are "fchisq" (equivalent to "default", the default), "nfchisq" (equivalent to "normalized"), "exact", "adapted", "exact.qp", "exact.dp", "exact.dqp" or "simulate.p.value". See Details.

Note: "default" and "normalized" are deprecated.

alternative

a character string to specify the alternative hypothesis. The options are "non-constant" (default, non-constant functions) and "all" (all types of functions including constant ones).

log.p

logical; if TRUE, the p-value is given as log(p). Taking the log improves the accuracy when the p-value is close to zero. The default is FALSE.

index.kind

a character string to specify the kind of function index xi.f to be estimated. The options are "conditional" (default) and "unconditional". See Details.

simulate.nruns

a number to specify how many tables are generated to simulate the null distribution. The default is 2000. Only used when method="simulate.p.value".

exact.mode.bound

logical; if TRUE, a fast branch-and-bound algorithm is used for the exact functional test (method="exact"). If FALSE, a slow brute-force enumeration method is used to provide a reference for runtime analysis. Both options provide the same exact p-value. The default is TRUE.
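The benefit of log.p described above can be seen with base R alone. The following is a minimal sketch that calls pchisq directly rather than fun.chisq.test; the statistic of 2000 on 4 degrees of freedom is an arbitrary illustrative value, not output from the package.

```r
# Illustrative values only (assumed, not produced by FunChisq)
stat <- 2000
df <- 4

# On the ordinary scale the upper-tail probability underflows to zero
p_plain <- pchisq(stat, df = df, lower.tail = FALSE)

# On the log scale the same tail probability stays finite (about -993)
p_log <- pchisq(stat, df = df, lower.tail = FALSE, log.p = TRUE)

print(c(p = p_plain, log.p = p_log))
```

Two very strong dependencies that both report p = 0 can still be ranked by their log p-values.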

Details

The functional chi-squared test determines whether the column variable is a function of the row variable in contingency table x (Zhang and Song 2013; Zhang 2014). This function supports the following hypothesis testing methods:

When method="fchisq" (equivalent to "default", the default), the test statistic is computed as described in (Zhang and Song 2013; Zhang 2014) and the p-value is computed using the chi-squared distribution.
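As a rough illustration of the asymptotic test, the sketch below computes a functional chi-squared statistic in base R. It assumes the statistic is the sum of per-row chi-squared deviations from a uniform distribution minus the chi-squared deviation of the column marginal from uniform, with (rows - 1) * (columns - 1) degrees of freedom for the non-constant alternative. The helper names fun_chisq_sketch and uniform_chisq are hypothetical and are not part of FunChisq; the package's own implementation may differ in detail.

```r
# Hypothetical helper: chi-squared of a count vector against a uniform expectation
uniform_chisq <- function(v) {
  total <- sum(v)
  if (total == 0) return(0)  # an empty row contributes nothing
  expected <- total / length(v)
  sum((v - expected)^2 / expected)
}

# Hypothetical helper: functional chi-squared statistic, df, and asymptotic p-value
fun_chisq_sketch <- function(x) {
  row_part <- sum(apply(x, 1, uniform_chisq))  # each row's deviation from uniform
  col_part <- uniform_chisq(colSums(x))        # column marginal's deviation from uniform
  stat <- row_part - col_part
  df <- (nrow(x) - 1) * (ncol(x) - 1)
  list(statistic = stat, df = df,
       p.value = pchisq(stat, df = df, lower.tail = FALSE))
}

x <- matrix(c(20,0,20,0,20,0,5,0,5), 3)
res <- fun_chisq_sketch(x)      # statistic 72 on 4 degrees of freedom
res_t <- fun_chisq_sketch(t(x)) # smaller statistic in the reverse direction
print(res)
print(res_t)
```

The statistic is larger for x than for t(x), which reflects that the column variable is closer to a function of the row variable than the other way around.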

When method="nfchisq" (equivalent to "normalized"), the test statistic is obtained by shifting and scaling the original test statistic (Zhang and Song 2013; Zhang 2014), and the p-value is computed using the standard normal distribution (Box et al. 2005). The normalized chi-squared, more conservative on the degrees of freedom, was used by the Best Performer NMSUSongLab in the HPN-DREAM (DREAM8) Breast Cancer Network Inference Challenges.
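The shifting and scaling can be sketched as follows. This assumes the shift is the degrees of freedom and the scale is sqrt(2 * df), the usual normal approximation to a chi-squared variable; normalize_fchisq is a hypothetical helper, and the input values are illustrative rather than package output.

```r
# Hypothetical helper; the shift (df) and scale (sqrt(2 * df)) are assumptions
normalize_fchisq <- function(stat, df) {
  z <- (stat - df) / sqrt(2 * df)
  # Upper-tail p-value from the standard normal distribution
  list(statistic = z, p.value = pnorm(z, lower.tail = FALSE))
}

# Illustrative inputs: a functional chi-squared statistic of 72 on 4 df
res <- normalize_fchisq(stat = 72, df = 4)
print(res)
```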

When method="exact", "exact.qp" (quadratic programming) (Zhong and Song 2019a; Zhong 2019), "exact.dp" (dynamic programming) (Nguyen 2018; Nguyen et al. 2020), or "exact.dqp" (dynamic and quadratic programming) (Nguyen 2018; Nguyen et al. 2020), an exact functional test is performed. The option of "exact" uses "exact.dqp", the fastest method. All methods compute an exact p-value.

When method="adapted", the adapted functional chi-squared test (Kumar and Song 2022) is used. The test statistic is obtained by evaluating the most populous portrait or square (number of rows >= number of columns) table in the contingency table x. The p-value is computed using the chi-squared distribution. This option should be used to determine the functional direction between variables in x.

For the "exact.qp" and "exact.dp" options, if the sample size is no more than 200 or the average cell count is less than five, and the table size is no more than 10 in either row or column, the exact test will not be called and the asymptotic functional chi-squared test (method="fchisq") is used instead.

For "exact.dqp", the exact functional test will always be performed.

For 2-by-2 contingency tables, the asymptotic test options (method="fchisq" or "nfchisq") are recommended to test functional dependency, instead of the exact functional test.

When method="simulate.p.value", a simulated null distribution is used to calculate the p-value. The null distribution is a multinomial distribution that is the product of two marginal distributions. Like other Monte Carlo-based methods, this method is slower but may be more accurate than methods based on asymptotic distributions.
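A Monte Carlo p-value of this kind can be sketched in base R. The sketch assumes the null cell probabilities are the product of the observed row and column marginal proportions and that the statistic has the row-minus-column chi-squared form; simulate_p_sketch and its helpers are hypothetical names, and the package's simulation may differ.

```r
set.seed(1)  # make the illustration reproducible

# Hypothetical helper: chi-squared of a count vector against a uniform expectation
uniform_chisq <- function(v) {
  total <- sum(v)
  if (total == 0) return(0)
  expected <- total / length(v)
  sum((v - expected)^2 / expected)
}

# Hypothetical helper: assumed form of the functional chi-squared statistic
fun_chisq_stat <- function(x) {
  sum(apply(x, 1, uniform_chisq)) - uniform_chisq(colSums(x))
}

# Hypothetical helper: fraction of simulated null tables at least as extreme
simulate_p_sketch <- function(x, nruns = 2000) {
  n <- sum(x)
  # Null cell probabilities: product of the two observed marginal distributions
  p_null <- outer(rowSums(x), colSums(x)) / n^2
  observed <- fun_chisq_stat(x)
  null_stats <- replicate(nruns, {
    counts <- rmultinom(1, size = n, prob = as.vector(p_null))
    fun_chisq_stat(matrix(counts, nrow = nrow(x)))
  })
  mean(null_stats >= observed)
}

x <- matrix(c(20,0,20,0,20,0,5,0,5), 3)
p_sim <- simulate_p_sketch(x, nruns = 500)
print(p_sim)
```

The resolution of such a p-value is limited by the number of simulated tables, which is why a larger simulate.nruns gives a finer estimate at a higher runtime.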

index.kind specifies the kind of function index to be computed. If the experimental design controls neither the row nor column marginal sums, index.kind = "unconditional" is recommended; if the column marginal sums are controlled, index.kind = "conditional" is recommended. The conditional function index is the square root of Goodman-Kruskal's tau (Goodman and Kruskal 1954). The choice of index.kind affects only the function index xi.f value, but not the test statistic or p-value.
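To make the conditional index concrete, the sketch below computes Goodman-Kruskal's tau for predicting the column variable from the row variable and takes its square root. The helper gk_tau is a hypothetical name written for this illustration; only the conditional index is shown because it is the one defined explicitly above.

```r
# Hypothetical helper: Goodman-Kruskal's tau for column given row
gk_tau <- function(x) {
  n <- sum(x)
  rs <- rowSums(x)
  keep <- rs > 0  # skip empty rows to avoid division by zero
  within <- sum(x[keep, , drop = FALSE]^2 / rs[keep])  # sum of n_ij^2 / n_i.
  marginal <- sum(colSums(x)^2) / n                    # sum of n_.j^2 / n
  (within - marginal) / (n - marginal)
}

x <- matrix(c(20,0,20,0,20,0,5,0,5), 3)
tau <- gk_tau(x)        # 0.6 for this table
xi_f_cond <- sqrt(tau)  # conditional function index under the stated definition
print(c(tau = tau, xi.f = xi_f_cond))
```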

Value

A list with class "htest" containing the following components:

statistic

the functional chi-squared statistic if method = "fchisq", "default", or "exact"; or the normalized functional chi-squared statistic if method = "nfchisq" or "normalized".

parameter

degrees of freedom for the functional chi-squared statistic.

p.value

p-value of the functional test. If method = "fchisq" (or "default"), it is computed by an asymptotic chi-squared distribution; if method = "nfchisq" (or "normalized"), it is computed by the standard normal distribution; if method = "exact", it is computed by an exact hypergeometric distribution.

estimate

an estimate of the function index, between 0 and 1. A value of 1 indicates a strictly mathematical function. The index is asymmetrical with respect to the transpose of the input contingency table, unlike the symmetrical Cramér's V based on Pearson's chi-squared test statistic. See (Zhong and Song 2019b; Kumar et al. 2018) for the definition of the function index.
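The contrast between an asymmetrical index and Cramér's V can be checked numerically. The sketch below uses the square root of Goodman-Kruskal's tau as a stand-in for the conditional function index, as stated in Details; cramers_v and sqrt_gk_tau are hypothetical helpers, not FunChisq functions.

```r
# Hypothetical helper: Cramer's V from Pearson's chi-squared statistic
cramers_v <- function(x) {
  n <- sum(x)
  expected <- outer(rowSums(x), colSums(x)) / n
  chisq <- sum((x - expected)^2 / expected)
  sqrt(chisq / (n * (min(dim(x)) - 1)))
}

# Hypothetical helper: square root of Goodman-Kruskal's tau (column given row)
sqrt_gk_tau <- function(x) {
  n <- sum(x)
  rs <- rowSums(x)
  keep <- rs > 0
  within <- sum(x[keep, , drop = FALSE]^2 / rs[keep])
  marginal <- sum(colSums(x)^2) / n
  sqrt((within - marginal) / (n - marginal))
}

x <- matrix(c(20,0,20,0,20,0,5,0,5), 3)
v <- c(x = cramers_v(x), tx = cramers_v(t(x)))        # identical under transpose
idx <- c(x = sqrt_gk_tau(x), tx = sqrt_gk_tau(t(x)))  # differs under transpose
print(v)
print(idx)
```

Transposing the table leaves Cramér's V unchanged but lowers the index, so only the index carries information about direction.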

Author(s)

Yang Zhang, Hua Zhong, Hien Nguyen, Sajal Kumar, and Joe Song

References

Box GE, Hunter JS, Hunter WG (2005). Statistics for Experimenters: Design, Innovation and Discovery, 2nd edition. Wiley-Interscience, New York.

Goodman LA, Kruskal WH (1954). “Measures of Association for Cross Classifications.” Journal of the American Statistical Association, 49(268), 732–764.

Kumar S, Song M (2022). “Overcoming biases in causal inference of molecular interactions.” Bioinformatics, 38(10), 2818–2825. doi:10.1093/bioinformatics/btac206.

Kumar S, Zhong H, Sharma R, Li Y, Song M (2018). “Scrutinizing functional interaction networks from RNA-binding proteins to their targets in cancer.” In IEEE International Conference on Bioinformatics and Biomedicine, 185–190. doi:10.1109/BIBM.2018.8621502.

Nguyen HH (2018). Inference of Functional Dependency via Asymmetric, Optimal, and Model-free Statistics. Ph.D. thesis, Department of Computer Science, New Mexico State University, Las Cruces, NM, USA.

Nguyen HH, Zhong H, Song M (2020). “Optimality, accuracy, and efficiency of an exact functional test.” In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-20, 2683–2689. doi:10.24963/ijcai.2020/372.

Zhang Y (2014). Nonparametric Statistical Methods for Biological Network Inference. Ph.D. thesis, Department of Computer Science, New Mexico State University, Las Cruces, NM, USA.

Zhang Y, Song M (2013). “Deciphering interactions in causal networks without parametric assumptions.” arXiv Molecular Networks, arXiv:1311.2707. https://arxiv.org/abs/1311.2707.

Zhong H (2019). Model-free Gene-to-zone Network Inference of Molecular Mechanisms in Biology. Ph.D. thesis, Department of Computer Science, New Mexico State University, Las Cruces, NM, USA.

Zhong H, Song M (2019a). “A fast exact functional test for directional association and cancer biology applications.” IEEE/ACM Transactions on Computational Biology and Bioinformatics, 16(3), 818–826. doi:10.1109/TCBB.2018.2809743.

Zhong H, Song M (2019b). “Directional association test reveals high-quality putative cancer driver biomarkers including noncoding RNAs.” BMC Med Genomics, 12(7), 129. doi:10.1186/s12920-019-0565-9.

See Also

For data discretization, an option is optimal univariate clustering via package Ckmeans.1d.dp. A second option is joint multivariate discretization via package GridOnClusters.

For symmetrical dependency tests on discrete data, see Pearson's chi-squared test chisq.test, Fisher's exact test fisher.test, and mutual information methods in package entropy.

Examples


# Example 1. Asymptotic functional chi-squared test
x <- matrix(c(20,0,20,0,20,0,5,0,5), 3)
fun.chisq.test(x) # strong functional dependency
fun.chisq.test(t(x)) # weak functional dependency

# Example 2. Normalized functional chi-squared test
x <- matrix(c(8,0,8,0,8,0,2,0,2), 3)
fun.chisq.test(x, method="nfchisq") # strong functional dependency
fun.chisq.test(t(x), method="nfchisq") # weak functional dependency

# Example 3. Exact functional chi-squared test
x <- matrix(c(4,0,4,0,4,0,1,0,1), 3)
fun.chisq.test(x, method="exact") # strong functional dependency
fun.chisq.test(t(x), method="exact") # weak functional dependency

# Example 4. Exact functional chi-squared test on a real data set
#            (Shen et al., 2002)
# x is a contingency table with row variable for p53 mutation and
#   column variable for CIMP
x <- matrix(c(12,26,18,0,8,12), nrow=2, ncol=3, byrow=TRUE)

# Test the functional dependency: p53 mutation -> CIMP
fun.chisq.test(x, method="exact")

# Test the functional dependency: CIMP -> p53 mutation
fun.chisq.test(t(x), method="exact")

# Example 5. Adapted functional chi-squared test
x <- matrix(c(20, 0, 1, 0, 1, 20, 3, 2, 15, 2, 5, 2), 3, 4, byrow=TRUE)
fun.chisq.test(x, method="adapted") # strong functional dependency
fun.chisq.test(t(x), method="adapted") # weak functional dependency

# Example 6. Functional chi-squared test with a simulated null distribution
x <- matrix(c(20,0,20,0,20,0,5,0,5), 3)
fun.chisq.test(x, method="simulate.p.value")
fun.chisq.test(x, method="simulate.p.value", simulate.nruns = 1000)


[Package FunChisq version 2.5.4 Index]