FunChisq-package {FunChisq}    R Documentation

Model-Free Functional Chi-Squared and Exact Tests

Description

Statistical hypothesis testing methods for model-free functional dependency using asymptotic chi-squared or exact distributions. Functional chi-squared test statistics (Zhang and Song 2013; Zhang 2014; Nguyen 2018; Zhong 2019; Zhong and Song 2019a; Nguyen et al. 2020) are asymmetric, functionally optimal, and model-free, which distinguishes them from other related statistical measures.
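The asymmetry can be seen directly by computing the statistic for a table and for its transpose. The Python sketch below assumes the definition χ²(f: X→Y) = Σ_i χ²(Y | X = i) − χ²(Y) (our reading of Zhang and Song 2013), where each term is a Pearson goodness-of-fit statistic against the uniform distribution over the levels of Y. The function names are illustrative and are not part of the FunChisq API.

```python
def gof_uniform(counts):
    """Pearson goodness-of-fit chi-square of counts against a uniform distribution."""
    total = sum(counts)
    if total == 0:
        return 0.0  # an empty row contributes nothing
    expected = total / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


def fun_chisq(table):
    """Functional chi-square for X -> Y (rows = levels of X, columns = levels of Y).

    Assumed definition: sum_i chi2(Y | X = i) - chi2(Y), every term being a
    goodness-of-fit statistic against the uniform distribution over Y's levels.
    """
    col_totals = [sum(col) for col in zip(*table)]
    return sum(gof_uniform(row) for row in table) - gof_uniform(col_totals)


def transpose(table):
    """Swap the roles of X and Y."""
    return [list(col) for col in zip(*table)]


# Y is a many-to-one function of X: levels 0 and 2 of X both map to Y = 0,
# so X is not a function of Y.
table = [[10, 0],
         [0, 10],
         [10, 0]]

print(fun_chisq(table))             # X -> Y
print(fun_chisq(transpose(table)))  # Y -> X
```

The two directions give different values. A larger raw value is not on its own a verdict on direction, because the attainable maximum depends on the table's margins; that is one motivation for a normalized effect size such as the function index.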

Tests in this package reveal evidence for causality based on the causality-by-functionality principle (Simon and Rescher 1966). The tests require that data from two or more variables be formatted as a contingency table. Continuous variables must be discretized first, for example, using R packages Ckmeans.1d.dp or GridOnClusters.
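The sketch below shows the shape of the required input: two discretized variables cross-tabulated into a table whose rows are the levels of the presumed cause X and whose columns are the levels of the effect Y. The equal-width binning is a deliberately crude stand-in that keeps the example dependency-free; it is not the optimal clustering of Ckmeans.1d.dp or the joint discretization of GridOnClusters. All names and data values are hypothetical.

```python
from collections import Counter


def equal_width_bins(values, n_bins):
    """Crude equal-width discretization into integer codes 0..n_bins-1.

    Placeholder only: Ckmeans.1d.dp or GridOnClusters choose breakpoints
    from the data instead of using fixed-width intervals.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against constant input
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def contingency_table(x, y):
    """Rows = observed levels of x, columns = observed levels of y, cells = joint counts."""
    x_levels, y_levels = sorted(set(x)), sorted(set(y))
    counts = Counter(zip(x, y))
    return [[counts[(a, b)] for b in y_levels] for a in x_levels]


x = [0.1, 0.4, 1.2, 1.5, 2.3, 2.9]  # continuous cause (hypothetical data)
y = [5.0, 5.2, 7.9, 8.1, 5.1, 4.8]  # continuous effect (hypothetical data)
table = contingency_table(equal_width_bins(x, 3), equal_width_bins(y, 2))
print(table)
```

With these values the table is [[2, 0], [0, 2], [2, 0]], a small functional pattern in which the first and third bins of x map to the same bin of y.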

The package implements an asymptotic functional chi-squared test (Zhang and Song 2013; Zhang 2014), an adapted functional chi-squared test (Kumar2022AFT), and an exact functional test (Nguyen 2018; Zhong 2019; Zhong and Song 2019a; Nguyen et al. 2020). The normalized functional chi-squared test was used by the best performer, team NMSUSongLab, in the HPN-DREAM (DREAM8) Breast Cancer Network Inference Challenges (Hill et al. 2016).
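As a rough illustration of the asymptotic test, the sketch below refers the statistic to a chi-squared distribution with (r − 1)(k − 1) degrees of freedom, where r and k are the numbers of rows and columns, and also reports a normalized statistic. Both the degrees of freedom and the normalized form (χ² − ν)/√(2ν), approximately standard normal under the null hypothesis, are assumptions here rather than statements about the package's code. The tail probability uses closed forms of the regularized incomplete gamma function so that only the standard library is needed; the exact and adapted tests are not sketched.

```python
import math


def fun_chisq(table):
    """Functional chi-square for X -> Y (rows = X levels, columns = Y levels)."""
    def gof_uniform(counts):
        total = sum(counts)
        if total == 0:
            return 0.0
        expected = total / len(counts)
        return sum((c - expected) ** 2 / expected for c in counts)

    col_totals = [sum(col) for col in zip(*table)]
    return sum(gof_uniform(row) for row in table) - gof_uniform(col_totals)


def chi2_sf(x, df):
    """Upper-tail probability P(Chi2_df >= x) for a positive integer df.

    Evaluates Q(df/2, x/2), the regularized upper incomplete gamma function,
    via its closed forms at integer and half-integer arguments.
    """
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Q(m, y) = exp(-y) * sum_{j<m} y^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= y / j
            total += term
        return math.exp(-y) * total
    # Q(m + 1/2, y) = erfc(sqrt(y)) + sum_{j<m} y^(j+1/2) exp(-y) / Gamma(j + 3/2)
    total = math.erfc(math.sqrt(y))
    for j in range(df // 2):
        total += y ** (j + 0.5) * math.exp(-y) / math.gamma(j + 1.5)
    return total


def asymptotic_test(table):
    """Return (statistic, df, p-value, normalized statistic) for X -> Y."""
    stat = fun_chisq(table)
    df = (len(table) - 1) * (len(table[0]) - 1)
    p_value = chi2_sf(stat, df)
    normalized = (stat - df) / math.sqrt(2 * df)  # assumed normalization
    return stat, df, p_value, normalized


table = [[10, 0], [0, 10], [10, 0]]
stat, df, p, z = asymptotic_test(table)
print(f"chi2_f = {stat:.4f}, df = {df}, p = {p:.3g}, normalized = {z:.4f}")
```

For sparse tables with small expected counts the asymptotic approximation can be poor, which is the situation an exact test is designed for.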

A function index derived from the functional chi-squared statistic offers a new effect size measure for the strength of functional dependency. It is asymmetric and functionally optimal, unlike the symmetric Cramer's V, and in many respects is a better alternative to conditional entropy.
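The sketch below contrasts the two effect sizes on a table in which Y is a function of X. It assumes the function index is the square root of χ²_f divided by the upper bound n(k − 1) − χ²(Y), where n is the sample size and k the number of levels of Y. The bound holds because a row with n_i observations contributes at most n_i(k − 1) to the goodness-of-fit sum, reached when the whole row falls in one column. The exact normalization used by the package may differ; the names are illustrative.

```python
import math


def gof_uniform(counts):
    """Pearson goodness-of-fit chi-square of counts against a uniform distribution."""
    total = sum(counts)
    if total == 0:
        return 0.0
    expected = total / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


def function_index(table):
    """sqrt(chi2_f / upper bound) for X -> Y, using the assumed normalization."""
    k = len(table[0])
    n = sum(map(sum, table))
    chi2_y = gof_uniform([sum(col) for col in zip(*table)])
    chi2_f = sum(gof_uniform(row) for row in table) - chi2_y
    bound = n * (k - 1) - chi2_y  # each row contributes at most n_i * (k - 1)
    return math.sqrt(chi2_f / bound) if bound > 0 else 0.0


def cramers_v(table):
    """Symmetric Cramer's V from Pearson's chi-square of independence."""
    n = sum(map(sum, table))
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    chi2 = sum((table[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
               for i in range(len(rows)) for j in range(len(cols))
               if rows[i] * cols[j] > 0)
    return math.sqrt(chi2 / (n * (min(len(rows), len(cols)) - 1)))


table = [[10, 0], [0, 10], [10, 0]]  # Y is a function of X
flipped = [list(c) for c in zip(*table)]

print(round(function_index(table), 4), round(function_index(flipped), 4))
print(round(cramers_v(table), 4), round(cramers_v(flipped), 4))
```

Under this normalization the function index is 1 for X→Y and about 0.707 for Y→X, while Cramer's V is 1 in both directions, so only the former distinguishes which variable is a function of the other.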

A simulator is provided to generate functional, dependent non-functional, and independent patterns (Sharma et al. 2017).
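The following toy generator only illustrates what the three pattern types mean for a table with rows X and columns Y. It is a hypothetical sketch, not the simulator of Sharma et al. (2017), which, as its title indicates, also models noise. Here functional patterns place each row in a single column, dependent non-functional patterns are built as X = g(Y) for a many-to-one g, and independent patterns draw X and Y separately.

```python
import random


def simulate_table(pattern, n_x, n_y, n, seed=None):
    """Toy generator for an n_x-by-n_y contingency table (rows = X, cols = Y).

    "functional":  Y = f(X); f is onto Y when n_x >= n_y.
    "dependent":   X = g(Y) for a many-to-one g (requires n_y > n_x), so Y
                   depends on X but several Y levels share each X level.
    "independent": X and Y are drawn independently and uniformly.
    """
    if pattern not in ("functional", "dependent", "independent"):
        raise ValueError(f"unknown pattern: {pattern!r}")
    if pattern == "dependent" and n_y <= n_x:
        raise ValueError("the dependent pattern needs n_y > n_x")
    rng = random.Random(seed)
    perm = list(range(n_y))
    rng.shuffle(perm)  # random assignment of X levels to Y levels for f
    table = [[0] * n_y for _ in range(n_x)]
    for _ in range(n):
        if pattern == "functional":
            x = rng.randrange(n_x)
            y = perm[x % n_y]
        elif pattern == "dependent":
            y = rng.randrange(n_y)
            x = y * n_x // n_y  # merges adjacent Y levels into one X level
        else:
            x, y = rng.randrange(n_x), rng.randrange(n_y)
        table[x][y] += 1
    return table


for pattern, dims in [("functional", (4, 2)), ("dependent", (2, 4)),
                      ("independent", (3, 3))]:
    print(pattern, simulate_table(pattern, *dims, n=40, seed=1))
```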

For continuous data, these tests offer an advantage over regression analysis when a parametric form cannot be reliably assumed for the underlying function. For categorical data, they provide a novel means to assess directional dependency, which is not possible with symmetric tests such as Pearson's chi-squared test, the G-test, or Fisher's exact test.
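Symmetric statistics cannot indicate direction because they are unchanged when the table is transposed. The sketch below checks this for Pearson's chi-squared statistic and the G statistic, 2 Σ O ln(O/E), on a table where Y is a function of X but X is not a function of Y. The helper names are illustrative.

```python
import math


def expected_counts(table):
    """Expected counts under independence: row total * column total / n."""
    n = sum(map(sum, table))
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    return [[r * c / n for c in cols] for r in rows]


def pearson_chisq(table):
    """Pearson's chi-squared statistic of independence."""
    expected = expected_counts(table)
    return sum((o - e) ** 2 / e
               for orow, erow in zip(table, expected)
               for o, e in zip(orow, erow) if e > 0)


def g_statistic(table):
    """G statistic 2 * sum(O * ln(O / E)); empty cells contribute zero."""
    expected = expected_counts(table)
    return 2 * sum(o * math.log(o / e)
                   for orow, erow in zip(table, expected)
                   for o, e in zip(orow, erow) if o > 0)


table = [[10, 0], [0, 10], [10, 0]]  # Y = f(X); X is not a function of Y
flipped = [list(col) for col in zip(*table)]

print(pearson_chisq(table), pearson_chisq(flipped))
print(g_statistic(table), g_statistic(flipped))
```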

Details

Package: FunChisq
Type: Package
Current version: 2.5.3
Initial release version: 1.0
Initial release date: 2014-03-08
License: LGPL (>= 3)

Author(s)

Yang Zhang, Hua Zhong, Hien Nguyen, Ruby Sharma, Sajal Kumar, Yiyi Li, and Joe Song

References

Hill SM, Heiser LM, Cokelaer T, Unger M, Nesser NK, Carlin DE, Zhang Y, Sokolov A, Paull EO, Wong CK, Graim K, Bivol A, Wang H, Zhu F, Afsari B, Danilova LV, Favorov AV, Lee WS, Taylor D, Hu CW, Long BL, Noren DP, Bisberg AJ, The HPN-DREAM Consortium, Mills GB, Gray JW, Kellen M, Norman T, Friend S, Qutub AA, Fertig EJ, Guan Y, Song M, Stuart JM, Spellman PT, Koeppl H, Stolovitzky G, Saez-Rodriguez J, Mukherjee S (2016). “Inferring causal molecular networks: empirical assessment through a community-based effort.” Nat Methods, 13, 310–318. doi:10.1038/nmeth.3773.

Nguyen HH (2018). Inference of Functional Dependency via Asymmetric, Optimal, and Model-free Statistics. Ph.D. thesis, Department of Computer Science, New Mexico State University, Las Cruces, NM, USA.

Nguyen HH, Zhong H, Song M (2020). “Optimality, accuracy, and efficiency of an exact functional test.” In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-20, 2683–2689. doi:10.24963/ijcai.2020/372.

Sharma R, Kumar S, Zhong H, Song M (2017). “Simulating noisy, nonparametric, and multivariate discrete patterns.” The R Journal, 9(2), 366–377. doi:10.32614/RJ-2017-053.

Simon HA, Rescher N (1966). “Cause and counterfactual.” Philosophy of Science, 33(4), 323–340.

Zhang Y (2014). Nonparametric Statistical Methods for Biological Network Inference. Ph.D. thesis, Department of Computer Science, New Mexico State University, Las Cruces, NM, USA.

Zhang Y, Song M (2013). “Deciphering interactions in causal networks without parametric assumptions.” arXiv Molecular Networks, arXiv:1311.2707. https://arxiv.org/abs/1311.2707.

Zhong H (2019). Model-free Gene-to-zone Network Inference of Molecular Mechanisms in Biology. Ph.D. thesis, Department of Computer Science, New Mexico State University, Las Cruces, NM, USA.

Zhong H, Song M (2019a). “A fast exact functional test for directional association and cancer biology applications.” IEEE/ACM Transactions on Computational Biology and Bioinformatics, 16(3), 818–826. doi:10.1109/TCBB.2018.2809743.

See Also

For data discretization, one option is optimal univariate clustering via package Ckmeans.1d.dp; another is joint multivariate discretization via package GridOnClusters.

For symmetric dependency tests on discrete data, see Pearson's chi-squared test (chisq.test), Fisher's exact test (fisher.test), mutual information (package entropy), and the G-test (implemented in packages DescTools and RVAideMemoire).


[Package FunChisq version 2.5.4 Index]