
ACDC {modACDC}

R Documentation

ACDC

Description

ACDC detects differential co-expression between a set of genes, such as a module of co-expressed genes, and a set of external features (exposures or responses) by using canonical correlation analysis (CCA) on the external features and module co-expression values. Modules are detected via Partition.

Usage

ACDC(
  fullData,
  ILC = 0.5,
  externalVar,
  identifierList = colnames(fullData),
  numNodes = 1
)

Arguments

`fullData`	data frame or matrix with samples as rows, all features as columns; each entry should be numeric gene expression or other molecular data values
`ILC`	information loss criterion for Partition, i.e., the minimum intraclass correlation required for features to be condensed; 0 ≤ ILC ≤ 1; default is 0.50
`externalVar`	data frame, matrix, or vector containing external variable data to be used for CCA, rows are samples; all elements must be numeric
`identifierList`	optional row vector of identifiers corresponding to the columns of fullData, in the same length and order (e.g., HUGO symbols for genes); default is the column names of fullData
`numNodes`	number of available compute nodes for parallelization; default is 1

Details

Modules are identified by Partition, an agglomerative data reduction method that performs both feature condensation and extraction based on a user-provided information loss criterion (ILC). Condensation of features into a module is accepted only if the intraclass correlation between the features is at least the ILC. For more information about how the co-expression features are calculated, see the coVar documentation.
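To make the ILC acceptance rule concrete, the following minimal sketch computes a one-way random-effects intraclass correlation, ICC(1), over a candidate set of standardized features and accepts the condensation when the ICC is at least the ILC. This is an illustrative assumption, not Partition's code: the helpers icc_oneway and accept_condensation are hypothetical, and Partition's own ICC estimator may differ in detail.

```r
# Hypothetical helper: one-way random-effects ICC(1), treating samples (rows)
# as subjects and candidate features (columns) as repeated measurements
icc_oneway <- function(x) {
  x <- scale(as.matrix(x))                  # standardize each feature (assumption)
  n <- nrow(x)
  k <- ncol(x)
  row_means <- rowMeans(x)
  grand_mean <- mean(x)
  msb <- k * sum((row_means - grand_mean)^2) / (n - 1)  # between-sample mean square
  msw <- sum((x - row_means)^2) / (n * (k - 1))         # within-sample mean square
  (msb - msw) / (msb + (k - 1) * msw)
}

# Hypothetical acceptance rule: condense the features only if ICC >= ILC
accept_condensation <- function(x, ILC = 0.5) {
  icc_oneway(x) >= ILC
}

# Usage with simulated data
set.seed(42)
g1 <- rnorm(40)
candidate <- cbind(g1, g1 + rnorm(40, sd = 0.3), rnorm(40))
accept_condensation(candidate[, 1:2], ILC = 0.5)     # two strongly correlated features
accept_condensation(candidate[, c(1, 3)], ILC = 0.5) # an unrelated pair
```

Raising ILC toward 1 makes condensation stricter, so fewer and tighter modules are formed.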

Following CCA, which finds, for each module, linear combinations of the co-expression features and of the external features that are maximally correlated with each other, a Wilks' Lambda test is performed to determine whether the correlation between these linear combinations is significant. A significant correlation indicates differential co-expression with respect to the external features. If a module has only one co-expression value (i.e., the module contains two features) and there is a single external variable, CCA reduces to a simple correlation, and the t-distribution is used to test for significant correlation (Widmann, 2005). If the number of co-expression features in a module is larger than the number of samples, CCA will return canonical correlation coefficients of 1, and p-values and BH FDR q-values will not be calculated. See ACDChighdim for our solution.
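The tests described above can be sketched in base R. This is a minimal sketch under stated assumptions, not modACDC's internal code: wilks_cca_test, cor_t_test, and is_high_dim are hypothetical helpers; the Wilks' Lambda p-value uses Rao's F approximation, a common choice (for example in the CCP package's p.asym) that may not match ACDC's implementation exactly; and is_high_dim assumes a co-expression feature is computed for every pair of features in a module.

```r
# Sketch only: X holds a module's co-expression features, Y the external
# variables; samples are rows in both. Base R, not modACDC internals.

# Wilks' Lambda test of all canonical correlations via Rao's F approximation
wilks_cca_test <- function(X, Y) {
  X <- as.matrix(X); Y <- as.matrix(Y)
  n <- nrow(X); p <- ncol(X); q <- ncol(Y)
  rho <- stats::cancor(X, Y)$cor            # canonical correlations
  lambda <- prod(1 - rho^2)                 # Wilks' Lambda
  s <- if (p^2 + q^2 - 5 > 0) sqrt((p^2 * q^2 - 4) / (p^2 + q^2 - 5)) else 1
  df1 <- p * q
  df2 <- (n - 1 - (p + q + 1) / 2) * s - p * q / 2 + 1
  f_stat <- (1 - lambda^(1 / s)) / lambda^(1 / s) * df2 / df1
  list(cor = rho, lambda = lambda, F = f_stat, df1 = df1, df2 = df2,
       p.value = stats::pf(f_stat, df1, df2, lower.tail = FALSE))
}

# One co-expression feature and one external variable: CCA reduces to a
# Pearson correlation, tested with t = r * sqrt((n - 2) / (1 - r^2)) on n - 2 df
cor_t_test <- function(x, y) {
  n <- length(x)
  r <- stats::cor(x, y)
  t_stat <- r * sqrt((n - 2) / (1 - r^2))
  2 * stats::pt(-abs(t_stat), df = n - 2)
}

# Assumption: a k-feature module has choose(k, 2) pairwise co-expression
# features; the degenerate case above is when that count exceeds n samples
is_high_dim <- function(k, n) choose(k, 2) > n

# Usage with simulated data
set.seed(7)
n_samples <- 40
X <- matrix(rnorm(n_samples * 3), n_samples, 3)           # 3 co-expression features
Y <- cbind(ext1 = rnorm(n_samples), ext2 = rnorm(n_samples)) # 2 external variables
wilks_cca_test(X, Y)$p.value
```

With p = q = 1 the approximation gives df1 = 1 and df2 = n - 2, and F equals the squared t statistic, so wilks_cca_test and cor_t_test return the same p-value, mirroring the reduction to a correlation test.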

Value

A tibble, sorted by ascending BH FDR q-value, with columns:

moduleNum: module identifier
colNames: list of column names from fullData of the features in the module
features: list of identifiers from input parameter "identifierList" for all features in the module
CCA_corr: list of CCA canonical correlation coefficients
CCA_pval: Wilks' Lambda F-test p-value or t-test p-value
BHFDR_qval: Benjamini-Hochberg false discovery rate q-value
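The BHFDR_qval column follows the Benjamini-Hochberg procedure. A minimal base R sketch, using a hypothetical helper bh_qvalues and made-up p-values for illustration only, computes the q-values by hand (matching stats::p.adjust with method = "BH") and sorts modules by ascending q-value, as in the returned tibble.

```r
# Hypothetical helper: Benjamini-Hochberg q-values; gives the same result as
# stats::p.adjust(p, method = "BH")
bh_qvalues <- function(p) {
  m <- length(p)
  o <- order(p, decreasing = TRUE)           # largest p-value first
  rank_asc <- m:1                            # ascending rank of each sorted p-value
  q <- pmin(1, cummin(m / rank_asc * p[o]))  # enforce monotone q-values
  q[order(o)]                                # restore the original module order
}

# Illustrative per-module p-values (not from any real analysis)
modules <- data.frame(moduleNum = 1:4, CCA_pval = c(0.01, 0.04, 0.03, 0.20))
modules$BHFDR_qval <- bh_qvalues(modules$CCA_pval)
modules <- modules[order(modules$BHFDR_qval), ]  # sort by ascending q-value
print(modules)
```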

Author(s)

Katelyn Queen, kjqueen@usc.edu

References

Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Methodological) 57 (1995) 289–300.

Martin P, et al. Novel aspects of PPARalpha-mediated regulation of lipid and xenobiotic metabolism revealed through a nutrigenomic study. Hepatology, in press, 2007.

Millstein J, Battaglin F, Barrett M, Cao S, Zhang W, Stintzing S, et al. Partition: a surjective mapping approach for dimensionality reduction. Bioinformatics 36 (2019) 676–681. doi:10.1093/bioinformatics/btz661.

Queen K, Nguyen MN, Gilliland F, Chun S, Raby BA, Millstein J. ACDC: a general approach for detecting phenotype or exposure associated co-expression. Frontiers in Medicine (2023) 10. doi:10.3389/fmed.2023.1118824.

Widmann M. One-Dimensional CCA and SVD, and Their Relationship to Regression Maps. Journal of Climate 18 (2005) 2785–2792. doi:10.1175/jcli3424.1.

Examples

# load the CCA package, which provides the nutrimouse example dataset
library(CCA)

# load dataset
data("nutrimouse")

# run ACDC with diet and genotype as external variables
# (factors are converted to numeric codes because externalVar must be numeric)
ACDC(fullData = nutrimouse$lipid,
     ILC = 0.50,
     externalVar = data.frame(diet = as.numeric(nutrimouse$diet),
                              genotype = as.numeric(nutrimouse$genotype)))

[Package modACDC version 2.0.1 Index]