ACDCmod {modACDC} | R Documentation |
ACDCmod
Description
ACDCmod detects differential co-expression between a set of genes, such as a module of co-expressed genes, and a set of external features (exposures or responses) by using canonical correlation analysis (CCA) on the external features and module co-expression values. Modules are provided by the user.
Usage
ACDCmod(
fullData,
modules,
externalVar,
identifierList = colnames(fullData),
numNodes = 1
)
Arguments
fullData |
data frame or matrix with samples as rows, all probes as columns; each entry should be numeric gene expression or other molecular data values |
modules |
vector of lists where each list contains indices of column locations in fullData that specify features in each module |
externalVar |
data frame, matrix, or vector containing external variable data to be used for CCA, rows are samples; all elements must be numeric |
identifierList |
optional vector of identifiers, of the same length and in the same order as the columns in fullData (e.g., HUGO symbols for genes); default value is the column names from fullData |
numNodes |
number of available compute nodes for parallelization; default is 1 |
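As a rough illustration of the input shapes described above, the sketch below builds toy inputs in base R. The data, the feature names, and the two modules are all hypothetical, and representing `modules` as a list of integer index vectors is an assumption based on the argument description.

```r
set.seed(1)

# Hypothetical toy data: 20 samples (rows) by 6 features (columns)
fullData <- matrix(rnorm(20 * 6), nrow = 20,
                   dimnames = list(NULL, paste0("gene", 1:6)))

# Two hypothetical modules, each given as column indices into fullData
modules <- list(c(1, 2, 3), c(4, 5))

# A single numeric external variable with one value per sample
externalVar <- rnorm(20)

# Basic sanity checks on the shapes described in the Arguments section
stopifnot(is.numeric(fullData),
          all(unlist(modules) %in% seq_len(ncol(fullData))),
          NROW(externalVar) == nrow(fullData))
```

Objects shaped like these could then be passed as `fullData`, `modules`, and `externalVar`.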
Details
For more information about how the co-expression features are calculated, see the coVar documentation.
Following CCA, which finds the linear combinations of the co-expression and external feature vectors that are maximally correlated for each module, a Wilks' Lambda test is performed to determine whether the correlation between these linear combinations is significant. A significant result implies differential co-expression. If there is only one co-expression value for a module (i.e., two features in the module) and a single external variable, CCA reduces to a simple correlation test, and the t-distribution is used to test for significant correlation (Widmann, 2005). If the number of co-expression features in a particular module is larger than the number of samples, CCA will return correlation coefficients of 1, and p-values and BH FDR q-values will not be calculated. See ACDChighdim for our solution.
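The testing step can be sketched in base R as follows. This is a minimal illustration, not the package's internal code: it assumes Rao's F approximation to Wilks' Lambda, and `wilks_pval`, `coexp`, and `ext` are hypothetical names with simulated data standing in for the co-expression features and external variables. The last lines check that the one-by-one case gives the same p-value as the t-test on a simple correlation.

```r
# Wilks' Lambda p-value via Rao's F approximation (assumed here).
# rho: canonical correlations; n: number of samples;
# p, q: number of columns in each of the two variable sets.
wilks_pval <- function(rho, n, p, q) {
  lambda <- prod(1 - rho^2)
  denom <- p^2 + q^2 - 5
  s <- if (denom > 0) sqrt((p^2 * q^2 - 4) / denom) else 1
  df1 <- p * q
  df2 <- (n - 1.5 - (p + q) / 2) * s - p * q / 2 + 1
  f_stat <- (1 - lambda^(1 / s)) / lambda^(1 / s) * df2 / df1
  pf(f_stat, df1, df2, lower.tail = FALSE)
}

set.seed(42)
n <- 30
# Hypothetical stand-ins: 3 co-expression features, 2 external variables
coexp <- matrix(rnorm(n * 3), nrow = n)
ext <- cbind(0.5 * coexp[, 1] + rnorm(n), rnorm(n))

# General case: canonical correlations from stats::cancor
rho <- cancor(coexp, ext)$cor
p_multi <- wilks_pval(rho, n, ncol(coexp), ncol(ext))

# One co-expression value and one external variable: the single canonical
# correlation is |r|, and the F-test matches the correlation t-test
x <- coexp[, 1]
y <- ext[, 1]
rho_single <- cancor(matrix(x), matrix(y))$cor
p_single <- wilks_pval(rho_single, n, 1, 1)
p_ttest <- cor.test(x, y)$p.value
```

With one variable on each side, the F statistic has 1 and n - 2 degrees of freedom and equals the squared t statistic, which is why the two p-values agree.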
Value
Tibble, sorted by ascending BH FDR value, with columns
- moduleNum
module identifier
- colNames
list of column names from fullData of the features in the module
- features
list of identifiers from input parameter "identifierList" for all features in the module
- CCA_corr
list of CCA canonical correlation coefficients
- CCA_pval
Wilks-Lambda F-test p-value; t-test p-value if there are only 2 features in the module and a single external variable
- BHFDR_qval
Benjamini-Hochberg false discovery rate q-value
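The q-value column and the sort order can be illustrated with base R. This is a sketch only: the p-values are made up, a plain data frame stands in for the returned tibble, and using `p.adjust` is an assumption about how BH q-values might be obtained rather than a description of the package internals.

```r
# Hypothetical per-module p-values
res <- data.frame(moduleNum = 1:4,
                  CCA_pval = c(0.04, 0.001, 0.20, 0.03))

# Benjamini-Hochberg adjustment across modules
res$BHFDR_qval <- p.adjust(res$CCA_pval, method = "BH")

# Sort by ascending q-value, mirroring the ordering described above
res <- res[order(res$BHFDR_qval), ]
```

Modules near the top of such a table are the strongest candidates for differential co-expression.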
Author(s)
Katelyn Queen, kjqueen@usc.edu
References
Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Methodological) 57 (1995) 289–300.
Martin P, et al. Novel aspects of PPARalpha-mediated regulation of lipid and xenobiotic metabolism revealed through a nutrigenomic study. Hepatology, in press, 2007.
Millstein J, Battaglin F, Barrett M, Cao S, Zhang W, Stintzing S, et al. Partition: a surjective mapping approach for dimensionality reduction. Bioinformatics 36 (2019) 676–681. doi:10.1093/bioinformatics/btz661.
Queen K, Nguyen MN, Gilliland F, Chun S, Raby BA, Millstein J. ACDC: a general approach for detecting phenotype or exposure associated co-expression. Frontiers in Medicine (2023) 10. doi:10.3389/fmed.2023.1118824.
Widmann M. One-Dimensional CCA and SVD, and Their Relationship to Regression Maps. Journal of Climate 18 (2005) 2785–2792. doi:10.1175/jcli3424.1.
Examples
# load the CCA package, which provides the example dataset
library(CCA)
# load dataset
data("nutrimouse")
# partition dataset and save modules
library(partition)
part <- partition(nutrimouse$lipid, threshold = 0.50)
mods <- part$mapping_key[which(grepl("reduced_var_", part$mapping_key$variable)), ]$mapping
# run function for diet and genotype
ACDCmod(fullData = nutrimouse$lipid,
        modules = mods,
        externalVar = data.frame(diet = as.numeric(nutrimouse$diet),
                                 genotype = as.numeric(nutrimouse$genotype)))