OSCA_par {modACDC} | R Documentation |
OSCA_par
Description
OSCA_par determines the percent variance explained in an external variable (exposures or responses) for a range of increasingly reduced datasets. Dimension reduction is done with Partition, where features are only condensed into modules if the intraclass correlation between the features is at least the user-supplied information loss criterion (ILC), 0 <= ILC <= 1. An ILC of one returns the full dataset with no reduction, and an ILC of zero returns one module of all input features, reducing the dataset to the mean value. For each ILC value, with the number of ILCs tested determined by input parameter ILCincrement, the function returns the point estimate and standard error of the percent variance explained in the observed external variable by the reduced dataset. If input parameter permute is true, the function also returns the same values for a random permutation of the external variable.
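As a rough illustration of how ILCincrement controls the number of ILC values tested, the sketch below builds an evenly spaced grid from 0 to 1. How OSCA_par constructs its grid internally is not stated in this page, so the use of seq() and the inclusion of both endpoints are assumptions.

```r
# Assumed construction of the ILC grid; OSCA_par may build it differently.
ILCincrement <- 0.25
ilc_grid <- seq(from = 0, to = 1, by = ILCincrement)

ilc_grid          # candidate ILC values
length(ilc_grid)  # number of ILC values that would be tested
```

Under this assumption, a smaller ILCincrement yields more ILC values and therefore more calls to OSCA.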
Usage
OSCA_par(
df,
externalVar,
ILCincrement = 0.05,
oscaPath,
remlAlg = 0,
maxRemlIt = 100,
numCovars = NULL,
catCovars = NULL,
permute = TRUE,
numNodes = 1,
verbose = TRUE
)
Arguments
df |
n x p data frame or matrix of numeric -omics values with no ID column |
externalVar |
vector of length n of external variable values with no ID column |
ILCincrement |
float between zero and one determining interval between tested ILC values; default is 0.05 |
oscaPath |
absolute path to OSCA software |
remlAlg |
algorithm used to run REML iterations in GCTA; 0 = average information (AI), 1 = Fisher-scoring, 2 = EM; default is 0 (AI) |
maxRemlIt |
the maximum number of REML iterations; default is 100 |
numCovars |
n x c_n matrix of numerical covariates to adjust heritability model for; must be in same person order as externalVar; default is NULL |
catCovars |
n x c_c matrix of categorical covariates to adjust heritability model for; must be in same person order as externalVar; default is NULL |
permute |
logical for whether or not to calculate values for a random permutation of the external variable; default is TRUE |
numNodes |
number of available compute nodes for parallelization; default is 1 |
verbose |
logical for whether or not to display progress updates; default is TRUE |
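The shape requirements above can be checked before calling the function. The following is a minimal sketch using made-up data; the variable values, the covariate names age and batch, and the checks themselves are illustrative assumptions and not part of modACDC.

```r
set.seed(1)
n <- 20                                    # number of samples (made-up)
df <- matrix(rnorm(n * 5), nrow = n)       # n x p numeric -omics values, no ID column
externalVar <- rnorm(n)                    # length-n external variable

# Covariates as n-row matrices, in the same sample order as externalVar
numCovars <- cbind(age = runif(n, 20, 60))
catCovars <- cbind(batch = sample(c("a", "b"), n, replace = TRUE))

# Every input must describe the same n samples
stopifnot(
  is.numeric(df),
  nrow(df) == length(externalVar),
  nrow(numCovars) == length(externalVar),
  nrow(catCovars) == length(externalVar)
)
```

Because rows are matched by position rather than by ID, any reordering or filtering of one input has to be applied to all of them.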
Details
OmicS-data-based Complex trait Analysis (OSCA) is a suite of C++ functions. In order to use the OSCA functions, the user must specify the absolute path to the OSCA software, which can be downloaded from the Yang Lab website (https://yanglab.westlake.edu.cn/software/osca/).
Here, we use OSCA's Omics Restricted Maximum Likelihood (OREML) method to estimate the percent of variance in an external phenotype that can be explained by an omics profile, akin to heritability estimates in GWAS. OREML is called twice for each ILC tested if permutations are included.
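The permutation step can be pictured with base R's sample(), which randomly reorders the external variable so that its values no longer line up with the samples they were measured on. This is only a sketch with made-up values; the way OSCA_par permutes internally is an assumption here.

```r
set.seed(42)
externalVar <- c(1.2, 3.4, 2.2, 5.1, 4.0, 0.7)  # made-up values
permutedVar <- sample(externalVar)              # random reordering of the same values

# The permuted vector holds exactly the same values, only in a different order
stopifnot(identical(sort(permutedVar), sort(externalVar)))
```

Since the permuted variable keeps the same distribution but loses its pairing with the omics profile, the permuted estimates give a point of comparison for the observed ones.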
Dimension reduction is done with Partition, an agglomerative data reduction method which performs both feature condensation and extraction based on a user-provided information loss criterion (ILC). Features are only condensed into a module if the intraclass correlation between them is at least the ILC. The superPartition function is called if the gene expression dataset contains more than 4,000 features.
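To make the ILC rule more concrete, the sketch below computes a textbook one-way intraclass correlation for a set of features and compares it with an ILC. The helper icc_oneway is hypothetical, and both the formula and the standardization step are assumptions; Partition's own ICC calculation may differ in its details.

```r
# One-way intraclass correlation for the columns of a samples x features matrix.
# Textbook formula for illustration only; not taken from the Partition source.
icc_oneway <- function(x) {
  x <- scale(x)                                       # standardize each feature (assumed)
  n <- nrow(x)
  k <- ncol(x)
  row_means <- rowMeans(x)
  msb <- k * sum((row_means - mean(x))^2) / (n - 1)   # between-sample mean square
  msw <- sum((x - row_means)^2) / (n * (k - 1))       # within-sample mean square
  (msb - msw) / (msb + (k - 1) * msw)
}

set.seed(7)
f1 <- rnorm(50)
f2 <- f1 + rnorm(50, sd = 0.3)      # made-up feature, strongly correlated with f1
ILC <- 0.8

# Would these two features qualify to be condensed into one module?
icc_oneway(cbind(f1, f2)) >= ILC
```

With this formula, identical features give an ICC of one, which is consistent with an ILC of one leaving the dataset unreduced unless features are perfectly redundant.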
Value
Data frame with columns
- ILC
the information loss criterion used for that iteration
- InformationLost
percent information lost due to data reduction
- PercentReduction
percent of variables condensed compared to unreduced data
- VarianceExplained_Observed
percent variance explained in observed external variable by the data
- SE_Observed
standard error of the percent variance estimate for observed external variable
- VarianceExplained_Permuted
percent variance explained in permuted external variable by the data; only if input parameter "permute" is true
- SE_Permuted
standard error of the percent variance estimate for permuted external variable; only if input parameter "permute" is true
Author(s)
Katelyn Queen, kjqueen@usc.edu
References
Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Methodological) 57 (1995) 289–300.
Martin P, et al. Novel aspects of PPARalpha-mediated regulation of lipid and xenobiotic metabolism revealed through a nutrigenomic study. Hepatology, in press, 2007.
Millstein J, Battaglin F, Barrett M, Cao S, Zhang W, Stintzing S, et al. Partition: a surjective mapping approach for dimensionality reduction. Bioinformatics 36 (2019) 676–681. doi:10.1093/bioinformatics/btz661.
Queen K, Nguyen MN, Gilliland F, Chun S, Raby BA, Millstein J. ACDC: a general approach for detecting phenotype or exposure associated co-expression. Frontiers in Medicine (2023) 10. doi:10.3389/fmed.2023.1118824.
See Also
OSCA software - https://yanglab.westlake.edu.cn/software/osca/
Examples
# load CCA package for example dataset
library(CCA)
# load dataset
data("nutrimouse")
# run function; input absolute path to OSCA software before running
## Not run: 
OSCA_par(df = nutrimouse$gene,
         externalVar = as.numeric(nutrimouse$diet),
         ILCincrement = 0.25,
         oscaPath = "pathHere")
## End(Not run)
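Because the example only runs once oscaPath points at a real OSCA installation, a small pre-flight check can save a failed run. The helper osca_ready below is hypothetical and not part of modACDC; it only tests that something exists at the given path.

```r
# Hypothetical pre-flight check, not part of modACDC.
osca_ready <- function(path) {
  is.character(path) && length(path) == 1 && file.exists(path)
}

oscaPath <- "pathHere"  # placeholder from the example above; replace with a real absolute path

if (!osca_ready(oscaPath)) {
  message("Nothing found at oscaPath: ", oscaPath)
}
```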