OSCA_par {modACDC}    R Documentation

OSCA_par

Description

OSCA_par estimates the percent variance explained in an external variable (an exposure or response) for a range of increasingly reduced datasets. Dimension reduction is done with Partition, where features are condensed into a module only if the intraclass correlation between the features is at least the user-supplied information loss criterion (ILC), 0 <= ILC <= 1. An ILC of one returns the full dataset with no reduction, and an ILC of zero returns one module of all input features, reducing the dataset to the mean value. The number of ILC values tested is determined by the input parameter ILCincrement. For each ILC value, the function returns the point estimate and standard error of the percent variance explained in the observed external variable by the reduced dataset. If the input parameter permute is TRUE, the function also returns the same values for a random permutation of the external variable.
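As a rough illustration of how ILCincrement controls the number of ILC values tested, the sketch below builds an evenly spaced grid from zero to one. It assumes both endpoints are included; the exact grid that OSCA_par constructs internally may differ.

```r
# Sketch only: an evenly spaced ILC grid, assuming both 0 and 1 are included.
ILCincrement <- 0.25
ILCs <- seq(from = 0, to = 1, by = ILCincrement)

ILCs          # 0.00 0.25 0.50 0.75 1.00
length(ILCs)  # 5 ILC values for an increment of 0.25

# With the default increment of 0.05 the same construction gives 21 values.
length(seq(from = 0, to = 1, by = 0.05))
```

A smaller increment gives a finer picture of how variance explained changes with reduction, at the cost of more model fits.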

Usage

OSCA_par(
  df,
  externalVar,
  ILCincrement = 0.05,
  oscaPath,
  remlAlg = 0,
  maxRemlIt = 100,
  numCovars = NULL,
  catCovars = NULL,
  permute = TRUE,
  numNodes = 1,
  verbose = TRUE
)

Arguments

df

n x p data frame or matrix of numeric -omics values with no ID column

externalVar

vector of length n of external variable values with no ID column

ILCincrement

float between zero and one determining interval between tested ILC values; default is 0.05

oscaPath

absolute path to OSCA software

remlAlg

which algorithm to use for REML iterations in OSCA; 0 = average information (AI), 1 = Fisher-scoring, 2 = EM; default is 0 (AI)

maxRemlIt

the maximum number of REML iterations; default is 100

numCovars

n x c_n matrix of numerical covariates to adjust heritability model for; must be in same person order as externalVar; default is NULL

catCovars

n x c_c matrix of categorical covariates to adjust heritability model for; must be in same person order as externalVar; default is NULL

permute

logical for whether or not to calculate values for a random permutation of the external variable; default is TRUE

numNodes

number of available compute nodes for parallelization; default is 1

verbose

logical for whether or not to display progress updates; default is TRUE
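The arguments above require that df, externalVar, and any covariate matrices all describe the same n samples in the same order. A minimal sketch of a pre-flight check follows; check_inputs and the toy objects are hypothetical helpers for illustration and are not part of modACDC. Row counts can be checked this way, but matching sample order must still be confirmed by the user.

```r
# Hypothetical helper (not part of modACDC): basic shape checks on the inputs.
check_inputs <- function(df, externalVar, numCovars = NULL, catCovars = NULL) {
  n <- nrow(df)
  stopifnot(
    is.numeric(as.matrix(df)),                              # -omics values must be numeric
    length(externalVar) == n,                               # one external value per sample
    is.null(numCovars) || nrow(as.matrix(numCovars)) == n,  # n x c_n numerical covariates
    is.null(catCovars) || nrow(as.matrix(catCovars)) == n   # n x c_c categorical covariates
  )
  invisible(TRUE)
}

# Toy data: 20 samples, 5 features, one numerical covariate.
set.seed(1)
toy_df  <- matrix(rnorm(20 * 5), nrow = 20)
toy_ext <- rnorm(20)
toy_num <- cbind(age = runif(20, min = 20, max = 60))

check_inputs(toy_df, toy_ext, numCovars = toy_num)
```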

Details

OmicS-data-based Complex trait Analysis (OSCA) is a suite of C++ functions. To use the OSCA functions, the user must specify the absolute path to the OSCA software, which can be downloaded from the Yang Lab website at https://yanglab.westlake.edu.cn/software/osca/.

Here, we use OSCA's Omics Restricted Maximum Likelihood (OREML) method to estimate the percent of variance in an external phenotype that can be explained by an omics profile, akin to heritability estimates in GWAS. OREML is called twice for each ILC tested if permutations are included.
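The permuted analysis serves as a null comparison for the observed estimate. The sketch below shows one simple way to permute an external variable, assuming a plain random shuffle across samples; how OSCA_par performs the permutation internally is not described here, and the toy vector and seed are illustrative only.

```r
# Sketch only: a random shuffle of a toy external variable.
set.seed(42)                        # seed used only to make the sketch reproducible
externalVar <- rnorm(10)            # toy external variable for 10 samples
permutedVar <- sample(externalVar)  # same values, reassigned to samples at random

# The permuted vector keeps the distribution of values but breaks the pairing
# between each sample's -omics profile and its external value.
all(sort(permutedVar) == sort(externalVar))
```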

Dimension reduction is done with Partition, an agglomerative data reduction method which performs both feature condensation and extraction based on a user-provided information loss criterion (ILC). Condensation of features into a module is only accepted if the intraclass correlation between the features is at least the ILC. The superPartition function is called if the gene expression dataset contains more than 4,000 features.
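To make the acceptance rule concrete, the sketch below computes a one-way intraclass correlation for a candidate module and compares it to an ILC. This is a textbook one-way ICC written for illustration; icc_oneway is a hypothetical helper, and the Partition package's own computation (including any standardization of features) may differ.

```r
# Hypothetical helper: one-way ICC for an n x k block of features
# (rows = samples, columns = features treated as repeated measures).
icc_oneway <- function(x) {
  x <- as.matrix(x)
  n <- nrow(x)
  k <- ncol(x)
  row_means  <- rowMeans(x)
  grand_mean <- mean(x)
  msb <- k * sum((row_means - grand_mean)^2) / (n - 1)  # between-sample mean square
  msw <- sum((x - row_means)^2) / (n * (k - 1))         # within-sample mean square
  (msb - msw) / (msb + (k - 1) * msw)
}

set.seed(1)
signal <- rnorm(50)
f1 <- signal + rnorm(50, sd = 0.1)  # two features sharing a common signal
f2 <- signal + rnorm(50, sd = 0.1)
f3 <- rnorm(50)                     # an unrelated feature

ILC <- 0.8
icc_oneway(cbind(f1, f2)) >= ILC    # candidate module of related features
icc_oneway(cbind(f1, f3)) >= ILC    # candidate module with an unrelated feature
```

Under this rule, a higher ILC demands more agreement among features before they are condensed, so fewer and smaller modules are formed.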

Value

Data frame with columns

ILC

the information loss criterion used for that iteration

InformationLost

percent information lost due to data reduction

PercentReduction

percent of variables condensed compared to unreduced data

VarianceExplained_Observed

percent variance explained in observed external variable by the data

SE_Observed

standard error of the percent variance estimate for observed external variable

VarianceExplained_Permuted

percent variance explained in permuted external variable by the data; only if input parameter "permute" is true

SE_Permuted

standard error of the percent variance estimate for permuted external variable; only if input parameter "permute" is true
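One possible way to read the returned data frame is to compare the observed and permuted estimates at each ILC. The sketch below uses made-up placeholder numbers, not output from OSCA, and the Gap column is a hypothetical addition for illustration rather than something OSCA_par returns.

```r
# Placeholder values for illustration only; these are NOT results from OSCA.
res <- data.frame(
  ILC                        = c(0, 0.5, 1),
  VarianceExplained_Observed = c(0.10, 0.40, 0.35),
  SE_Observed                = c(0.05, 0.10, 0.12),
  VarianceExplained_Permuted = c(0.02, 0.05, 0.04),
  SE_Permuted                = c(0.04, 0.06, 0.07)
)

# Hypothetical summary: observed minus permuted variance explained at each ILC.
res$Gap <- res$VarianceExplained_Observed - res$VarianceExplained_Permuted

# Rows ordered from largest to smallest gap.
res[order(-res$Gap), c("ILC", "Gap")]
```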

Author(s)

Katelyn Queen, kjqueen@usc.edu

References

Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Methodological) 57 (1995) 289–300.

Martin P, et al. Novel aspects of PPARalpha-mediated regulation of lipid and xenobiotic metabolism revealed through a nutrigenomic study. Hepatology, in press, 2007.

Millstein J, Battaglin F, Barrett M, Cao S, Zhang W, Stintzing S, et al. Partition: a surjective mapping approach for dimensionality reduction. Bioinformatics 36 (2019) 676–681. doi:10.1093/bioinformatics/btz661.

Queen K, Nguyen MN, Gilliland F, Chun S, Raby BA, Millstein J. ACDC: a general approach for detecting phenotype or exposure associated co-expression. Frontiers in Medicine (2023) 10. doi:10.3389/fmed.2023.1118824.

See Also

OSCA software - https://yanglab.westlake.edu.cn/software/osca/

Examples

# load CCA package for example dataset
library(CCA)

# load dataset
data("nutrimouse")

# run function; input absolute path to OSCA software before running
## Not run: 
OSCA_par(df = nutrimouse$gene,
         externalVar = as.numeric(nutrimouse$diet),
         ILCincrement = 0.25,
         oscaPath = "pathHere")
## End(Not run)


[Package modACDC version 2.0.1 Index]