GCTA_par {modACDC}    R Documentation

GCTA_par

Description

GCTA_par determines the average heritability of the first principal component of either the co-expression or covariance of gene expression modules for a range of increasingly reduced datasets. Dimension reduction is done with Partition, where features are only condensed into modules if the intraclass correlation between the features is at least the user-supplied information loss criterion (ILC), 0 <= ILC <= 1. An ILC of one returns the full dataset with no reduction, and an ILC of zero returns one module of all input features, reducing the dataset to the mean value. The number of ILC values tested is determined by the input parameter ILCincrement. For each ILC value, the function returns the point estimate and standard error of the average heritability of the first principal component of the co-expression or covariance of the gene expression modules in the reduced dataset. If the input parameter permute is TRUE, the function also returns the same values for a random permutation of the first principal component of the appropriate matrix.
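
As a rough illustration of how ILCincrement controls the number of ILC values tested, the sketch below builds an evenly spaced grid from zero to one. This assumes the grid covers the whole interval including both endpoints; the function's internal ordering and endpoint handling may differ.

```r
# Assumption: tested ILC values are evenly spaced on [0, 1] in steps of ILCincrement.
ILCincrement <- 0.25
ilc_grid <- seq(from = 0, to = 1, by = ILCincrement)
ilc_grid  # 0.00 0.25 0.50 0.75 1.00

# A smaller increment gives a finer grid and therefore more reductions to evaluate.
n_default <- length(seq(from = 0, to = 1, by = 0.05))
```

Because every ILC value triggers a full reduction and a set of heritability estimates, a coarse increment such as 0.25 is a reasonable choice for a first test run.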

Usage

GCTA_par(
  df,
  ILCincrement = 0.05,
  fileLoc,
  gctaPath,
  remlAlg = 0,
  maxRemlIt = 100,
  numCovars = NULL,
  catCovars = NULL,
  summaryType,
  permute = TRUE,
  numNodes = 1,
  verbose = TRUE
)

Arguments

df

n x p data frame or matrix of numeric -omics values with no ID column

ILCincrement

float between zero and one determining interval between tested ILC values; default is 0.05

fileLoc

absolute file path to bed, bim, and fam files, including prefix

gctaPath

absolute path to GCTA software

remlAlg

algorithm to run REML iterations in GCTA; 0 = average information (AI), 1 = Fisher-scoring, 2 = EM; default is 0 (AI)

maxRemlIt

the maximum number of REML iterations; default is 100

numCovars

n x c_n matrix of numerical covariates to adjust heritability model for; must be in same person order as fam file; default is NULL

catCovars

n x c_c matrix of categorical covariates to adjust heritability model for; must be in same person order as fam file; default is NULL

summaryType

one of "coexpression" or "covariance"; determines how to summarize each module

permute

logical for whether or not to calculate values for a random permutation of the module summary; default is TRUE

numNodes

number of available compute nodes for parallelization; default is 1

verbose

logical for whether or not to display progress updates; default is TRUE
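
Since numCovars and catCovars must be in the same person order as the fam file, it can help to verify the order before calling the function. The sketch below is one way to do that; check_fam_order is a hypothetical helper (not part of modACDC), and it assumes the individual IDs for the covariate rows are kept in a separate vector. The fam file path would normally be fileLoc followed by ".fam"; a temporary file stands in for it here.

```r
# Hypothetical helper (not part of modACDC): compare covariate row IDs to the .fam file.
check_fam_order <- function(fam_path, covar_ids) {
  # PLINK .fam files are whitespace-delimited with six columns and no header;
  # the second column holds the individual ID (IID).
  fam <- read.table(fam_path, header = FALSE, stringsAsFactors = FALSE,
                    colClasses = "character")
  stopifnot(ncol(fam) == 6)
  identical(fam[[2]], as.character(covar_ids))
}

# Toy .fam file with placeholder IDs, written to a temporary location.
fam_file <- tempfile(fileext = ".fam")
writeLines(c("F1 id1 0 0 1 -9",
             "F2 id2 0 0 2 -9",
             "F3 id3 0 0 1 -9"), fam_file)

in_order <- check_fam_order(fam_file, c("id1", "id2", "id3"))  # TRUE
shuffled <- check_fam_order(fam_file, c("id2", "id1", "id3"))  # FALSE
```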

Details

Genome-wide Complex Trait Analysis (GCTA) is a suite of C++ functions. In order to use the GCTA functions, the user must specify the absolute path to the GCTA software, which can be downloaded from the Yang Lab website (see the link under See Also).

Here, we use GCTA's genomic-relatedness-based restricted maximum likelihood (GREML) method to estimate the heritability of an external phenotype. If permutations are included, GREML is called 2 x (number of modules) times for each ILC tested.
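
To gauge run time, the number of GREML calls can be estimated from the module counts. The sketch below uses a hypothetical helper, greml_calls (not part of modACDC), and assumes one call per module when permute is FALSE; the module counts are placeholders.

```r
# Hypothetical helper (not part of modACDC): GREML calls needed for each tested ILC.
greml_calls <- function(n_modules, permute = TRUE) {
  stopifnot(is.numeric(n_modules), all(n_modules >= 0))
  # One call per module for the observed summary, plus one per module
  # for the permuted summary when permute = TRUE (assumed: none otherwise).
  if (permute) 2 * n_modules else n_modules
}

# Placeholder module counts at three tested ILC values.
n_modules <- c(1, 40, 250)
calls_per_ilc <- greml_calls(n_modules, permute = TRUE)  # 2 80 500
total_calls <- sum(calls_per_ilc)                        # 582
```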

Dimension reduction is done with Partition, an agglomerative data reduction method which performs both feature condensation and extraction based on a user-provided information loss criterion (ILC). Condensation of features into a module is only accepted if the intraclass correlation between the features is at least the ILC. The superPartition function is called if the gene expression dataset contains more than 4,000 features.

Value

Data frame with columns

ILC

the information loss criterion used for that iteration

InformationLost

percent information lost due to data reduction

PercentReduction

percent of variables condensed compared to unreduced data

AveVarianceExplained_Observed

average heritability estimate for PC1 of observed summary data

OverallSD_Observed

standard deviation of the heritability estimates for PC1 of observed summary data

VarianceExplained_Observed

list of heritability estimates for PC1 of observed summary data for all modules

SE_Observed

list of standard errors of the heritability estimates for PC1 of observed summary data for all modules

AveVarianceExplained_Permuted

average heritability estimate for PC1 of permuted summary data

OverallSD_Permuted

standard deviation of the heritability estimates for PC1 of permuted summary data

VarianceExplained_Permuted

list of heritability estimates for PC1 of permuted summary data for all modules

SE_Permuted

list of standard errors of the heritability estimates for PC1 of permuted summary data for all modules
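
The per-module columns can be summarized directly from the returned data frame. The sketch below assumes the VarianceExplained_* columns are list columns holding one numeric vector per ILC; the mock data frame and its values are placeholders with no scientific meaning, standing in for real output.

```r
# Mock result with the documented column names (placeholder values only).
res <- data.frame(ILC = c(0.5, 1))
res$VarianceExplained_Observed <- list(c(0.20, 0.40), c(0.10, 0.30, 0.50))
res$VarianceExplained_Permuted <- list(c(0.05, 0.15), c(0.00, 0.10, 0.20))

# Per-ILC mean and standard deviation across modules.
ave_obs  <- vapply(res$VarianceExplained_Observed, mean, numeric(1))
sd_obs   <- vapply(res$VarianceExplained_Observed, sd, numeric(1))
ave_perm <- vapply(res$VarianceExplained_Permuted, mean, numeric(1))

# Difference between observed and permuted averages at each ILC.
excess <- ave_obs - ave_perm
```

Presumably these recomputed values correspond to AveVarianceExplained_* and OverallSD_*, but that correspondence is an assumption and should be checked against real output.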

Author(s)

Katelyn Queen, kjqueen@usc.edu

References

Millstein J, Battaglin F, Barrett M, Cao S, Zhang W, Stintzing S, et al. Partition: a surjective mapping approach for dimensionality reduction. Bioinformatics 36 (2019) 676–681. doi:10.1093/bioinformatics/btz661.

Yang J, Lee SH, Goddard ME, Visscher PM. GCTA: a tool for genome-wide complex trait analysis. Am J Hum Genet. 2011 Jan 7;88(1):76-82. doi: 10.1016/j.ajhg.2010.11.011. Epub 2010 Dec 17. PMID: 21167468; PMCID: PMC3014363.

See Also

GCTA software - https://yanglab.westlake.edu.cn/software/gcta/

Examples


# run function; input absolute path to GCTA software before running
## Not run: 
GCTA_par(df = geneExpressionData,
         ILCincrement = 0.25,
         fileLoc = "pathHere",
         gctaPath = "pathHere",
         summaryType = "coexpression",
         permute = TRUE,
         numNodes = 1)
## End(Not run)


[Package modACDC version 2.0.1 Index]