R: GCTA

GCTA_par {modACDC}

R Documentation

GCTA_par

Description

GCTA_par determines the average heritability of the first principal component of either the co-expression or covariance of gene expression modules for a range of increasingly reduced datasets. Dimension reduction is done with Partition, where features are only condensed into modules if the intraclass correlation between the features is at least the user-supplied information loss criterion (ILC), 0 <= ILC <= 1. An ILC of one returns the full dataset with no reduction, and an ILC of zero returns one module of all input features, reducing the dataset to the mean value. The number of ILC values tested is determined by the input parameter ILCincrement. For each ILC value, the function returns the point estimate and standard error of the average heritability of the first principal component of the co-expression or covariance of the gene expression modules in the reduced dataset. If the input parameter permute is TRUE, the function also returns the same values for a random permutation of the first principal component of the appropriate matrix.
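As a rough illustration of how ILCincrement controls the number of ILC values tested, the sketch below builds an evenly spaced grid. It assumes the grid covers both endpoints (0 and 1); the grid used internally by GCTA_par may differ. `ilc_grid` is a hypothetical helper and is not part of modACDC.

```r
# Hypothetical helper: evenly spaced ILC values from 0 to 1.
# Assumption: both endpoints are included in the tested grid.
ilc_grid <- function(ILCincrement = 0.05) {
  stopifnot(ILCincrement > 0, ILCincrement <= 1)
  seq(0, 1, by = ILCincrement)
}

ilcs <- ilc_grid(0.25)
length(ilcs)  # number of reduced datasets that would be evaluated
```

Under this assumption, a smaller ILCincrement gives a finer grid at the cost of more Partition and GREML runs.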

Usage

GCTA_par(
  df,
  ILCincrement = 0.05,
  fileLoc,
  gctaPath,
  remlAlg = 0,
  maxRemlIt = 100,
  numCovars = NULL,
  catCovars = NULL,
  summaryType,
  permute = TRUE,
  numNodes = 1,
  verbose = TRUE
)

Arguments

`df`	n x p data frame or matrix of numeric -omics values with no ID column
`ILCincrement`	float between zero and one determining interval between tested ILC values; default is 0.05
`fileLoc`	absolute file path to bed, bim, and fam files, including prefix
`gctaPath`	absolute path to GCTA software
`remlAlg`	algorithm to run REML iterations in GCTA; 0 = average information (AI), 1 = Fisher-scoring, 2 = EM; default is 0 (AI)
`maxRemlIt`	the maximum number of REML iterations; default is 100
`numCovars`	n x c_n matrix of numerical covariates to adjust heritability model for; must be in same person order as fam file; default is NULL
`catCovars`	n x c_c matrix of categorical covariates to adjust heritability model for; must be in same person order as fam file; default is NULL
`summaryType`	one of "coexpression" or "covariance"; determines how to summarize each module
`permute`	logical for whether or not to also calculate values for a random permutation of each module summary; default is TRUE
`numNodes`	number of available compute nodes for parallelization; default is 1
`verbose`	logical for whether or not to display progress updates; default is TRUE
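Because numCovars and catCovars must follow the person order of the fam file, it can help to reorder covariates explicitly before calling the function. The sketch below uses a small in-memory stand-in for a fam file (the six standard PLINK columns); the covariate table, its column names, and all values are made up for illustration.

```r
# Toy stand-in for a PLINK .fam file (FID, IID, PID, MID, sex, phenotype).
fam <- data.frame(FID = c("f1", "f2", "f3"),
                  IID = c("id1", "id2", "id3"),
                  PID = 0, MID = 0, SEX = c(1, 2, 1), PHENO = -9)

# Hypothetical covariate table stored in a different person order.
covars <- data.frame(IID = c("id3", "id1", "id2"),
                     age = c(61, 45, 52),
                     batch = c("B", "A", "A"))

# Row positions in covars that correspond to each person in the fam file.
idx <- match(fam$IID, covars$IID)
stopifnot(!anyNA(idx))  # every person in the fam file needs covariates

numCovars <- as.matrix(covars[idx, "age", drop = FALSE])    # n x c_n
catCovars <- as.matrix(covars[idx, "batch", drop = FALSE])  # n x c_c
```

With a real dataset, `fam` would instead be read from the fam file that `fileLoc` points to.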

Details

Genome-wide Complex Trait Analysis (GCTA) is a suite of C++ functions. In order to use the GCTA functions, the user must specify the absolute path to the GCTA software, which can be downloaded from the Yang Lab website.
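To make the role of gctaPath, remlAlg, and maxRemlIt more concrete, the sketch below assembles the arguments for a single GCTA REML run. This page does not describe how GCTA_par invokes GCTA internally, so this is only an assumption about what such a call could look like. `greml_args` and all file prefixes are hypothetical; the flags themselves (`--reml`, `--grm`, `--pheno`, `--reml-alg`, `--reml-maxit`, `--qcovar`, `--covar`, `--out`) are standard GCTA options.

```r
# Hypothetical helper: assemble arguments for one GCTA REML run.
# All file prefixes are placeholders; nothing is executed here.
greml_args <- function(grmPrefix, phenoFile, outPrefix,
                       remlAlg = 0, maxRemlIt = 100,
                       qcovarFile = NULL, covarFile = NULL) {
  args <- c("--reml",
            "--grm", grmPrefix,
            "--pheno", phenoFile,
            "--reml-alg", remlAlg,
            "--reml-maxit", maxRemlIt,
            "--out", outPrefix)
  # Quantitative and categorical covariates use separate GCTA options.
  if (!is.null(qcovarFile)) args <- c(args, "--qcovar", qcovarFile)
  if (!is.null(covarFile)) args <- c(args, "--covar", covarFile)
  args
}

args <- greml_args("grmPrefix", "pc1.phen", "greml_out")
# system2(gctaPath, args) would launch GCTA; left commented out in this sketch.
```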

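Here, we use GCTA's genomic-relatedness-based restricted maximum likelihood (GREML) method to estimate the heritability of each module summary, that is, the first principal component of the module's co-expression or covariance. If permutations are included, GREML is called 2 x (number of modules) times for each ILC tested.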
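The number of GREML calls per ILC value follows directly from the number of modules. The sketch below is simple bookkeeping: it assumes one call per module when permute is FALSE, which the text implies but does not state. `greml_calls` is a hypothetical helper and the module count is made up.

```r
# Hypothetical helper: number of GREML runs for one ILC value.
# Assumption: one run per module, doubled when permuted summaries are included.
greml_calls <- function(numModules, permute = TRUE) {
  if (permute) 2 * numModules else numModules
}

callsWithPerm <- greml_calls(150, permute = TRUE)
callsNoPerm <- greml_calls(150, permute = FALSE)
```

Since high ILC values leave many modules, the total run time is dominated by the least reduced datasets.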

Dimension reduction is done with Partition, an agglomerative data reduction method which performs both feature condensation and extraction based on a user-provided information loss criterion (ILC). Condensation of features into a module is only accepted if the intraclass correlation between the features is at least the ILC. The superPartition function is called if the gene expression dataset contains more than 4,000 features.
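The acceptance rule can be illustrated with a small intraclass correlation calculation. The sketch below uses the one-way ANOVA form, ICC(1), on standardized features. This is an assumption for illustration only; the exact intraclass correlation computed inside Partition may be defined differently. `icc1` is a hypothetical helper and the data are simulated.

```r
# One-way ANOVA intraclass correlation, ICC(1), for an n x k block of features.
# Rows are samples; columns are the candidate features for one module.
icc1 <- function(x) {
  x <- scale(as.matrix(x))  # standardize each feature (assumption)
  n <- nrow(x)
  k <- ncol(x)
  rowMean <- rowMeans(x)
  msb <- k * sum((rowMean - mean(x))^2) / (n - 1)  # between-sample mean square
  msw <- sum((x - rowMean)^2) / (n * (k - 1))      # within-sample mean square
  (msb - msw) / (msb + (k - 1) * msw)
}

set.seed(1)
base <- rnorm(100)
tight <- cbind(base + rnorm(100, sd = 0.1),
               base + rnorm(100, sd = 0.1),
               base + rnorm(100, sd = 0.1))  # strongly related features
loose <- matrix(rnorm(300), nrow = 100)      # unrelated features

ilc <- 0.8
acceptTight <- icc1(tight) >= ilc  # would be condensed into a module
acceptLoose <- icc1(loose) >= ilc  # would be left as separate features
```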

Value

Data frame with columns

ILC: the information loss criterion used for that iteration
InformationLost: percent information lost due to data reduction
PercentReduction: percent of variables condensed compared to unreduced data
AveVarianceExplained_Observed: average heritability estimate for PC1 of observed summary data
OverallSD_Observed: standard deviation of the heritability estimates for PC1 of observed summary data
VarianceExplained_Observed: list of heritability estimates for PC1 of observed summary for all modules
SE_Observed: list of standard errors of the heritability estimates for PC1 of observed summary data for all modules
AveVarianceExplained_Permuted: average heritability for PC1 of permuted summary data
OverallSD_Permuted: standard deviation of the heritability estimates for PC1 of permuted summary data
VarianceExplained_Permuted: list of heritability estimates for PC1 of permuted summary data for all modules
SE_Permuted: list of standard errors of the heritability estimates for PC1 of permuted summary data for all modules
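The list columns hold one value per module, and the summary columns are described as the average and standard deviation of those values. The sketch below shows that relationship on a made-up result with two ILC values. The numbers are invented, and using `mean()` and `sd()` is an assumption about how the summaries are computed.

```r
# Made-up per-module heritability estimates for two ILC values (illustration only).
res <- data.frame(ILC = c(0.5, 1))
res$VarianceExplained_Observed <- I(list(c(0.2, 0.4, 0.6), c(0.1, 0.3)))

# Summaries recomputed from the list column, one value per row (per ILC).
res$AveVarianceExplained_Observed <- sapply(res$VarianceExplained_Observed, mean)
res$OverallSD_Observed <- sapply(res$VarianceExplained_Observed, sd)

# Per-module estimates for the first ILC value.
firstIlcEstimates <- res$VarianceExplained_Observed[[1]]
```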

Author(s)

Katelyn Queen, kjqueen@usc.edu

References

Millstein J, Battaglin F, Barrett M, Cao S, Zhang W, Stintzing S, et al. Partition: a surjective mapping approach for dimensionality reduction. Bioinformatics 36 (2019) 676–681. doi:10.1093/bioinformatics/btz661.

Yang J, Lee SH, Goddard ME, Visscher PM. GCTA: a tool for genome-wide complex trait analysis. Am J Hum Genet. 2011 Jan 7;88(1):76-82. doi: 10.1016/j.ajhg.2010.11.011. Epub 2010 Dec 17. PMID: 21167468; PMCID: PMC3014363.

Examples


# run function; input absolute path to GCTA software before running
## Not run: 
GCTA_par(df = geneExpressionData,
         ILCincrement = 0.25,
         fileLoc = "pathHere",
         gctaPath = "pathHere",
         summaryType = "coexpression",
         permute = TRUE,
         numNodes = 1)
## End(Not run)

[Package modACDC version 2.0.1 Index]