R: A Generalized Joint-Location-Scale (gJLS) Test

gJLS2 {gJLS2}

R Documentation

Description

This function takes as input the genotype of a SNP (GENO), the sex (SEX), and a quantitative trait (Y) in a sample population, and optionally additional covariates, such as principal components. The function returns the location and scale association p-values for each SNP, as well as the gJLS p-value, which combines the two sources of evidence via Fisher's method (Soave et al., 2015, 2017). To perform this analysis genome-wide, we recommend using the R plugin written for PLINK; see gJLSPLINK for more details.
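As a rough illustration of how Fisher's method combines two p-values, the following base R sketch may help. The helper name `fisher_combine` is hypothetical and not part of gJLS2, and the calculation assumes the location and scale tests are independent under the null hypothesis.

```r
# Hypothetical helper (not a gJLS2 function): combine a location and a
# scale p-value with Fisher's method.
fisher_combine <- function(p_loc, p_scale) {
  # Assuming the two tests are independent under the null,
  # -2 * (log p_loc + log p_scale) follows a chi-squared distribution
  # with 2 * 2 = 4 degrees of freedom.
  stat <- -2 * (log(p_loc) + log(p_scale))
  pchisq(stat, df = 4, lower.tail = FALSE)
}

# Example: a modest location signal combined with a weak scale signal
p_joint <- fisher_combine(0.01, 0.20)
p_joint
```

The combined p-value can be smaller than either input when both tests point in the same direction, which is the motivation for testing location and scale jointly.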

Usage

gJLS2(
  GENO,
  Y,
  COVAR = NULL,
  SEX = NULL,
  Xchr = FALSE,
  transformed = TRUE,
  loc_alg = "LAD",
  related = FALSE,
  cov.structure = "corCompSymm",
  clust = NULL,
  genotypic = FALSE,
  origLev = FALSE,
  centre = "median",
  XchrMethod = 3
)

Arguments

`GENO`	a list containing a genotype matrix or vector of SNPs; genotypes must be coded 0, 1, or 2 for the number of reference alleles. Alternatively, for imputed genotypes, it can be either a vector of dosage values between 0 and 2, or a list of matrices of genotype probabilities, each between 0 and 1 for each genotype. The length/dimension of `GENO` should match that of `Y`, and of `SEX` and `COVAR` if supplied.
`Y`	a vector of quantitative traits, such as human height.
`COVAR`	optional: a vector or matrix of covariates that are used to reduce bias due to confounding, such as age.
`SEX`	optional: the genetic sex of individuals in the sample population; must be a vector of 1s and 2s following the default sex code in PLINK (1 for males and 2 for females).
`Xchr`	a logical indicator for whether the analysis is for X-chromosome SNPs.
`transformed`	a logical indicating whether the quantitative response `Y` should be transformed using a rank-based method to resemble a normal distribution; recommended for traits with non-symmetric distribution. The default option is `TRUE`.
`loc_alg`	a character string indicating the type of algorithm used to compute the centre in stage 1; the value is either "OLS", corresponding to an ordinary linear regression under Gaussian assumptions to compute the mean, or "LAD", corresponding to a quantile regression to compute the median. The recommended default option is "LAD". For the quantile regression, the function calls `quantreg::rq` and the median is estimated using either the "br" (smaller samples) or "sfn" (larger samples and sparse problems) algorithm, depending on the sample size; for more details see `?quantreg::rq`.
`related`	optional: a logical indicating whether the samples should be treated as related; if `TRUE` and no relatedness covariance information is given, the covariance is estimated under `cov.structure`, and this structure is assumed among all within-group errors pertaining to the same pair/cluster, if specified using `clust`. This option currently only applies to autosomal SNPs.
`cov.structure`	optional: should be one of the standard classes of correlation structures listed in `corClasses` from the R package nlme. See `?corClasses`. The most commonly used option is `corCompSymm` for a compound symmetric correlation structure. This option currently only applies to autosomal SNPs.
`clust`	optional: a factor indicating the grouping of samples; it should have at least two distinct values. It could be the family ID (FID) for family studies. This option currently only applies to autosomal SNPs.
`genotypic`	a logical indicating whether the variance homogeneity should be tested with respect to an additively (linearly) coded or non-additively coded `geno_one`. The former has one fewer degree of freedom than the latter and is the default option. For dosage genotypes without genotypic probabilities, `genotypic` is forced to be `FALSE`.
`origLev`	a logical indicator for whether the reported p-values should also include those from the original Levene's test.
`centre`	a character string indicating whether the absolute deviation should be calculated with respect to the "median" or the "mean" in the traditional sex-specific and Fisher combined Levene's test p-values (three tests) for the X-chromosome. The default value is "median". This option applies to sex-specific analysis using the original Levene's test (i.e. when `regression = TRUE`).
`XchrMethod`	a numeric value, one of 0 (reports all models), 1.1, 1.2, 2, or 3, for the choice of X-chromosome location association testing models; for more details, see `locReg`.
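The two-stage idea behind the `loc_alg`, `centre`, and `genotypic` arguments can be sketched in base R. This is only an illustration under simplifying assumptions (no covariates, unrelated samples, autosomal SNP, genotype-specific medians as the stage 1 centre); it is not the package's internal implementation, and all variable names are made up for the example.

```r
set.seed(1)
n <- 500
geno <- rbinom(n, 2, 0.3)              # genotypes coded 0, 1, 2
y <- rnorm(n, sd = 1 + 0.3 * geno)     # simulated trait whose spread grows with genotype

# Stage 1: absolute deviations from the genotype-specific median
d <- abs(y - ave(y, geno, FUN = median))

# Stage 2: test whether the mean absolute deviation depends on genotype
fit_add <- lm(d ~ geno)                # additive coding, 1 degree of freedom
fit_gen <- lm(d ~ factor(geno))        # genotypic coding, 2 degrees of freedom

p_add <- anova(fit_add)[1, "Pr(>F)"]
p_gen <- anova(fit_gen)[1, "Pr(>F)"]
c(additive = p_add, genotypic = p_gen)
```

The genotypic version corresponds to the familiar median-centred Levene-type test across the three genotype groups, while the additive version spends one fewer degree of freedom by assuming a linear trend in spread.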

Value

a vector of location, scale and combined gJLS p-values for each SNP.

Note

For a genome scan, we recommend running this in PLINK via the plugin function gJLSPLINK, especially for large datasets and those with more than 20 covariates.

We highly recommend applying a quantile-normal transformation to Y for non-symmetrically distributed traits. This is typically done to avoid a ‘scale effect’, in which the variance values tend to be proportional to the mean values when stratified by GENO, as observed by Pare et al. (2010) and Yang et al. (2011).
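For readers unfamiliar with the transformation, a rank-based inverse normal transform can be sketched as follows. The helper `inverse_normal` is hypothetical, and the offset of 0.5 is one common convention; the exact formula used when `transformed = TRUE` may differ.

```r
# Hypothetical helper (not a gJLS2 function): rank-based inverse normal
# transform. The 0.5 offset is an assumption; other offsets are also in use.
inverse_normal <- function(y) {
  r <- rank(y, na.last = "keep", ties.method = "average")
  n <- sum(!is.na(y))
  # Map ranks to the open interval (0, 1), then to standard normal quantiles
  qnorm((r - 0.5) / n)
}

set.seed(2)
y_skewed <- rexp(200)                  # a right-skewed simulated trait
y_t <- inverse_normal(y_skewed)
summary(y_t)
```

The transform preserves the ordering of the observations while replacing their values with normal quantiles, so a mean-variance relationship that exists only on the original scale is removed.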

For the moment, only a quantitative trait Y is accepted, as the subsequent generalized joint location-scale (gJLS) analyses require the variance to be calculated on quantitative traits. However, we are working to include binary responses for the generalized JLS analyses in the next update of gJLS.

Author(s)

Wei Q. Deng deng@utstat.toronto.edu, Lei Sun sun@utstat.toronto.edu

References

Soave D, Corvol H, Panjwani N, Gong J, Li W, Boëlle PY, Durie PR, Paterson AD, Rommens JM, Strug LJ, Sun L. (2015). A Joint Location-Scale Test Improves Power to Detect Associated SNPs, Gene Sets, and Pathways. American Journal of Human Genetics. 2015 Jul 2;97(1):125-38. doi: 10.1016/j.ajhg.2015.05.015. PMID: 26140448; PMCID: PMC4572492.

Examples

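library(gJLS2)

set.seed(1)                                # for reproducible simulated data
N <- 1000
genDAT <- rbinom(N, 2, 0.3)                # genotypes coded 0, 1, 2
sex <- rbinom(N, 1, 0.5) + 1               # 1 = male, 2 = female (PLINK coding)
y <- rnorm(N)                              # quantitative trait
covar <- matrix(rnorm(N * 10), ncol = 10)  # ten simulated covariates

# Autosomal analysis of two SNPs (identical here, for illustration)
gJLS2(GENO = data.frame("SNP1" = genDAT, "aSNP1" = genDAT), SEX = sex, Y = y, COVAR = covar)

# X-chromosome analysis of a single SNP
gJLS2(GENO = genDAT, SEX = sex, Y = y, COVAR = covar, Xchr = TRUE)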

[Package gJLS2 version 0.2.0 Index]