locReg {gJLS2}

R Documentation

Location (mean-based association) test

Description

This function takes as input the genotypes of SNPs (GENO), the genetic sex (SEX), and a quantitative trait (Y) of individuals in a sample population, and optionally additional covariates (COVAR), such as principal components. It returns the location association p-value for each SNP.

Usage

locReg(
  GENO,
  Y,
  SEX = NULL,
  COVAR = NULL,
  Xchr = FALSE,
  XchrMethod = 3,
  transformed = FALSE,
  related = FALSE,
  cov.structure = "corCompSymm",
  clust = NULL
)

Arguments

`GENO`	a genotype vector, matrix, or list of SNPs; values must be 0, 1, or 2, coded as the number of reference alleles. Alternatively, for imputed genotypes, it can be either a vector of dosage values between 0 and 2, or a list of matrices of genotype probabilities, each between 0 and 1 for every genotype. The length/dimension of `GENO` should match that of `Y`, and of `SEX` and `COVAR` when supplied.
`Y`	a numeric vector of quantitative trait, such as human height.
`SEX`	the genetic sex of individuals in the sample population; must be a vector of 1's and 2's following the PLINK default coding, where males are coded as 1 and females as 2. Optional for the analysis of autosomal SNPs, but required for X-chromosome SNPs.
`COVAR`	optional: a vector or a matrix of covariates, such as age or principal components.
`Xchr`	a logical indicator for whether the analysis is for X-chromosome SNPs. If `TRUE`, the following association testing model is used: Y~G+G_D+S+GxS, with the p-value given by comparing Y~G+G_D+S+GxS vs. Y~S (G is the additively coded genotype; G_D is an indicator for female heterozygotes; S is sex). See `XchrMethod` for the other available models.
`XchrMethod`	a numeric value, one of 0 (reports all models), 1.1, 1.2, 2, or 3 (default), for the choice of X-chromosome association testing model. Model 1.1: Y~G (females only). Model 1.2: Y~G (males only). Model 2: Y~G+Sex+GxSex, with the p-value given by comparing Y~G+Sex+GxSex vs. Y~Sex (the additively coded G is robust to X-chromosome inactivation uncertainty); this is also the option for dosage genotypes. Model 3 (recommended): Y~G+G_D+Sex+GxSex, with the p-value given by comparing Y~G+G_D+Sex+GxSex vs. Y~Sex (G_D is an indicator for female heterozygotes; this model is robust to X-chromosome inactivation uncertainty and skewed inactivation). For imputed data in the form of genotypic probabilities, model 3 becomes Y~G1+G2+G1xSex+Sex, where G1 and G2 are the genotypic probabilities for the heterozygote and the alternative allele homozygote.
`transformed`	a logical indicating whether the quantitative response `Y` should be transformed using a rank-based method to resemble a normal distribution; recommended for traits with non-symmetric distribution. The default option is `FALSE`.
`related`	optional: a logical indicating whether the samples should be treated as related. If `TRUE` and no relatedness covariance information is given, the covariance is estimated under the structure named in `cov.structure`, which is assumed for all within-group errors belonging to the same pair/cluster when groups are specified using `clust`. This option currently only applies to autosomal SNPs.
`cov.structure`	optional: one of the standard classes of correlation structures listed in `corClasses` from the R package nlme. See `?corClasses`. The most commonly used option is `corCompSymm`, a compound symmetric correlation structure. This option currently only applies to autosomal SNPs.
`clust`	optional: a factor indicating the grouping of samples; it should have at least two distinct values. It could be the family ID (FID) for family studies. This option currently only applies to autosomal SNPs.
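
To make the roles of `related`, `cov.structure` and `clust` more concrete, the following is a minimal sketch of a location test under a compound symmetric within-cluster correlation, fitted directly with `nlme::gls`. It is an illustration only and is not taken from the gJLS2 source: the simulated sib pairs, the variable names (`fid`, `g`, `dat`, `p_loc`) and the choice of maximum likelihood fitting are all assumptions, and `locReg` may compute its p-value differently.

```r
library(nlme)  # recommended package distributed with R

set.seed(1)
n_pairs <- 50
# hypothetical family IDs: two members per family
fid <- factor(rep(seq_len(n_pairs), each = 2))
# additively coded genotypes (0/1/2)
g <- rbinom(2 * n_pairs, 2, 0.3)
# a shared family effect induces within-pair correlation of about 0.3
y <- sqrt(0.3) * rnorm(n_pairs)[as.integer(fid)] +
  sqrt(0.7) * rnorm(2 * n_pairs)
dat <- data.frame(y = y, g = g, fid = fid)

# generalized least squares with errors correlated within each family
fit <- gls(y ~ g, data = dat,
           correlation = corCompSymm(form = ~ 1 | fid),
           method = "ML")

# p-value for the genotype (location) effect
p_loc <- summary(fit)$tTable["g", "p-value"]
p_loc
```

Here `fid` plays the part of `clust` and `corCompSymm` the part of `cov.structure`; any other class listed under `?corClasses` could be substituted in the same position.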

Value

a vector of location association p-values for each SNP.

Note

The choice to use a rank-based inverse normal transformation is left to the user's discretion. See XXX for a discussion on the pros and cons of quantile transformation with respect to location association.
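
As a rough illustration of what a rank-based inverse normal transformation does, the sketch below maps the ranks of a skewed trait onto standard normal quantiles. The helper `inverse_normal` and its `offset` argument are hypothetical and are not part of gJLS2; the exact offset used when `transformed = TRUE` is not stated in this documentation, so the value 0.5 here is an assumption.

```r
# Hypothetical helper, not a gJLS2 function: rank-based inverse normal
# transformation with a configurable offset (0.5 here; 3/8 gives Blom's version).
inverse_normal <- function(y, offset = 0.5) {
  r <- rank(y, na.last = "keep", ties.method = "average")
  n <- sum(!is.na(y))
  qnorm((r - offset) / (n - 2 * offset + 1))
}

set.seed(1)
y_skewed <- rexp(200)            # a right-skewed trait
z <- inverse_normal(y_skewed)    # same ordering, approximately normal

summary(z)
```

The transformation preserves the ordering of the observations and changes only their spacing, which is why it can affect a mean-based test on a strongly skewed trait.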

For X-chromosome markers, when the sample consists entirely of females or entirely of males, only results from model 1 are reported (model 1.1 for females, model 1.2 for males), regardless of the `XchrMethod` option.
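
For a mixed-sex sample, the model 3 comparison described under `XchrMethod` can be sketched with ordinary least squares in base R. This is an illustrative approximation, not the package's implementation: the variable names are made up for the example, males are coded 0/1 as in the Examples section, and the nested models are compared with an F test via `anova`, which may differ from how `locReg` obtains its p-value.

```r
set.seed(2)
n <- 200
sex <- rbinom(n, 1, 0.5) + 1                 # 1 = male, 2 = female (PLINK coding)
# X-chromosome genotypes: males carry 0/1 copies, females 0/1/2
g <- ifelse(sex == 1, rbinom(n, 1, 0.4), rbinom(n, 2, 0.4))
g_d <- as.numeric(sex == 2 & g == 1)         # indicator for female heterozygotes
y <- rnorm(n)                                # trait simulated under the null
s <- factor(sex)

full <- lm(y ~ g + g_d + s + g:s)            # Y ~ G + G_D + Sex + GxSex
null <- lm(y ~ s)                            # Y ~ Sex

# joint 3-degree-of-freedom test of G, G_D and GxSex
cmp <- anova(null, full)
p_model3 <- cmp[2, "Pr(>F)"]
p_model3
```

Dropping `g_d` from the full model gives the corresponding comparison for model 2.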

Author(s)

Wei Q. Deng <deng@utstat.toronto.edu>, Lei Sun <sun@utstat.toronto.edu>

References

Chen B, Craiu RV, Sun L. (2020) Bayesian model averaging for the X-chromosome inactivation dilemma in genetic association study. Biostatistics. 21(2):319-335. doi: 10.1093/biostatistics/kxy049. PMID: 30247537.

Chen B, Craiu RV, Strug LJ, Sun L. (2021) The X factor: A robust and powerful approach to X-chromosome-inclusive whole-genome association studies. Genetic Epidemiology. doi: 10.1002/gepi.22422. PMID: 34224641.

Examples

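library(gJLS2)
set.seed(1)  # for reproducibility

N <- 100
genDAT <- rbinom(N, 2, 0.3)                # autosomal genotypes coded 0/1/2
sex <- rbinom(N, 1, 0.5) + 1               # 1 = male, 2 = female
y <- rnorm(N)                              # quantitative trait
COVAR <- matrix(rnorm(N * 10), ncol = 10)  # ten simulated covariates

# autosomal SNP, unrelated samples:
locReg(GENO = genDAT, SEX = sex, Y = y, COVAR = COVAR)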

# correlated example (all samples placed in a single cluster):
library("MASS")
yy <- mvrnorm(1, mu = rep(0, N), Sigma = matrix(0.3, N, N) + diag(0.7, N))
locReg(GENO = list("SNP1" = genDAT, "SNP2" = genDAT[sample(1:N)]),
       SEX = sex, Y = as.numeric(yy), COVAR = COVAR, related = TRUE,
       clust = rep(1, N))

# sibpair example (individuals i and i + N/2 form pair i):
pairedY <- mvrnorm(N / 2, rep(0, 2), matrix(c(1, 0.2, 0.2, 1), 2))
yy <- c(pairedY[, 1], pairedY[, 2])
locReg(GENO = list("SNP1" = genDAT, "SNP2" = genDAT[sample(1:N)]),
       SEX = sex, Y = as.numeric(yy), COVAR = COVAR, related = TRUE,
       clust = rep(seq_len(N / 2), 2))

# Xchr data example (males coded 0/1, females coded 0/1/2):
genDAT1 <- rep(NA, N)
genDAT1[sex == 1] <- rbinom(sum(sex == 1), 1, 0.5)
genDAT1[sex == 2] <- rbinom(sum(sex == 2), 2, 0.5)
locReg(GENO = genDAT1, SEX = sex, Y = y, COVAR = COVAR, Xchr = TRUE)

[Package gJLS2 version 0.2.0 Index]