R: The generalized Levene's test via a two-stage regression for...

leveneRegA_per_SNP {gJLS2}

R Documentation

The generalized Levene's test via a two-stage regression for variance homogeneity by SNP genotype (autosomes)

Description

This function takes as input the genotype of a SNP (GENO), a quantitative trait (Y) in a sample population, and possibly additional covariates, such as principal components. The function returns the scale association p-values for each autosomal SNP using the generalized Levene's test.

Usage

leveneRegA_per_SNP(
  geno_one,
  Y,
  COVAR = NULL,
  transformed = TRUE,
  loc_alg = "LAD",
  related = FALSE,
  cov.structure = "corCompSymm",
  clust = NULL,
  genotypic = FALSE
)

Arguments

`geno_one`	the genotype of a biallelic SNP, must be a vector of 0, 1, 2's coded for the number of reference allele. Alternatively, for imputed genotypes, it could be a matrix/vector of dosage values, numerically between 0 and 2. The length/dimension of `geno_one` should match that of `Y`, and/or `SEX` and `COVAR`.
`Y`	a vector of quantitative traits, such as human height.
`COVAR`	optional: a vector or matrix of covariates that are used to reduce bias due to confounding, such as age.
`transformed`	a logical indicating whether the quantitative response `Y` should be transformed using a rank-based method to resemble a normal distribution; recommended for traits with non-symmetric distribution. The default option is `TRUE`.
`loc_alg`	a character indicating the type of algorithm to compute the centre in stage 1; the value is either "OLS", corresponding to an ordinary linear regression under Gaussian assumptions to compute the mean, or "LAD", corresponding to a quantile regression to compute the median. The recommended default option is "LAD". For the quantile regression, the function calls `quantreg::rq` and the median is estimated using either the "fn" (smaller samples) or "sfn" (larger samples and sparse problems) algorithm depending the sample size, for more details see `?quantreg::rq`.
`related`	optional: a logical indicating whether the samples should be treated as related; if `TRUE` while no relatedness covariance information is given, it is then estimated under a `cov.structure` and assumes this structure among all within-group errors pertaining to the same pair/cluster if specified using `clust`. This option currently only applies to autosomal SNPs.
`cov.structure`	optional: should be one of standard classes of correlation structures listed in `corClasses` from R package nlme. See `?corClasses`. The most commonly used option is `corCompSymm` for a compound symmetric correlation structure. This option currently only applies to autosomal SNPs.
`clust`	optional: a factor indicating the grouping of samples; it should have at least two distinct values. It could be the family ID (FID) for family studies. This option currently only applies to autosomal SNPs.
`genotypic`	optional: a logical indicating whether the variance homogeneity should be tested with respect to an additively (linearly) coded or non-additively coded `geno_one`. The former has one less degree of freedom than the latter and is the default option. For dosage data without genotypic probabilities, `genotypic` is forced to be `FALSE`.

Value

Levene's test regression p-values for autosomal SNPs according to the model specified.

Note

We recommend to quantile-normally transform Y to avoid ‘scale-effect’ where the variance values tend to be proportional to mean values when stratified by geno_one.

When the relatedness option is used, the computational time is expected to be longer for larger sample size ($$n > 1000$$), thus we recommend this option for smaller studies rather than large population based studies.

There is no explicit argument to supply SEX for autosomal SNPs, the user can choose to include the genetic sex of individuals as a column of the COVAR argument.

Author(s)

Wei Q. Deng deng@utstat.toronto.edu, Lei Sun sun@utstat.toronto.edu

References

Soave D, Sun L. (2017). A generalized Levene's scale test for variance heterogeneity in the presence of sample correlation and group uncertainty. Biometrics. 73(3):960-971. doi: 10.1111/biom.12651. PMID: 28099998.

Gastwirth JL, Gel YR, Miao W. (2009). The Impact of Levene's Test of Equality of Variances on Statistical Theory and Practice." Statistical Science. 24(3) 343 - 360, doi: 10.1214/09-STS301

Examples

N <- 100
genDAT <- rbinom(N, 2, 0.3)
Y <- rnorm(N)
covar <- matrix(rnorm(N*10), ncol=10)

# vanilla example:
leveneRegA_per_SNP(geno_one=genDAT, Y=Y, COVAR=covar)

# relatedness samples:
leveneRegA_per_SNP(geno_one=genDAT, Y=Y, COVAR=covar,
related=TRUE)
leveneRegA_per_SNP(geno_one=genDAT, Y=Y, COVAR=covar,
related=TRUE, clust = factor(rbinom(N, 2, 0.6)))


# dosage genotypes example:
library("MCMCpack")
a <- 0.3
geno <- rbinom(N, 2, 0.3)
a <- 0.3 ## uncertainty
genPP <- rbind(rdirichlet(sum(geno==0),c(a,(1-a)/2,(1-a)/2)),
               rdirichlet(sum(geno==1),c((1-a)/2,a,(1-a)/2)),
               rdirichlet(sum(geno==2),c((1-a)/2,(1-a)/2,a)))

leveneRegA_per_SNP(geno_one=genPP, Y=Y, COVAR=covar)
leveneRegA_per_SNP(geno_one=genPP, Y=Y, COVAR=covar,
genotypic=TRUE)

# dosage and related samples:
leveneRegA_per_SNP(geno_one=genPP, Y=Y, COVAR=covar,
related=TRUE, clust = factor(rbinom(N, 1, 0.3)))
leveneRegA_per_SNP(geno_one=genPP, Y=Y, COVAR=covar,
related=TRUE, clust = factor(rbinom(N, 1, 0.3)), genotypic=TRUE)

[Package gJLS2 version 0.2.0 Index]