GWAS {BGData}

R Documentation

Performs Single Marker Regressions Using BGData Objects

Description

Implements single marker regressions. The regression model includes all the covariates specified on the right-hand side of the formula plus one column of the genotypes at a time. The data for the association tests are obtained from a BGData object.
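Conceptually, each test refits the covariate formula with one extra genotype term. The base-R sketch below illustrates this on simulated data. It assumes a plain numeric genotype matrix `X` coded 0/1/2 and a data frame `pheno`. The names `X`, `pheno`, and `single_marker_lm` are hypothetical, and this is not BGData's internal implementation, which reads the genotypes in chunks and can process them in parallel.

```r
# Hypothetical sketch of single-marker regression (not BGData's own code)
set.seed(1)
n <- 100  # samples
p <- 5    # variants
X <- matrix(sample(0:2, n * p, replace = TRUE), nrow = n,
            dimnames = list(NULL, paste0("snp", seq_len(p))))
pheno <- data.frame(sex = sample(c("F", "M"), n, replace = TRUE),
                    age = rnorm(n, 50, 10))
pheno$y <- 0.5 * X[, "snp2"] + 0.02 * pheno$age + rnorm(n)

single_marker_lm <- function(formula, pheno, X) {
    # Add the genotype column z as one extra term of the covariate formula
    full <- update(formula, . ~ . + z)
    res <- t(vapply(seq_len(ncol(X)), function(k) {
        pheno$z <- X[, k]
        fm <- lm(full, data = pheno)
        # Keep only the coefficient-table row of the variant; this fails for a
        # zero-variance column because lm drops the aliased term from the table
        coef(summary(fm))["z", ]
    }, numeric(4)))
    rownames(res) <- colnames(X)
    res
}

res <- single_marker_lm(y ~ factor(sex) + age, pheno, X)
```

Swapping `lm` for `glm` or another fitting function follows the same pattern.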

Usage

GWAS(formula, data, method = "lsfit", i = seq_len(nrow(geno(data))),
  j = seq_len(ncol(geno(data))), chunkSize = 5000L,
  nCores = getOption("mc.cores", 2L), verbose = FALSE, ...)

Arguments

`formula`	The formula for the GWAS model without the variant, e.g. `y ~ 1` or `y ~ factor(sex) + age`. The variables included in the formula must be column names in the sample information of the `BGData` object.
`data`	A `BGData` object.
`method`	The regression method to be used. Currently, the following methods are implemented: `rayOLS` (see Details), `lsfit`, `lm`, `lm.fit`, `glm`, `lmer`, and `SKAT`. Defaults to `lsfit`.
`i`	Indicates which rows of the genotypes should be used. Can be an integer, logical, or character vector. By default, all rows are used.
`j`	Indicates which columns of the genotypes should be used. Can be an integer, logical, or character vector. By default, all columns are used.
`chunkSize`	The number of columns of the genotypes that are brought into physical memory for processing per core. If `NULL`, all elements in `j` are used. Defaults to 5000.
`nCores`	The number of cores (passed to `mclapply`). Defaults to `getOption("mc.cores", 2L)`, i.e. the value of the `mc.cores` option, or 2 if that option is not set.
`verbose`	Whether progress updates will be posted. Defaults to `FALSE`.
`...`	Additional arguments passed to `chunkedApply` and to the regression method.
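The `i`, `j`, and `chunkSize` arguments control which genotypes are read and how many columns are held in memory at once. The sketch below shows one way to turn a logical, integer, or character `j` into column positions and split them into chunks of at most `chunkSize` columns. `resolve_index` and `chunk_columns` are hypothetical helpers written for illustration; they are not part of BGData, and BGData's own chunking may differ in detail.

```r
# Hypothetical helper: convert a logical, integer, or character index to positions
resolve_index <- function(index, names) {
    if (is.logical(index)) {
        which(index)
    } else if (is.character(index)) {
        match(index, names)
    } else {
        as.integer(index)
    }
}

# Hypothetical helper: split column positions into chunks of at most chunkSize
chunk_columns <- function(j, chunkSize = 5000L) {
    if (is.null(chunkSize)) {
        return(list(j))  # NULL: all selected columns form a single chunk
    }
    split(j, ceiling(seq_along(j) / chunkSize))
}

snps <- paste0("snp", 1:12)
keep <- rep(c(TRUE, TRUE, FALSE), 4)  # a logical j, like !exclusions
chunks <- chunk_columns(resolve_index(keep, snps), chunkSize = 3L)
```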

Details

The rayOLS method is a regression through the origin that can only be used with a y ~ 1 formula: it allows for exactly one quantitative response variable y and uses one variant at a time as the only explanatory variable (the variant is not written in the formula, so 1 serves as a placeholder). If covariates are needed, consider pre-adjusting y for them, e.g. by using the residuals of a regression of y on the covariates. Among the provided methods, it is by far the fastest.
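For reference, least squares through the origin with a single regressor x has the closed-form slope b = sum(x * y) / sum(x^2) with n - 1 residual degrees of freedom. The sketch below pre-adjusts y for a covariate using the residuals of lm() and then applies that formula to each variant. It is a textbook illustration that assumes no missing values; `ray_ols` is a hypothetical name, and BGData's rayOLS may handle centering, missing genotypes, and test statistics differently.

```r
# Hypothetical sketch: regression through the origin, one variant at a time
ray_ols <- function(y, X) {
    n <- length(y)
    t(apply(X, 2, function(x) {
        sxx <- sum(x^2)
        b <- sum(x * y) / sxx                # slope through the origin
        rss <- sum((y - b * x)^2)
        se <- sqrt(rss / (n - 1) / sxx)      # only one parameter is estimated
        tval <- b / se
        c(estimate = b, se = se, t = tval,
          p = 2 * pt(-abs(tval), df = n - 1))
    }))
}

set.seed(2)
n <- 200
age <- rnorm(n, 50, 10)
X <- matrix(sample(0:2, n * 3, replace = TRUE), nrow = n)
y <- 0.03 * age + 0.4 * X[, 1] + rnorm(n)

# Pre-adjust y for the covariate, then test each variant with y_adj ~ 1
y_adj <- resid(lm(y ~ age))
res <- ray_ols(y_adj, X)
```

Each row of `res` matches what `coef(summary(lm(y_adj ~ 0 + x)))` reports for that variant.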

Some regression methods may fail if the genotypes contain columns with zero variance or too many missing values. We suggest running summarize to detect variants that do not pass the desired minor-allele frequency and missing-call-rate thresholds, and filtering these variants out using the j parameter of the GWAS function (see example below).
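To see what the two thresholds mean, the same filter can be applied to an ordinary matrix. This sketch assumes genotypes are coded 0/1/2 as allele counts, so the allele frequency is half the mean of the non-missing calls. That mirrors, but is not guaranteed to match exactly, the allele_freq and freq_na values that summarize reports.

```r
# Toy 0/1/2 genotype matrix with missing calls (rows = samples, columns = variants)
X <- cbind(
    common  = c(0, 1, 2, 1, 0, 1, 2, 1, 0, 1),
    rare    = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0),  # monomorphic: zero variance
    missing = c(0, 1, NA, 1, NA, 1, 2, 1, 0, 1) # 20% missing calls
)

allele_freq <- colMeans(X, na.rm = TRUE) / 2
freq_na <- colMeans(is.na(X))
maf <- pmin(allele_freq, 1 - allele_freq)  # fold to the minor allele

# Same thresholds as in the example: MAF below 1% or more than 5% missing
exclusions <- maf < 0.01 | freq_na > 0.05
keep <- which(!exclusions)
```

Here only `common` survives; `!exclusions` (or `keep`) is the kind of vector passed as `j`.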

Value

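A matrix with one row per tested variant. Each row holds the statistics that coef(summary(model)) would report for the variant in the corresponding single-marker model (e.g. estimate, standard error, test statistic, and p-value).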
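For a linear model with y ~ 1 plus one variant, the variant's row of coef(summary(model)) contains the slope, its standard error, the t statistic, and the two-sided p-value. The check below recomputes that row from the closed-form simple-regression formulas on simulated data and compares it with lm; it only illustrates what the columns mean and does not call BGData.

```r
set.seed(3)
n <- 50
z <- sample(0:2, n, replace = TRUE)  # genotypes of one variant
y <- 1 + 0.3 * z + rnorm(n)

# Closed-form OLS for y = a + b * z
b <- sum((z - mean(z)) * (y - mean(y))) / sum((z - mean(z))^2)
a <- mean(y) - b * mean(z)
rss <- sum((y - a - b * z)^2)
se <- sqrt(rss / (n - 2) / sum((z - mean(z))^2))
tval <- b / se
p <- 2 * pt(-abs(tval), df = n - 2)

manual <- c(b, se, tval, p)
from_lm <- coef(summary(lm(y ~ z)))["z", ]
```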

Examples

# Restrict number of cores to 1 on Windows
if (.Platform$OS.type == "windows") {
    options(mc.cores = 1)
}

# Load example data
bg <- BGData:::loadExample()

# Detect variants that do not pass MAF and missingness thresholds
summaries <- summarize(geno(bg))
maf <- ifelse(summaries$allele_freq > 0.5, 1 - summaries$allele_freq,
    summaries$allele_freq)
exclusions <- maf < 0.01 | summaries$freq_na > 0.05

# Perform a single marker regression
res1 <- GWAS(formula = FT10 ~ 1, data = bg, j = !exclusions)

# Draw a Manhattan-style plot of -log10 p-values (column 4 holds the p-values)
plot(-log10(res1[, 4]))

# Use lm instead of lsfit (the default)
res2 <- GWAS(formula = FT10 ~ 1, data = bg, method = "lm", j = !exclusions)

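# Use glm instead of lsfit (the default): dichotomize FT10 at its 80th
# percentile and fit a logistic regression (family is passed on via ...)
y <- pheno(bg)$FT10
pheno(bg)$FT10.01 <- y > quantile(y, 0.8, na.rm = TRUE)
res3 <- GWAS(formula = FT10.01 ~ 1, data = bg, method = "glm",
    family = binomial(), j = !exclusions)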

# Perform a single marker regression on the first 50 markers (useful for
# distributed computing)
res4 <- GWAS(formula = FT10 ~ 1, data = bg, j = 1:50)

[Package BGData version 2.4.1 Index]