GWAS {BGData} | R Documentation |
Performs Single Marker Regressions Using BGData Objects
Description
Implements single marker regressions. The regression model includes all the
covariates specified on the right-hand side of the formula plus one column of
the genotypes at a time. The data for the association tests are obtained from
a BGData object.
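To make the idea concrete, the following is a minimal base R sketch of a single marker regression on simulated data. It is not the BGData implementation: the helper `single_marker`, the toy genotype matrix `X`, and the covariate `age` are hypothetical names introduced only for illustration, and the sketch does no chunking or parallelism.

```r
# Simulated data (assumption: 100 individuals, 5 variants coded 0/1/2)
set.seed(1)
n <- 100
p <- 5
X <- matrix(rbinom(n * p, size = 2, prob = 0.3), nrow = n, ncol = p)
colnames(X) <- paste0("snp", seq_len(p))
pheno <- data.frame(y = rnorm(n), age = rnorm(n, mean = 50, sd = 10))

# Hypothetical helper: fit the covariate model plus one genotype column at a
# time and keep the coefficient row that belongs to the variant
single_marker <- function(formula, pheno, X) {
  res <- t(vapply(seq_len(ncol(X)), function(k) {
    d <- pheno
    d$variant <- X[, k]
    fit <- lm(update(formula, . ~ . + variant), data = d)
    coef(summary(fit))["variant", ]
  }, numeric(4)))
  rownames(res) <- colnames(X)
  res
}

# One row per variant: estimate, standard error, t value, p-value
res <- single_marker(y ~ age, pheno, X)
```

Note that the formula passed in names only the response and the covariates; the variant term is appended inside the loop, which mirrors the convention described above.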
Usage
GWAS(formula, data, method = "lsfit", i = seq_len(nrow(geno(data))),
j = seq_len(ncol(geno(data))), chunkSize = 5000L,
nCores = getOption("mc.cores", 2L), verbose = FALSE, ...)
Arguments
formula |
The formula for the GWAS model without the variant, e.g. |
data |
A |
method |
The regression method to be used. Currently, the following methods are
implemented: |
i |
Indicates which rows of the genotypes should be used. Can be integer, boolean, or character. By default, all rows are used. |
j |
Indicates which columns of the genotypes should be used. Can be integer, boolean, or character. By default, all columns are used. |
chunkSize |
The number of columns of the genotypes that are brought into physical
memory for processing per core. If |
nCores |
The number of cores (passed to |
verbose |
Whether progress updates will be posted. Defaults to |
... |
Additional arguments passed to chunkedApply and to the regression method. |
Details
The rayOLS method is a regression through the origin that can only be used
with a y ~ 1 formula, i.e. it only allows for one quantitative response
variable y and one variant at a time as an explanatory variable (the variant
is not included in the formula, hence 1 is used as a dummy). If covariates
are needed, consider preadjustment of y. Among the provided methods, it is by
far the fastest.
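One way to read "preadjustment of y" is to regress the response on the covariates first and carry the residuals forward as the new response. The base R sketch below illustrates that reading on simulated data. The covariate `age` and all variable names are hypothetical, and centering the variant before the regression through the origin is a choice made for this illustration; this page does not state how rayOLS treats the variant internally.

```r
# Simulated data (assumption: one covariate and one variant coded 0/1/2)
set.seed(42)
n <- 200
age <- rnorm(n, mean = 50, sd = 10)
x <- rbinom(n, size = 2, prob = 0.4)
y <- 0.05 * age + 0.3 * x + rnorm(n)

# Step 1: preadjust y by removing the intercept and the covariate effect
y_adj <- residuals(lm(y ~ age))

# Step 2: regression through the origin of the adjusted response on a
# centered variant (centering is an assumption of this sketch)
x_c <- x - mean(x)
fit <- lm(y_adj ~ 0 + x_c)
coef(summary(fit))
```

With a BGData object, the adjusted response could be stored as an additional phenotype column, in the same way the Examples add FT10.01, and then used as the response of a y ~ 1 formula.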
Some regression methods may require the data to not contain columns with
variance 0 or too many missing values. We suggest running summarize to detect
variants that do not clear the desired minor-allele frequency and rate of
missing genotype calls, and filtering these variants out using the j
parameter of the GWAS function (see example below).
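To show what such a filter computes, here is a base R sketch on a small simulated genotype matrix. It does not call summarize; it derives the allele frequency and missing-call rate by hand, assuming genotypes coded as 0/1/2 allele counts. The thresholds (0.01 and 0.05) are the ones used in the Examples, and the matrix `X` is made up for illustration.

```r
# Simulated genotypes (assumption: 200 individuals, 4 variants coded 0/1/2)
set.seed(3)
X <- matrix(rbinom(200 * 4, size = 2, prob = 0.3), nrow = 200)
X[, 1] <- 0        # variant 1: monomorphic, so variance 0
X[1:30, 2] <- NA   # variant 2: 15% missing genotype calls

# Per-variant allele frequency and rate of missing calls
allele_freq <- colMeans(X, na.rm = TRUE) / 2
freq_na <- colMeans(is.na(X))

# Minor-allele frequency is the smaller of the two allele frequencies
maf <- pmin(allele_freq, 1 - allele_freq)

# Flag variants that fail either threshold and keep the rest
exclusions <- maf < 0.01 | freq_na > 0.05
keep <- which(!exclusions)
```

Either the logical vector `!exclusions` or the integer indices in `keep` has the form that the j parameter accepts according to the Arguments section.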
Value
The same matrix that would be returned by coef(summary(model)).
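For readers unfamiliar with that matrix, the base R sketch below shows its layout for an ordinary lm fit on simulated data. The data frame `d` and its column names are hypothetical. For lm, the fourth column holds the p-value, which is why the Examples index column 4 when drawing the Manhattan plot; for glm fits the third and fourth columns are labelled with z instead of t for families such as binomial.

```r
# Simulated data (assumption: one variant coded 0/1/2, no covariates)
set.seed(7)
d <- data.frame(y = rnorm(50), variant = rbinom(50, size = 2, prob = 0.5))

# Fit a single model and extract its coefficient table
model <- lm(y ~ variant, data = d)
tab <- coef(summary(model))

# Columns: "Estimate", "Std. Error", "t value", "Pr(>|t|)"
colnames(tab)

# p-value for the variant term
p_variant <- tab["variant", 4]
```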
See Also
file-backed-matrices for more information on file-backed matrices.
multi-level-parallelism for more information on multi-level parallelism.
BGData-class for more information on the BGData class.
lsfit, lm, lm.fit, glm, lmer, and SKAT for more information on regression methods.
Examples
# Restrict number of cores to 1 on Windows
if (.Platform$OS.type == "windows") {
    options(mc.cores = 1)
}
# Load example data
bg <- BGData:::loadExample()
# Detect variants that do not pass MAF and missingness thresholds
summaries <- summarize(geno(bg))
maf <- ifelse(summaries$allele_freq > 0.5, 1 - summaries$allele_freq,
    summaries$allele_freq)
exclusions <- maf < 0.01 | summaries$freq_na > 0.05
# Perform a single marker regression
res1 <- GWAS(formula = FT10 ~ 1, data = bg, j = !exclusions)
# Draw a Manhattan plot
plot(-log10(res1[, 4]))
# Use lm instead of lsfit (the default)
res2 <- GWAS(formula = FT10 ~ 1, data = bg, method = "lm", j = !exclusions)
# Use glm instead of lsfit (the default)
y <- pheno(bg)$FT10
pheno(bg)$FT10.01 <- y > quantile(y, 0.8, na.rm = TRUE)
res3 <- GWAS(formula = FT10.01 ~ 1, data = bg, method = "glm", j = !exclusions)
# Perform a single marker regression on the first 50 markers (useful for
# distributed computing)
res4 <- GWAS(formula = FT10 ~ 1, data = bg, j = 1:50)