GWAS {BGData} | R Documentation |

Implements single marker regressions. The regression model includes all the
covariates specified in the right-hand side of the `formula` plus one
column of the genotypes at a time. The data for the association tests are
obtained from a `BGData` object.
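As a rough illustration of what "one column of the genotypes at a time" means, the following base R sketch fits one `lm` per variant and keeps the coefficient row of the variant. It does not use BGData and is not the package's implementation; `single_marker`, the simulated `pheno` data frame, and the genotype matrix `X` are hypothetical names introduced only for this example.

```r
# Simulated stand-ins for a phenotype, one covariate, and a genotype matrix
set.seed(1)
n <- 100
p <- 5
X <- matrix(rbinom(n * p, size = 2, prob = 0.3), nrow = n, ncol = p,
            dimnames = list(NULL, paste0("snp", seq_len(p))))
pheno <- data.frame(y = rnorm(n), age = rnorm(n, mean = 50, sd = 10))

# Hypothetical helper: add one variant at a time to the covariate-only
# formula, fit the model, and keep the coefficient row of the variant
single_marker <- function(formula, pheno, X) {
    res <- t(vapply(seq_len(ncol(X)), function(j) {
        pheno$variant <- X[, j]
        fit <- lm(update(formula, . ~ . + variant), data = pheno)
        coef(summary(fit))["variant", ]
    }, numeric(4)))
    rownames(res) <- colnames(X)
    res
}

res <- single_marker(y ~ age, pheno, X)
```

The result has one row per variant and four columns (estimate, standard error, test statistic, p-value). A constant genotype column would make the `"variant"` row unavailable, which is one reason to filter such columns beforehand.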

```
GWAS(formula, data, method = "lsfit", i = seq_len(nrow(geno(data))),
    j = seq_len(ncol(geno(data))), chunkSize = 5000L,
    nCores = getOption("mc.cores", 2L), verbose = FALSE, ...)
```

`formula` |
The formula for the GWAS model without the variant, e.g. `y ~ 1`. |

`data` |
A `BGData` object. |

`method` |
The regression method to be used. Currently, the following methods are
implemented: `rayOLS`, `lsfit`, `lm`, `lm.fit`, `glm`, `lmer`, and `SKAT`. Defaults to `lsfit`. |

`i` |
Indicates which rows of the genotypes should be used. Can be integer, boolean, or character. By default, all rows are used. |

`j` |
Indicates which columns of the genotypes should be used. Can be integer, boolean, or character. By default, all columns are used. |

`chunkSize` |
The number of columns of the genotypes that are brought into physical
memory for processing per core. Defaults to `5000L`. |

`nCores` |
The number of cores to use. Defaults to `getOption("mc.cores", 2L)`. |

`verbose` |
Whether progress updates will be posted. Defaults to `FALSE`. |

`...` |
Additional arguments passed to `chunkedApply` and to the regression method. |
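
To make the role of `chunkSize` more concrete, the sketch below splits a set of selected columns into chunks and processes one chunk at a time in base R. It only illustrates the general idea of chunked processing; `make_chunks` is a hypothetical helper, and nothing here is taken from the internals of `chunkedApply`.

```r
# Hypothetical helper: split the selected column indices into chunks of at
# most `chunkSize` columns each
make_chunks <- function(j, chunkSize) {
    split(j, ceiling(seq_along(j) / chunkSize))
}

j <- 1:12
chunks <- make_chunks(j, chunkSize = 5)

# Process one chunk at a time so that only `chunkSize` columns need to be
# held in memory at once, then combine the per-chunk results
set.seed(1)
X <- matrix(rnorm(20 * 12), nrow = 20)
col_means <- unlist(lapply(chunks, function(cols) {
    colMeans(X[, cols, drop = FALSE])
}), use.names = FALSE)
```

On platforms that support forking, replacing `lapply` with `parallel::mclapply` would spread the chunks across cores. In that setup each core holds its own chunk, so memory use grows with both the chunk size and the number of cores.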

The `rayOLS` method is a regression through the origin that can only
be used with a `y ~ 1` formula, i.e. it only allows for one
quantitative response variable `y` and one variant at a time as an
explanatory variable (the variant is not included in the formula, hence
`1` is used as a dummy). If covariates are needed, consider
preadjustment of `y`. Among the provided methods, it is by far the
fastest.
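
One possible way to preadjust `y` is to regress it on the covariates first and carry the residuals into the marker scan. The base R sketch below shows this, followed by a closed-form regression through the origin on a single variant. Centering the variant is an assumption made for this illustration, and the code is not the `rayOLS` implementation; all variable names are hypothetical.

```r
set.seed(2)
n <- 200
age <- rnorm(n, mean = 50, sd = 10)
x <- rbinom(n, size = 2, prob = 0.4)   # one variant coded 0/1/2
y <- 0.05 * age + 0.3 * x + rnorm(n)

# Preadjust: remove the covariate effect from y before the marker scan
y_adj <- residuals(lm(y ~ age))

# Center the variant (assumption for this sketch), then regress through
# the origin, i.e. without an intercept term
x_c <- x - mean(x)
beta_closed <- sum(x_c * y_adj) / sum(x_c^2)

# The same slope obtained from lm() with the intercept removed
beta_lm <- unname(coef(lm(y_adj ~ 0 + x_c)))
```

Because the slope reduces to two sums per variant, this kind of model needs no matrix decomposition, which helps explain why a through-the-origin method can be much faster than a general-purpose fit.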

Some regression methods may require the data to not contain columns with
variance 0 or too many missing values. We suggest running `summarize`
to detect variants that do not clear the desired minor-allele frequency and
rate of missing genotype calls, and filtering these variants out using the
`j` parameter of the `GWAS` function (see example below).
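
For readers who want to see what such a filter computes, here is a base R sketch on a small simulated genotype matrix. It assumes genotypes coded as 0/1/2 allele counts and does not call `summarize`; the thresholds mirror the ones used in the example at the end of this page, and the variable names are chosen for illustration.

```r
set.seed(3)
X <- matrix(rbinom(50 * 6, size = 2, prob = 0.3), nrow = 50)
X[, 2] <- 0                   # monomorphic variant (variance 0)
X[sample(50, 10), 4] <- NA    # 20% missing genotype calls

# Allele frequency under 0/1/2 coding, folded to the minor allele
allele_freq <- colMeans(X, na.rm = TRUE) / 2
maf <- pmin(allele_freq, 1 - allele_freq)

# Proportion of missing calls per variant
freq_na <- colMeans(is.na(X))

# Flag variants that fail either threshold
exclusions <- maf < 0.01 | freq_na > 0.05
```

A logical vector such as `!exclusions` can then be used as a column selector, in the same way the example passes it to `j`.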

The same matrix that would be returned by `coef(summary(model))`.
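
As a reminder of what `coef(summary(model))` looks like for an ordinary linear model, the following base R sketch fits a single `lm` on simulated data and inspects the coefficient matrix. The data and variable names are made up for illustration.

```r
set.seed(4)
d <- data.frame(y = rnorm(30), x = rbinom(30, size = 2, prob = 0.5))
model <- lm(y ~ x, data = d)

# Coefficient matrix: one row per term, four columns
tab <- coef(summary(model))
colnames(tab)   # "Estimate" "Std. Error" "t value" "Pr(>|t|)"

# The fourth column holds the p-values
p_value <- tab["x", 4]
```

This layout is consistent with the example below, which plots `-log10` of column 4. For a binomial `glm`, the last two columns are named `z value` and `Pr(>|z|)` instead.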

`file-backed-matrices` for more information on file-backed
matrices. `multi-level-parallelism` for more information on
multi-level parallelism. `BGData-class` for more information on
the `BGData` class. `lsfit`, `lm`, `lm.fit`, `glm`, `lmer`, and
`SKAT` for more information on regression methods.

```
library(BGData)

# Restrict number of cores to 1 on Windows
if (.Platform$OS.type == "windows") {
    options(mc.cores = 1)
}
# Load example data
bg <- BGData:::loadExample()
# Detect variants that do not pass MAF and missingness thresholds
summaries <- summarize(geno(bg))
maf <- ifelse(summaries$allele_freq > 0.5, 1 - summaries$allele_freq,
    summaries$allele_freq)
exclusions <- maf < 0.01 | summaries$freq_na > 0.05
# Perform a single marker regression
res1 <- GWAS(formula = FT10 ~ 1, data = bg, j = !exclusions)
# Draw a Manhattan plot
plot(-log10(res1[, 4]))
# Use lm instead of lsfit (the default)
res2 <- GWAS(formula = FT10 ~ 1, data = bg, method = "lm", j = !exclusions)
# Use glm instead of lsfit (the default)
y <- pheno(bg)$FT10
pheno(bg)$FT10.01 <- y > quantile(y, 0.8, na.rm = TRUE)
res3 <- GWAS(formula = FT10.01 ~ 1, data = bg, method = "glm", j = !exclusions)
# Perform a single marker regression on the first 50 markers (useful for
# distributed computing)
res4 <- GWAS(formula = FT10 ~ 1, data = bg, j = 1:50)
```

[Package *BGData* version 2.4.0 Index]