runSingleTraitGwas {statgenGWAS} | R Documentation |
Perform single-trait GWAS
Description
runSingleTraitGwas
performs a single-trait Genome Wide Association
Study (GWAS) on phenotypic and genotypic data contained in a gData
object. A covariance matrix is computed using the EMMA algorithm (Kang et
al., 2008) or the Newton-Raphson algorithm (Tunnicliffe, 1989) in the
sommer
package. Then a Generalized Least Squares (GLS) method is used
for estimating the marker effects and corresponding p-values. This is done
using either one kinship matrix for all chromosomes or different
chromosome-specific kinship matrices for each chromosome. Significant SNPs
are selected based on a user defined threshold.
Usage
runSingleTraitGwas(
gData,
traits = NULL,
trials = NULL,
covar = NULL,
snpCov = NULL,
kin = NULL,
kinshipMethod = c("astle", "IBS", "vanRaden"),
remlAlgo = c("EMMA", "NR"),
GLSMethod = c("single", "multi"),
useMAF = TRUE,
MAF = 0.01,
MAC = 10,
genomicControl = FALSE,
thrType = c("bonf", "fixed", "small", "fdr"),
alpha = 0.05,
LODThr = 4,
nSnpLOD = 10,
pThr = 0.05,
rho = 0.5,
sizeInclRegion = 0,
minR2 = 0.5,
nCores = NULL
)
Arguments
gData |
An object of class |
traits |
A vector of traits on which to run GWAS. These can be either
numeric indices or character names of columns in |
trials |
A vector of trials on which to run GWAS. These can
be either numeric indices or character names of list items in |
covar |
An optional vector of covariates taken into account when
running GWAS. These can be either numeric indices or character names of
columns in |
snpCov |
An optional character vector of snps to be included as covariates. |
kin |
An optional kinship matrix or list of kinship matrices. These
matrices can be from the |
kinshipMethod |
An optional character indicating the method used for
calculating the kinship matrix(ces). Currently "astle" (Astle and Balding,
2009), "IBS" and "vanRaden" (VanRaden, 2008) are supported. If a
kinship matrix is supplied either in |
remlAlgo |
A character string indicating the algorithm used to estimate
the variance components. Either |
GLSMethod |
A character string indicating the method used to estimate
the marker effects. Either |
useMAF |
Should the minor allele frequency be used for selecting SNPs
for the analysis. If |
MAF |
The minor allele frequency (MAF) threshold used in GWAS. A
numerical value between 0 and 1. SNPs with MAF below this value are not taken
into account in the analysis, i.e. p-values and effect sizes are put to
missing ( |
MAC |
A numerical value. SNPs with minor allele count below this value
are not taken into account for the analysis, i.e. p-values and effect sizes
are set to missing ( |
genomicControl |
Should genomic control correction as in Devlin and Roeder (1999) be applied? |
thrType |
A character string indicating the type of threshold used for the selection of candidate loci. See Details. |
alpha |
A numerical value used for calculating the LOD-threshold for
|
LODThr |
A numerical value used as a LOD-threshold when
|
nSnpLOD |
A numerical value indicating the number of SNPs with the
smallest p-values that are selected when |
pThr |
A numerical value just as the cut off value for p-Values for
|
rho |
A numerical value used a the minimum value for SNPs to be
considered correlated when using |
sizeInclRegion |
An integer. Should the results for SNPs close to significant SNPs be included? If so, the size of the region in centimorgan or base pairs. Otherwise 0. |
minR2 |
A numerical value between 0 and 1. Restricts the SNPs included
in the region close to significant SNPs to only those SNPs that are in
sufficient Linkage Disequilibrium (LD) with the significant snp, where LD
is measured in terms of |
nCores |
A numerical value indicating the number of cores to be used by
the parallel part of the algorithm. If |
Value
An object of class GWAS
.
Threshold types for the selection of candidate loci
For the selection of candidate loci from the GWAS output four different
methods can be used. The method used can be specified in the function
parameter thrType
. Further parameters can be used to fine tune the
method.
- bonf
The Bonferroni threshold, a LOD-threshold of
-log10(alpha / m)
, where m is the number of SNPs and alpha can be specified by the function parameteralpha
- fixed
A fixed LOD-threshold, specified by the function parameter
LODThr
- small
The n SNPs with with the smallest p-Values are selected. n can be specified in
nSnpLOD
- fdr
Following the algorithm proposed by Brzyski D. et al. (2017), SNPs are selected in such a way that the False Discovery Rate (fdr) is minimized. To do this, first the SNPs are restricted to the SNPs with a p-Value below
pThr
. Then clusters of SNPs are created using a two step iterative process in which first the SNP with the lowest p-Value is selected as cluster representative. This SNP and all SNPs that have a correlation with this SNP of\rho
or higher will form a cluster.\rho
can be specified as an argument in the function and has a default value of 0.5, which is a recommended starting value in practice. The selected SNPs are removed from the list and the procedure repeated until no SNPs are left. Finally to determine the number of significant clusters the first cluster is determined for which the p-Value of the cluster representative is not smaller thancluster_{number} * \alpha / m
where m is the number of SNPs and alpha can be specified by the function parameteralpha
. All previous clusters are selected as significant.
References
Astle, William, and David J. Balding. 2009. Population Structure and Cryptic Relatedness in Genetic Association Studies. Statistical Science 24 (4): 451–71. doi:10.1214/09-sts307.
Brzyski D. et al. (2017) Controlling the Rate of GWAS False Discoveries. Genetics 205 (1): 61-75. doi:10.1534/genetics.116.193987
Devlin, B., and Kathryn Roeder. 1999. Genomic Control for Association Studies. Biometrics 55 (4): 997–1004. doi:10.1111/j.0006-341x.1999.00997.x.
Kang et al. (2008) Efficient Control of Population Structure in Model Organism Association Mapping. Genetics 178 (3): 1709–23. doi:10.1534/genetics.107.080101.
Millet, E. J., Pommier, C., et al. (2019). A multi-site experiment in a network of European fields for assessing the maize yield response to environmental scenarios - Data set. doi:10.15454/IASSTN
Rincent et al. (2014) Recovering power in association mapping panels with variable levels of linkage disequilibrium. Genetics 197 (1): 375–87. doi:10.1534/genetics.113.159731.
Segura et al. (2012) An efficient multi-locus mixed-model approach for genome-wide association studies in structured populations. Nature Genetics 44 (7): 825–30. doi:10.1038/ng.2314.
Sun et al. (2010) Variation explained in mixed-model association mapping. Heredity 105 (4): 333–40. doi:10.1038/hdy.2010.11.
Tunnicliffe W. (1989) On the use of marginal likelihood in time series model estimation. JRSS 51 (1): 15–27.
VanRaden P.M. (2008) Efficient methods to compute genomic predictions. Journal of Dairy Science 91 (11): 4414–23. doi:10.3168/jds.2007-0980.
See Also
Examples
## Create a gData object Using the data from the DROPS project.
## See the included vignette for a more extensive description on the steps.
data(dropsMarkers)
data(dropsMap)
data(dropsPheno)
## Add genotypes as row names of dropsMarkers and drop Ind column.
rownames(dropsMarkers) <- dropsMarkers[["Ind"]]
dropsMarkers <- dropsMarkers[colnames(dropsMarkers) != "Ind"]
## Add genotypes as row names of dropsMap.
rownames(dropsMap) <- dropsMap[["SNP.names"]]
## Rename Chomosome and Position columns.
colnames(dropsMap)[match(c("Chromosome", "Position"),
colnames(dropsMap))] <- c("chr", "pos")
## Convert phenotypic data to a list.
dropsPhenoList <- split(x = dropsPheno, f = dropsPheno[["Experiment"]])
## Rename Variety_ID to genotype and select relevant columns.
dropsPhenoList <- lapply(X = dropsPhenoList, FUN = function(trial) {
colnames(trial)[colnames(trial) == "Variety_ID"] <- "genotype"
trial <- trial[c("genotype", "grain.yield", "grain.number", "seed.size",
"anthesis", "silking", "plant.height", "tassel.height",
"ear.height")]
return(trial)
})
## Create gData object.
gDataDrops <- createGData(geno = dropsMarkers, map = dropsMap,
pheno = dropsPhenoList)
## Run single trait GWAS for trait 'grain.yield' for trial Mur13W.
GWASDrops <- runSingleTraitGwas(gData = gDataDrops,
trials = "Mur13W",
traits = "grain.yield")
## Run single trait GWAS for trait 'grain.yield' for trial Mur13W.
## Use chromosome specific kinship matrices calculated using vanRaden method.
GWASDropsMult <- runSingleTraitGwas(gData = gDataDrops,
trials = "Mur13W",
traits = "grain.yield",
kinshipMethod = "vanRaden",
GLSMethod = "multi")