RGWAS.twostep.epi {RAINBOWR} | R Documentation |
Perform normal GWAS (genome-wide association studies) first, then check epistatic effects for relatively significant markers
Description
Perform normal GWAS (genome-wide association studies) first, then check epistatic effects for relatively significant markers
Usage
RGWAS.twostep.epi(
pheno,
geno,
ZETA = NULL,
package.MM = "gaston",
covariate = NULL,
covariate.factor = NULL,
structure.matrix = NULL,
n.PC = 0,
min.MAF = 0.02,
n.core = 1,
parallel.method = "mclapply",
check.size.epi = 4,
epistasis.percent = 0.05,
check.epi.max = 200,
your.check = NULL,
GWAS.res.first = NULL,
P3D = TRUE,
test.method = "LR",
dominance.eff = TRUE,
skip.self.int = FALSE,
haplotype = TRUE,
num.hap = NULL,
optimizer = "nlminb",
window.size.half = 5,
window.slide = 1,
chi0.mixture = 0.5,
gene.set = NULL,
map.gene.set = NULL,
sig.level = 0.05,
method.thres = "BH",
plot.qq.1 = TRUE,
plot.Manhattan.1 = TRUE,
plot.epi.3d = TRUE,
plot.epi.2d = TRUE,
plot.method = 1,
plot.col1 = c("dark blue", "cornflowerblue"),
plot.col2 = 1,
plot.type = "p",
plot.pch = 16,
saveName = NULL,
main.qq.1 = NULL,
main.man.1 = NULL,
main.epi.3d = NULL,
main.epi.2d = NULL,
skip.check = FALSE,
verbose = TRUE,
verbose2 = FALSE,
count = TRUE,
time = TRUE
)
Arguments
pheno |
Data frame where the first column is the line name (gid). The remaining columns should be a phenotype to test. |
geno |
Data frame with the marker names in the first column. The second and third columns contain the chromosome and map position. Columns 4 and higher contain the marker scores for each line, coded as [-1, 0, 1] = [aa, Aa, AA]. |
ZETA |
A list of covariance (relationship) matrix (K: ZETA = list(A = list(Z = Z.A, K = K.A), D = list(Z = Z.D, K = K.D))
For example, K.A is additive relationship matrix for the covariance between lines, and K.D is dominance relationship matrix. |
package.MM |
The package name to be used when solving mixed-effects model. We only offer the following three packages:
"RAINBOWR", "MM4LMM" and "gaston". Default package is 'gaston'.
See more details at |
covariate |
A |
covariate.factor |
A |
structure.matrix |
You can use structure matrix calculated by structure analysis when there are population structure. You should not use this argument with n.PC > 0. |
n.PC |
Number of principal components to include as fixed effects. Default is 0 (equals K model). |
min.MAF |
Specifies the minimum minor allele frequency (MAF). If a marker has a MAF less than min.MAF, it is assigned a zero score. |
n.core |
Setting n.core > 1 will enable parallel execution on a machine with multiple cores. This argument is not valid when 'parallel.method = "furrr"'. |
parallel.method |
Method for parallel computation. We offer three methods, "mclapply", "furrr", and "foreach". When 'parallel.method = "mclapply"', we utilize When 'parallel.method = "furrr"', we utilize When 'parallel.method = "foreach"', we utilize We recommend that you use the option 'parallel.method = "mclapply"', but for Windows users, this parallelization method is not supported. So, if you are Windows user, we recommend that you use the option 'parallel.method = "foreach"'. |
check.size.epi |
This argument determines how many SNPs (around the SNP detected by normal GWAS) you will check epistasis. |
epistasis.percent |
This argument determines how many SNPs are detected by normal GWAS. For example, when epistasis.percent = 0.1, SNPs whose value of -log10(p) is in the top 0.1 percent are chosen as candidate for checking epistasis. |
check.epi.max |
It takes a lot of time to check epistasis, so you can decide the maximum number of SNPs to check epistasis. |
your.check |
Because there are less SNPs that can be tested in epistasis than in kernel-based GWAS, you can select which SNPs you want to test. If you use this argument, please set the number where SNPs to be tested are located in your data (so not position). In the default setting, your_check = NULL and epistasis between SNPs detected by GWAS will be tested. |
GWAS.res.first |
If you have already performed regular GWAS and have the result, you can skip performing normal GWAS. |
P3D |
When P3D = TRUE, variance components are estimated by REML only once, without any markers in the model. When P3D = FALSE, variance components are estimated by REML for each marker separately. |
test.method |
RGWAS supports two methods to test effects of each SNP-set.
|
dominance.eff |
If this argument is TRUE, dominance effect is included in the model, and additive x dominance and dominance x dominance are also tested as epistatic effects. When you use inbred lines, please set this argument FALSE. |
skip.self.int |
As default, the function also tests the self-interactions among the same SNP-sets. If you want to avoid this, please set 'skip.self.int = TRUE'. |
haplotype |
If the number of lines of your data is large (maybe > 100), you should set haplotype = TRUE. When haplotype = TRUE, haplotype-based kernel will be used for calculating -log10(p). (So the dimension of this gram matrix will be smaller.) The result won't be changed, but the time for the calculation will be shorter. |
num.hap |
When haplotype = TRUE, you can set the number of haplotypes which you expect. Then similar arrays are considered as the same haplotype, and then make kernel(K.SNP) whose dimension is num.hap x num.hap. When num.hap = NULL (default), num.hap will be set as the maximum number which reflects the difference between lines. |
optimizer |
The function used in the optimization process. We offer "optim", "optimx", and "nlminb" functions. |
window.size.half |
This argument decides how many SNPs (around the SNP you want to test) are used to calculated K.SNP. More precisely, the number of SNPs will be 2 * window.size.half + 1. |
window.slide |
This argument determines how often you test markers. If window.slide = 1, every marker will be tested. If you want to perform SNP set by bins, please set window.slide = 2 * window.size.half + 1. |
chi0.mixture |
RAINBOWR assumes the deviance is considered to follow a x chisq(df = 0) + (1 - a) x chisq(df = r). where r is the degree of freedom. The argument chi0.mixture is a (0 <= a < 1), and default is 0.5. |
gene.set |
If you have information of gene (or haplotype block), you can use it to perform kernel-based GWAS. You should assign your gene information to gene.set in the form of a "data.frame" (whose dimension is (the number of gene) x 2). In the first column, you should assign the gene name. And in the second column, you should assign the names of each marker, which correspond to the marker names of "geno" argument. |
map.gene.set |
Genotype map for 'gene.set' (list of haplotype blocks).
This is a data.frame with the haplotype block (SNP-set, or gene-set) names in the first column.
The second and third columns contain the chromosome and map position for each block.
The forth column contains the cumulative map position for each block, which can be computed by |
sig.level |
Significance level for the threshold. The default is 0.05. |
method.thres |
Method for detemining threshold of significance. "BH" and "Bonferroni are offered. |
plot.qq.1 |
If TRUE, draw qq plot for normal GWAS. |
plot.Manhattan.1 |
If TRUE, draw manhattan plot for normal GWAS. |
plot.epi.3d |
If TRUE, draw 3d plot |
plot.epi.2d |
If TRUE, draw 2d plot |
plot.method |
If this argument = 1, the default manhattan plot will be drawn. If this argument = 2, the manhattan plot with axis based on Position (bp) will be drawn. Also, this plot's color is changed by all chromosomes. |
plot.col1 |
This argument determines the color of the manhattan plot. You should substitute this argument as color vector whose length is 2. plot.col1[1] for odd chromosomes and plot.col1[2] for even chromosomes |
plot.col2 |
Color of the manhattan plot. color changes with chromosome and it starts from plot.col2 + 1 (so plot.col2 = 1 means color starts from red.) |
plot.type |
This argument determines the type of the manhattan plot. See the help page of "plot". |
plot.pch |
This argument determines the shape of the dot of the manhattan plot. See the help page of "plot". |
saveName |
When drawing any plot, you can save plots in png format. In saveName, you should substitute the name you want to save. When saveName = NULL, the plot is not saved. |
main.qq.1 |
The title of qq plot for normal GWAS. If this argument is NULL, trait name is set as the title. |
main.man.1 |
The title of manhattan plot for normal GWAS. If this argument is NULL, trait name is set as the title. |
main.epi.3d |
The title of 3d plot. If this argument is NULL, trait name is set as the title. |
main.epi.2d |
The title of 2d plot. If this argument is NULL, trait name is set as the title. |
skip.check |
As default, RAINBOWR checks the type of input data and modifies it into the correct format. However, it will take some time, so if you prepare the correct format of input data, you can skip this procedure by setting 'skip.check = TRUE'. |
verbose |
If this argument is TRUE, messages for the current steps will be shown. |
verbose2 |
If this argument is TRUE, welcome message will be shown. |
count |
When count is TRUE, you can know how far RGWAS has ended with percent display. |
time |
When time is TRUE, you can know how much time it took to perform RGWAS. |
Value
- $first
The results of first normal GWAS will be returned.
- $map.epi
Map information for SNPs which are tested epistatic effects.
- $epistasis
-
- $scores
-
- $scores
This is the matrix which contains -log10(p) calculated by the test about epistasis effects.
- $x, $y
The information of the positions of SNPs detected by regular GWAS. These vectors are used when drawing plots. Each output correspond to the replication of row and column of scores.
- $z
This is a vector of $scores. This vector is also used when drawing plots.
References
Kennedy, B.W., Quinton, M. and van Arendonk, J.A. (1992) Estimation of effects of single genes on quantitative traits. J Anim Sci. 70(7): 2000-2012.
Storey, J.D. and Tibshirani, R. (2003) Statistical significance for genomewide studies. Proc Natl Acad Sci. 100(16): 9440-9445.
Yu, J. et al. (2006) A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nat Genet. 38(2): 203-208.
Kang, H.M. et al. (2008) Efficient Control of Population Structure in Model Organism Association Mapping. Genetics. 178(3): 1709-1723.
Kang, H.M. et al. (2010) Variance component model to account for sample structure in genome-wide association studies. Nat Genet. 42(4): 348-354.
Zhang, Z. et al. (2010) Mixed linear model approach adapted for genome-wide association studies. Nat Genet. 42(4): 355-360.
Endelman, J.B. (2011) Ridge Regression and Other Kernels for Genomic Selection with R Package rrBLUP. Plant Genome J. 4(3): 250.
Endelman, J.B. and Jannink, J.L. (2012) Shrinkage Estimation of the Realized Relationship Matrix. G3 Genes, Genomes, Genet. 2(11): 1405-1413.
Su, G. et al. (2012) Estimating Additive and Non-Additive Genetic Variances and Predicting Genetic Merits Using Genome-Wide Dense Single Nucleotide Polymorphism Markers. PLoS One. 7(9): 1-7.
Zhou, X. and Stephens, M. (2012) Genome-wide efficient mixed-model analysis for association studies. Nat Genet. 44(7): 821-824.
Listgarten, J. et al. (2013) A powerful and efficient set test for genetic markers that handles confounders. Bioinformatics. 29(12): 1526-1533.
Lippert, C. et al. (2014) Greater power and computational efficiency for kernel-based association testing of sets of genetic variants. Bioinformatics. 30(22): 3206-3214.
Jiang, Y. and Reif, J.C. (2015) Modeling epistasis in genomic selection. Genetics. 201(2): 759-768.
Examples
### Import RAINBOWR
require(RAINBOWR)
### Load example datasets
data("Rice_Zhao_etal")
Rice_geno_score <- Rice_Zhao_etal$genoScore
Rice_geno_map <- Rice_Zhao_etal$genoMap
Rice_pheno <- Rice_Zhao_etal$pheno
### View each dataset
See(Rice_geno_score)
See(Rice_geno_map)
See(Rice_pheno)
### Select one trait for example
trait.name <- "Flowering.time.at.Arkansas"
y <- Rice_pheno[, trait.name, drop = FALSE]
### Remove SNPs whose MAF <= 0.05
x.0 <- t(Rice_geno_score)
MAF.cut.res <- MAF.cut(x.0 = x.0, map.0 = Rice_geno_map)
x <- MAF.cut.res$x
map <- MAF.cut.res$map
### Estimate genomic relationship matrix (GRM)
K.A <- calcGRM(genoMat = x)
### Modify data
modify.data.res <- modify.data(pheno.mat = y, geno.mat = x, map = map,
return.ZETA = TRUE, return.GWAS.format = TRUE)
pheno.GWAS <- modify.data.res$pheno.GWAS
geno.GWAS <- modify.data.res$geno.GWAS
ZETA <- modify.data.res$ZETA
### View each data for RAINBOWR
See(pheno.GWAS)
See(geno.GWAS)
str(ZETA)
### Perform two-step epistasis GWAS (single-snp GWAS -> Check epistasis for significant markers)
twostep.epi.res <- RGWAS.twostep.epi(pheno = pheno.GWAS, geno = geno.GWAS, ZETA = ZETA,
n.PC = 4, test.method = "LR", gene.set = NULL,
window.size.half = 10, window.slide = 21,
package.MM = "gaston", parallel.method = "mclapply",
skip.check = TRUE, n.core = 2)
See(twostep.epi.res$epistasis$scores)