RGWAS.epistasis {RAINBOWR} | R Documentation |
Check epistatic effects by kernel-based GWAS (genome-wide association studies)
Description
Check epistatic effects by kernel-based GWAS (genome-wide association studies)
Usage
RGWAS.epistasis(
pheno,
geno,
ZETA = NULL,
package.MM = "gaston",
covariate = NULL,
covariate.factor = NULL,
structure.matrix = NULL,
n.PC = 0,
min.MAF = 0.02,
n.core = 1,
parallel.method = "mclapply",
test.method = "LR",
dominance.eff = TRUE,
skip.self.int = FALSE,
haplotype = TRUE,
num.hap = NULL,
window.size.half = 5,
window.slide = 1,
chi0.mixture = 0.5,
optimizer = "nlminb",
gene.set = NULL,
map.gene.set = NULL,
plot.epi.3d = TRUE,
plot.epi.2d = TRUE,
main.epi.3d = NULL,
main.epi.2d = NULL,
saveName = NULL,
skip.check = FALSE,
verbose = TRUE,
verbose2 = FALSE,
count = TRUE,
time = TRUE
)
Arguments
pheno |
Data frame where the first column is the line name (gid). The remaining columns should contain the phenotypes to test. |
geno |
Data frame with the marker names in the first column. The second and third columns contain the chromosome and map position. Columns 4 and higher contain the marker scores for each line, coded as [-1, 0, 1] = [aa, Aa, AA]. |
ZETA |
A list of covariance (relationship) matrices (K) and their design matrices (Z) for the random effects, in the form ZETA = list(A = list(Z = Z.A, K = K.A), D = list(Z = Z.D, K = K.D)).
For example, K.A is the additive relationship matrix for the covariance between lines, and K.D is the dominance relationship matrix. |
package.MM |
The package used to solve the mixed-effects model. Three packages are available:
"RAINBOWR", "MM4LMM", and "gaston". The default is "gaston".
See more details at |
covariate |
A |
covariate.factor |
A |
structure.matrix |
You can supply a structure matrix calculated by structure analysis when there is population structure. Do not use this argument together with n.PC > 0. |
n.PC |
Number of principal components to include as fixed effects. Default is 0 (equals K model). |
min.MAF |
Specifies the minimum minor allele frequency (MAF). If a marker has a MAF less than min.MAF, it is assigned a zero score. |
n.core |
Setting n.core > 1 will enable parallel execution on a machine with multiple cores. This argument is not valid when 'parallel.method = "furrr"'. |
parallel.method |
Method for parallel computation. We offer three methods: "mclapply", "furrr", and "foreach". When 'parallel.method = "mclapply"', we utilize When 'parallel.method = "furrr"', we utilize When 'parallel.method = "foreach"', we utilize We recommend 'parallel.method = "mclapply"', but this method is not supported on Windows. Windows users should use 'parallel.method = "foreach"' instead. |
test.method |
RGWAS supports two methods to test effects of each SNP-set.
|
dominance.eff |
If this argument is TRUE, the dominance effect is included in the model, and additive x dominance and dominance x dominance interactions are also tested as epistatic effects. When you use inbred lines, please set this argument to FALSE. |
skip.self.int |
By default, the function also tests the self-interactions within the same SNP-set. If you want to avoid this, please set 'skip.self.int = TRUE'. |
haplotype |
If the number of lines of your data is large (maybe > 100), you should set haplotype = TRUE. When haplotype = TRUE, haplotype-based kernel will be used for calculating -log10(p). (So the dimension of this gram matrix will be smaller.) The result won't be changed, but the time for the calculation will be shorter. |
num.hap |
When haplotype = TRUE, you can set the number of haplotypes you expect. Similar arrays are then treated as the same haplotype, and the kernel (K.SNP) is built with dimension num.hap x num.hap. When num.hap = NULL (default), num.hap is set to the maximum number that reflects the differences between lines. |
window.size.half |
This argument determines how many SNPs (around the SNP you want to test) are used to calculate K.SNP. More precisely, the number of SNPs will be 2 * window.size.half + 1. |
window.slide |
This argument determines how often you test markers. If window.slide = 1, every marker will be tested. If you want to test SNP-sets by non-overlapping bins, please set window.slide = 2 * window.size.half + 1. |
chi0.mixture |
RAINBOWR assumes that the deviance follows the mixture distribution a x chisq(df = 0) + (1 - a) x chisq(df = r), where r is the degrees of freedom. The argument chi0.mixture is a (0 <= a < 1); the default is 0.5. |
optimizer |
The function used in the optimization process. We offer "optim", "optimx", and "nlminb" functions. |
gene.set |
If you have gene (or haplotype block) information, you can use it to perform kernel-based GWAS. Assign your gene information to gene.set as a "data.frame" (whose dimension is (the number of genes) x 2). The first column should contain the gene names, and the second column should contain the names of the markers, which correspond to the marker names in the "geno" argument. |
map.gene.set |
Genotype map for 'gene.set' (list of haplotype blocks).
This is a data.frame with the haplotype block (SNP-set, or gene-set) names in the first column.
The second and third columns contain the chromosome and map position for each block.
The fourth column contains the cumulative map position for each block, which can be computed by |
plot.epi.3d |
If TRUE, draw 3d plot |
plot.epi.2d |
If TRUE, draw 2d plot |
main.epi.3d |
The title of 3d plot. If this argument is NULL, trait name is set as the title. |
main.epi.2d |
The title of 2d plot. If this argument is NULL, trait name is set as the title. |
saveName |
When drawing any plot, you can save it in png format. Set saveName to the name under which you want to save the plots. When saveName = NULL, the plot is not saved. |
skip.check |
By default, RAINBOWR checks the type of the input data and converts it into the correct format. This takes some time, so if your input data is already in the correct format, you can skip this procedure by setting 'skip.check = TRUE'. |
verbose |
If this argument is TRUE, messages for the current steps will be shown. |
verbose2 |
If this argument is TRUE, welcome message will be shown. |
count |
When count is TRUE, the progress of RGWAS is displayed as a percentage. |
time |
When time is TRUE, the time taken to perform RGWAS is reported. |
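The chi0.mixture argument describes a mixture null distribution for the deviance. As a minimal base R sketch of that idea (not RAINBOWR's internal code; the helper name mixture.pvalue and the deviance values are hypothetical), the p-value under a x chisq(df = 0) + (1 - a) x chisq(df = r) could be computed as follows, using the fact that chisq(df = 0) is a point mass at zero:

```r
# Hypothetical helper: p-value for a deviance under the mixture
# a * chisq(df = 0) + (1 - a) * chisq(df = r).
# chisq(df = 0) is a point mass at 0, so for a deviance > 0 only the
# chisq(df = r) component contributes to the upper tail.
mixture.pvalue <- function(deviance, r, a = 0.5) {
  stopifnot(a >= 0, a < 1, r > 0)
  p <- (1 - a) * pchisq(deviance, df = r, lower.tail = FALSE)
  # A deviance of 0 (or slightly below 0 due to numerical error)
  # carries no evidence against the null hypothesis.
  p[deviance <= 0] <- 1
  p
}

deviance.example <- c(0, 3.84, 10)   # made-up deviance values
p.values <- mixture.pvalue(deviance.example, r = 1, a = 0.5)
minus.log10.p <- -log10(p.values)
print(minus.log10.p)
```

With a = 0.5, a positive deviance receives half the p-value of a plain chisq(df = r) test, so a larger chi0.mixture gives larger -log10(p) values for the same deviance.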
Value
- $map
Map information for the SNPs tested for epistatic effects.
- $scores
  - $scores
  The matrix of -log10(p) values calculated by the test for epistatic effects.
  - $x, $y
  The positions of the SNPs detected by regular GWAS. These vectors are used when drawing plots. Each of them corresponds to the replication of the rows and columns of scores.
  - $z
  A vector of $scores. This vector is also used when drawing plots.
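To illustrate how $x, $y, and $z relate to the $scores matrix, here is a small base R sketch with a toy matrix. The positions, the values, and the column-major flattening order are assumptions made for illustration; they are not taken from RAINBOWR's source.

```r
# Toy symmetric matrix standing in for the -log10(p) matrix
# (positions and values are made up for illustration)
pos <- c(10, 20, 30)                       # hypothetical SNP-set positions
scores <- matrix(c(0.1, 1.2, 0.4,
                   1.2, 0.3, 2.5,
                   0.4, 2.5, 0.2),
                 nrow = 3,
                 dimnames = list(as.character(pos), as.character(pos)))

# Flatten the matrix in column-major order (an assumption about the layout)
z <- as.vector(scores)
x <- rep(pos, times = ncol(scores))   # row position for each element of z
y <- rep(pos, each = nrow(scores))    # column position for each element of z

# Long-format table: one row per pair of SNP-sets
pairs.df <- data.frame(x = x, y = y, z = z)

# Pair with the largest -log10(p) in the toy matrix
top <- pairs.df[which.max(pairs.df$z), ]
print(top)
```

A long table of this kind is convenient for ranking pairs of SNP-sets or for passing the three vectors to a plotting function.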
References
Storey, J.D. and Tibshirani, R. (2003) Statistical significance for genomewide studies. Proc Natl Acad Sci. 100(16): 9440-9445.
Yu, J. et al. (2006) A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nat Genet. 38(2): 203-208.
Kang, H.M. et al. (2008) Efficient Control of Population Structure in Model Organism Association Mapping. Genetics. 178(3): 1709-1723.
Endelman, J.B. (2011) Ridge Regression and Other Kernels for Genomic Selection with R Package rrBLUP. Plant Genome J. 4(3): 250.
Endelman, J.B. and Jannink, J.L. (2012) Shrinkage Estimation of the Realized Relationship Matrix. G3 Genes, Genomes, Genet. 2(11): 1405-1413.
Su, G. et al. (2012) Estimating Additive and Non-Additive Genetic Variances and Predicting Genetic Merits Using Genome-Wide Dense Single Nucleotide Polymorphism Markers. PLoS One. 7(9): 1-7.
Zhou, X. and Stephens, M. (2012) Genome-wide efficient mixed-model analysis for association studies. Nat Genet. 44(7): 821-824.
Listgarten, J. et al. (2013) A powerful and efficient set test for genetic markers that handles confounders. Bioinformatics. 29(12): 1526-1533.
Lippert, C. et al. (2014) Greater power and computational efficiency for kernel-based association testing of sets of genetic variants. Bioinformatics. 30(22): 3206-3214.
Jiang, Y. and Reif, J.C. (2015) Modeling epistasis in genomic selection. Genetics. 201(2): 759-768.
Examples
### Import RAINBOWR
require(RAINBOWR)
### Load example datasets
data("Rice_Zhao_etal")
Rice_geno_score <- Rice_Zhao_etal$genoScore
Rice_geno_map <- Rice_Zhao_etal$genoMap
Rice_pheno <- Rice_Zhao_etal$pheno
Rice_haplo_block <- Rice_Zhao_etal$haploBlock
### View each dataset
See(Rice_geno_score)
See(Rice_geno_map)
See(Rice_pheno)
See(Rice_haplo_block)
### Select one trait for example
trait.name <- "Flowering.time.at.Arkansas"
y <- as.matrix(Rice_pheno[, trait.name, drop = FALSE])
### Remove SNPs whose MAF <= 0.05
x.0 <- t(Rice_geno_score)
MAF.cut.res <- MAF.cut(x.0 = x.0, map.0 = Rice_geno_map)
x <- MAF.cut.res$x
map <- MAF.cut.res$map
### Estimate genomic relationship matrix (GRM)
K.A <- calcGRM(genoMat = x)
### Modify data
modify.data.res <- modify.data(pheno.mat = y, geno.mat = x, map = map,
                               return.ZETA = TRUE, return.GWAS.format = TRUE)
pheno.GWAS <- modify.data.res$pheno.GWAS
geno.GWAS <- modify.data.res$geno.GWAS
ZETA <- modify.data.res$ZETA
### View each data for RAINBOWR
See(pheno.GWAS)
See(geno.GWAS)
str(ZETA)
### Check epistatic effects (by regarding 11 SNPs as one SNP-set)
epistasis.res <- RGWAS.epistasis(pheno = pheno.GWAS, geno = geno.GWAS, ZETA = ZETA,
                                 n.PC = 4, test.method = "LR", gene.set = NULL,
                                 window.size.half = 5, window.slide = 11,
                                 package.MM = "gaston", parallel.method = "mclapply",
                                 skip.check = TRUE, n.core = 2)
See(epistasis.res$scores$scores)
### Check epistatic effects (by using the list of haplotype blocks estimated by PLINK)
### It will take almost 2 minutes...
epistasis_haplo_block.res <- RGWAS.epistasis(pheno = pheno.GWAS, geno = geno.GWAS,
                                             ZETA = ZETA, n.PC = 4,
                                             test.method = "LR", gene.set = Rice_haplo_block,
                                             package.MM = "gaston", parallel.method = "mclapply",
                                             skip.check = TRUE, n.core = 2)
See(epistasis_haplo_block.res$scores$scores)
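The second example passes a table of haplotype blocks through gene.set. As a rough sketch of how a table with the same two-column shape could be built by hand, the base R code below groups consecutive markers into bins of 2 * window.size.half + 1 markers. The marker names, the object name gene.set.toy, and the one-row-per-marker layout are assumptions for illustration; in practice the marker names must match the first column of geno.

```r
# Toy marker names (hypothetical); in practice use the first column of `geno`
marker.names <- paste0("marker_", 1:25)

window.size.half <- 5
bin.size <- 2 * window.size.half + 1        # 11 markers per SNP-set

# Assign consecutive markers to bins of `bin.size` markers
bin.id <- (seq_along(marker.names) - 1) %/% bin.size + 1

# Assumed layout: set name in the first column, marker name in the second,
# with one row per marker
gene.set.toy <- data.frame(gene = paste0("bin_", bin.id),
                           marker = marker.names,
                           stringsAsFactors = FALSE)

# Number of markers assigned to each set
print(table(gene.set.toy$gene))
```

This toy table ignores chromosome boundaries; with real data, bins would normally be formed within each chromosome.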