pop.predict {PopVar} | R Documentation |
A genome-wide procedure for predicting genetic variance and correlated response in bi-parental breeding populations
Description
pop.predict uses phenotypic and genotypic data from a set of individuals known as a training population (TP) and a set of candidate parents, which may or may not be included in the TP, to predict the mean (\mu), genetic variance (V_G), and superior progeny values (\mu_sp) of the half-diallel, or a defined set of pairwise bi-parental crosses between parents. When multiple traits are provided, pop.predict will also predict the correlated responses and correlation between all pairwise traits. See Mohammadi, Tiede, and Smith (2015) for further details.
NOTE - pop.predict writes and reads files to disk, so it is highly recommended to set your working directory.
Usage
pop.predict(
  G.in = NULL,
  y.in = NULL,
  map.in = NULL,
  crossing.table = NULL,
  parents = NULL,
  tail.p = 0.1,
  nInd = 200,
  map.plot = FALSE,
  min.maf = 0.01,
  mkr.cutoff = 0.5,
  entry.cutoff = 0.5,
  remove.dups = TRUE,
  impute = "EM",
  nSim = 25,
  frac.train = 0.6,
  nCV.iter = 100,
  nFold = NULL,
  nFold.reps = 1,
  nIter = 12000,
  burnIn = 3000,
  models = c("rrBLUP", "BayesA", "BayesB", "BayesC", "BL", "BRR"),
  return.raw = FALSE,
  saveAt = tempdir()
)
Arguments
G.in |
TIP - Set header= |
y.in |
|
map.in |
|
crossing.table |
Optional |
parents |
Optional |
tail.p |
Optional |
nInd |
Optional |
map.plot |
Optional |
min.maf |
Optional |
mkr.cutoff |
Optional |
entry.cutoff |
Optional |
remove.dups |
Optional |
impute |
Options include |
nSim |
Optional |
frac.train |
Optional |
nCV.iter |
Optional |
nFold |
Optional |
nFold.reps |
Optional |
nIter , burnIn |
Optional |
models |
Optional |
return.raw |
Optional |
saveAt |
When using models other than "rrBLUP" (i.e. Bayesian models), this is a path and prefix for saving temporary files
that are produced by the |
Details
pop.predict can be used to predict the mean (\mu), genetic variance (V_G), superior progeny values (\mu_sp), as well as the predicted correlated response and correlations between all pairwise traits. The methodology and procedure have been described in Bernardo (2014) and Mohammadi, Tiede, and Smith (2015). Users familiar with genome-wide prediction, association mapping, and/or linkage mapping will be familiar with the required inputs of pop.predict.
G.in includes all of the entries (taxa) in the TP as well as additional entries to be considered as parent candidates. Entries included in G.in that do have a phenotype for any or all traits in y.in are considered TP entries for those respective traits. G.in is filtered according to min.maf, mkr.cutoff, entry.cutoff, and remove.dups; remaining missing marker data is imputed using the EM algorithm (Poland et al., 2012) when possible, and the marker mean otherwise, both implemented in rrBLUP-package. For each trait, the TP (i.e. entries with phenotype) is used to:
Perform CV to select a regression model. NOTE - Using the model with the highest CV accuracy is expected to result in the most accurate marker effect estimates (Bernardo, 2014). This expectation, however, is yet to be empirically validated and the user is encouraged to investigate the various models in order to make an educated decision about which one to ultimately use.
Estimate marker effects using the model resulting in the highest CV accuracy
Models include ridge regression BLUP implemented in rrBLUP-package (Endelman, 2011) and BayesA, BayesB, BayesC\pi, Bayesian lasso (BL), and Bayesian ridge regression (BRR) implemented in BGLR (de los Campos and Perez Rodriguez, 2014).
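The model-selection step described above can be illustrated with a minimal base R sketch. It is not the code used by pop.predict: the toy data, the helper names ridge.effects and cv.accuracy, and the fixed penalties are all hypothetical, and it assumes that frac.train is the fraction of TP entries used for training and that nCV.iter is the number of random splits. rrBLUP estimates the shrinkage from variance components rather than taking a fixed penalty.

```r
# Hypothetical toy TP: 100 entries, 50 markers coded -1/1 (not PopVar data)
set.seed(1)
n <- 100; m <- 50
G <- matrix(sample(c(-1, 1), n * m, replace = TRUE), n, m)
true.eff <- rnorm(m, sd = 0.3)
y <- as.vector(G %*% true.eff + rnorm(n))

# Ridge regression marker effects with a fixed penalty lambda
# (assumption: stands in for a genome-wide prediction model)
ridge.effects <- function(G, y, lambda) {
  mu <- mean(y)
  u <- solve(crossprod(G) + diag(lambda, ncol(G)), crossprod(G, y - mu))
  list(mu = mu, u = as.vector(u))
}

# Repeated random-split CV: train on frac.train of the entries and
# correlate predictions with phenotypes of the held-out entries
cv.accuracy <- function(G, y, lambda, frac.train = 0.6, nCV.iter = 100) {
  acc <- numeric(nCV.iter)
  for (i in seq_len(nCV.iter)) {
    train <- sample(nrow(G), size = round(frac.train * nrow(G)))
    fit <- ridge.effects(G[train, , drop = FALSE], y[train], lambda)
    pred <- fit$mu + G[-train, , drop = FALSE] %*% fit$u
    acc[i] <- cor(as.vector(pred), y[-train])
  }
  mean(acc)
}

# Compare two hypothetical candidate models (here, two penalties)
cv.res <- c(lambda.10  = cv.accuracy(G, y, 10,  nCV.iter = 20),
            lambda.100 = cv.accuracy(G, y, 100, nCV.iter = 20))
best <- names(which.max(cv.res))

# Marker effects from the candidate with the highest CV accuracy
best.lambda <- c(lambda.10 = 10, lambda.100 = 100)[[best]]
effects <- ridge.effects(G, y, best.lambda)$u
```

In this sketch the candidate with the highest mean CV correlation is refitted on the full TP to obtain marker effects, mirroring the two steps listed above.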
Information from the map.in is then used to simulate chromosomal recombination expected in a recombinant inbred line (i.e. F-infinity) (Broman et al., 2003) population (size=nInd). A function then converts the recombined chromosomal segments of the generic RIL population to the chromosomal segments of the population's respective parents and GEBVs of the simulated progeny are calculated.
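The simulation and conversion steps can be sketched in base R under simplifying assumptions. This is not the routine used by pop.predict: it treats one chromosome, uses the Haldane map function, and approximates a selfed RIL as a first-order Markov chain along the chromosome with the Haldane-Waddington recombinant proportion R = 2r/(1 + 2r). All inputs (pos.cM, effects, par1, par2) and the helper sim.ril are hypothetical stand-ins.

```r
set.seed(2)
# Hypothetical inputs: one chromosome, 20 markers, positions in cM
n.mkr <- 20
pos.cM  <- sort(runif(n.mkr, 0, 150))
effects <- rnorm(n.mkr, sd = 0.5)                  # stand-in marker effects
par1 <- sample(c(-1, 1), n.mkr, replace = TRUE)    # inbred parent 1
par2 <- sample(c(-1, 1), n.mkr, replace = TRUE)    # inbred parent 2
nInd <- 200

# Haldane map function: distance in Morgans -> recombination fraction r
d <- diff(pos.cM) / 100
r <- 0.5 * (1 - exp(-2 * d))
# Haldane-Waddington recombinant proportion for RILs derived by selfing
R <- 2 * r / (1 + 2 * r)

# Generic RIL coded by parental origin: 1 = parent 1, 2 = parent 2.
# The origin switches between adjacent markers with probability R.
sim.ril <- function(R) {
  origin <- integer(length(R) + 1)
  origin[1] <- sample(1:2, 1)
  switches <- runif(length(R)) < R
  for (j in seq_along(R)) {
    origin[j + 1] <- if (switches[j]) 3L - origin[j] else origin[j]
  }
  origin
}
ril <- t(replicate(nInd, sim.ril(R)))   # nInd x n.mkr matrix of 1s and 2s

# Convert parental origin to the alleles of this cross's parents
geno <- ifelse(ril == 1,
               matrix(par1, nInd, n.mkr, byrow = TRUE),
               matrix(par2, nInd, n.mkr, byrow = TRUE))

# GEBVs of the simulated progeny and their mean and variance
gebv <- as.vector(geno %*% effects)
pop.stats <- c(mu = mean(gebv), V_G = var(gebv))
```

Because the generic RIL matrix only records parental origin, the same matrix can be converted to any cross by swapping in that cross's parental genotypes, which is the point of simulating a generic population first.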
The simulation and conversion process is repeated s times, where s = nSim, to calculate dispersion statistics for \mu and V_G; the remainder of the values in the predictions output are means of the s simulations. During each iteration the correlation (r) and correlated response of each pairwise combination of traits is also calculated and their mean across the s simulations is returned.
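A minimal base R sketch of summarizing across the s simulations follows. The function sim.gebv is a hypothetical stand-in for one simulated population of a cross (it simply draws normal values), and the choice of the standard deviation as the dispersion statistic is an assumption made for illustration.

```r
set.seed(3)
nSim <- 25; nInd <- 200

# Hypothetical stand-in for one simulated population: GEBVs of nInd progeny
sim.gebv <- function(nInd) rnorm(nInd, mean = 10, sd = 2)

# One row per simulation, holding that simulation's mean and variance
per.sim <- t(replicate(nSim, {
  g <- sim.gebv(nInd)
  c(mu = mean(g), V_G = var(g))
}))

# Mean across simulations plus a dispersion statistic (SD, an assumption)
summary.tab <- rbind(mean = colMeans(per.sim),
                     sd   = apply(per.sim, 2, sd))
```

Keeping the per-simulation rows, as per.sim does here, is what makes it possible to compute dispersion statistics afterwards.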
The correlated response of trait.B when predicting trait.A is the mean of trait.B for the (\mu_sp) of trait.A, and vice-versa; a correlated response for the bottom tail.p and upper 1-tail.p is returned for each trait.
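The superior progeny mean and the correlated response can be sketched in base R for a single simulated population. The two-trait GEBVs below are hypothetical, and selecting a fixed count of round(tail.p * nInd) individuals in each tail is an assumption about how the tails are defined.

```r
set.seed(4)
nInd <- 200; tail.p <- 0.1

# Hypothetical GEBVs of two correlated traits in one simulated population
trait.A <- rnorm(nInd)
trait.B <- 0.6 * trait.A + rnorm(nInd, sd = 0.8)

# Individuals in the bottom and top tails of trait.A
n.sel    <- max(1, round(tail.p * nInd))
ord      <- order(trait.A)
low.idx  <- ord[seq_len(n.sel)]
high.idx <- rev(ord)[seq_len(n.sel)]

# Superior progeny means of trait.A for each tail
mu.sp.low  <- mean(trait.A[low.idx])
mu.sp.high <- mean(trait.A[high.idx])

# Correlated response: mean of trait.B among the same selected progeny
cr.B.low  <- mean(trait.B[low.idx])
cr.B.high <- mean(trait.B[high.idx])

# Correlation between the two traits in this population
r.AB <- cor(trait.A, trait.B)
```

Both tails are computed because, as noted for the predictions output, the favorable direction of a trait is not known in advance.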
A dataset think_barley.rda is provided as an example of the proper formatting of input files and also for users to become familiar with pop.predict.
Value
A list containing:
- predictions: A list of dataframes containing predictions of (\mu), (V_G), and (\mu_sp). When multiple traits are provided, the correlated responses and correlation between all pairwise traits are also included. More specifically, for a given trait pair the correlated response of the secondary trait with both the high and low superior progeny of the primary trait is returned, since the favorable values cannot be known by PopVar.
- preds.per.sim: If return.raw is TRUE, then a dataframe containing the results of each simulation is returned. This is useful for calculating dispersion statistics for traits not provided in the standard predictions dataframe.
- CVs: A dataframe of CV results for each trait/model combination specified.
- models.chosen: A matrix listing the statistical model chosen for each trait.
- markers.removed: A vector of markers removed during filtering for MAF and missing data.
- entries.removed: A vector of entries removed during filtering for missing data and duplicate entries.
References
Bernardo, R. 2014. Genomewide Selection of Parental Inbreds: Classes of Loci and Virtual Biparental Populations. Crop Sci. 55:2586-2595.
Broman, K. W., H. Wu, S. Sen and G.A. Churchill. 2003. R/qtl: QTL mapping in experimental crosses. Bioinformatics 19:889-890.
de los Campos, G., and P. Perez Rodriguez. 2014. BGLR: Bayesian Generalized Linear Regression. R package version 1.0.3. http://CRAN.R-project.org/package=BGLR
Endelman, J. B. 2011. Ridge regression and other kernels for genomic selection with R package rrBLUP. Plant Genome 4:250-255. doi: 10.3835/plantgenome2011.08.0024
Mohammadi M., T. Tiede, and K.P. Smith. 2015. PopVar: A genome-wide procedure for predicting genetic variance and correlated response in bi-parental breeding populations. Crop Sci. Accepted.
Munoz-Amatriain, M., M. J. Moscou, P. R. Bhat, J. T. Svensson, J. Bartos, P. Suchankova, H. Simkova, T. R. Endo, R. D. Fenton, S. Lonardi, A. M. Castillo, S. Chao, L. Cistue, A. Cuesta-Marcos, K. L. Forrest, M. J. Hayden, P. M. Hayes, R. D. Horsley, K. Makoto, D. Moody, K. Sato, M. P. Valles, B. B. H. Wulff, G. J. Muehlbauer, J. Dolezel, and T. J. Close. 2011. An improved consensus linkage map of barley based on flow-sorted chromosomes and single nucleotide polymorphism markers. Plant Gen. 4:238-249.
Poland, J., J. Endelman, J. Dawson, J. Rutkoski, S. Wu, Y. Manes, S. Dreisigacker, J. Crossa, H. Sanches-Villeda, M. Sorrells, and J.-L. Jannink. 2012. Genomic Selection in Wheat Breeding using Genotyping-by-Sequencing. Plant Genome 5:103-113.
Examples
## Not run:
# Load data
data("think_barley")
## Ex. 1 - Predict a defined set of crosses
## This example uses CV method 1 (see Details of x.val() function)
ex1.out <- pop.predict(G.in = G.in_ex, y.in = y.in_ex,
                       map.in = map.in_ex, crossing.table = cross.tab_ex,
                       nSim = 5, nCV.iter = 2)
ex1.out$predictions  ## Predicted parameters
ex1.out$CVs          ## CV results
## Ex. 2 - Predict all pairwise crosses between a list of parents
## This example uses CV method 2 (see Details of x.val() function)
par.list <- sample(y.in_ex[,1], size = 10, replace = FALSE)
ex2.out <- pop.predict(G.in = G.in_ex, y.in = y.in_ex,
                       map.in = map.in_ex, parents = par.list,
                       nSim = 5, nFold = 5, nFold.reps = 2)
## Ex. 3 - Use only rrBLUP and Bayesian lasso (BL) models
ex3.out <- pop.predict(G.in = G.in_ex, y.in = y.in_ex,
                       map.in = map.in_ex, crossing.table = cross.tab_ex,
                       models = c("rrBLUP", "BL"), nSim = 5, nCV.iter = 10)
## Ex. 4 - Same as Ex. 3, but return all raw SNP and prediction data
## for each simulated population
ex4.out <- pop.predict(G.in = G.in_ex, y.in = y.in_ex,
                       map.in = map.in_ex, crossing.table = cross.tab_ex,
                       models = c("rrBLUP", "BL"), nSim = 5, nCV.iter = 2,
                       return.raw = TRUE)
## End(Not run)