| x.val {PopVar} | R Documentation |
Estimate genome-wide prediction accuracy using cross-validation
Description
x.val performs cross-validation (CV) to estimate the accuracy of genome-wide prediction (otherwise known as genomic selection) for a specific training population (TP), i.e. a set of individuals for which phenotypic and genotypic data are available. Cross-validation can be conducted via one of two methods within x.val; see Details for more information.
NOTE - x.val, specifically BGLR, writes and reads files to disk, so it is highly recommended to set your working directory.
Usage
x.val(
  G.in = NULL,
  y.in = NULL,
  min.maf = 0.01,
  mkr.cutoff = 0.5,
  entry.cutoff = 0.5,
  remove.dups = TRUE,
  impute = "EM",
  frac.train = 0.6,
  nCV.iter = 100,
  nFold = NULL,
  nFold.reps = 1,
  return.estimates = FALSE,
  CV.burnIn = 750,
  CV.nIter = 1500,
  models = c("rrBLUP", "BayesA", "BayesB", "BayesC", "BL", "BRR"),
  saveAt = tempdir()
)
Arguments
G.in |
TIP - Set header= |
y.in |
|
min.maf |
Optional |
mkr.cutoff |
Optional |
entry.cutoff |
Optional |
remove.dups |
Optional |
impute |
Options include |
frac.train |
Optional |
nCV.iter |
Optional |
nFold |
Optional |
nFold.reps |
Optional |
return.estimates |
Optional |
CV.burnIn |
Optional |
CV.nIter |
Optional |
models |
Optional |
saveAt |
When using models other than "rrBLUP" (i.e. Bayesian models), this is a path and prefix for saving temporary files
that are produced by the |
Details
Two CV methods are available within PopVar:
- CV method 1: During each iteration, a training (i.e. model training) set of size N*(frac.train), where N is the size of the TP, is randomly sampled from the TP, and the remainder of the TP is assigned to the validation set. The accuracy of each model is expressed as the average Pearson's correlation coefficient (r) between the genome estimated breeding values (GEBVs) and the observed phenotypic values in the validation set across all nCV.iter iterations. Due to its amenability to various TP sizes, CV method 1 is the default CV method in pop.predict.
- CV method 2: nFold independent validation sets are sampled from the TP and predicted by the remainder. For example, if nFold = 10, the TP will be split into 10 equal sets, each containing 1/10th of the TP, which will be predicted by the remaining 9/10ths of the TP. The accuracy of each model is expressed as the average r between the GEBVs and the observed phenotypic values in the validation set across all nFold folds. The process can be repeated nFold.reps times, with nFold new independent sets being sampled in each replication, in which case the reported prediction accuracies are averages across all folds and replications.
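The two partitioning schemes described above can be sketched in base R. This is only an illustration under stated assumptions, not the code x.val runs internally: the marker matrix G and phenotypes y are simulated, cv_r is a hypothetical helper, and a ridge regression with a fixed, arbitrarily chosen penalty (lambda) stands in for the prediction models that x.val fits. Only the names frac.train, nCV.iter, nFold and nFold.reps are taken from the arguments documented here.

```r
# Illustrative sketch only: simulated data and a simple ridge predictor
# stand in for the TP and for the models that x.val actually fits.
set.seed(1)
N <- 100; M <- 200                                          # TP size, marker count
G <- matrix(sample(c(-1, 1), N * M, replace = TRUE), N, M)  # simulated marker scores
y <- as.vector(G %*% rnorm(M, sd = 0.2) + rnorm(N))         # simulated phenotypes

# Hypothetical helper: fit ridge regression on the training rows (lambda is an
# assumed fixed penalty) and return Pearson's r between predicted and observed
# values in the validation rows.
cv_r <- function(train, valid, lambda = 50) {
  X <- G[train, , drop = FALSE]
  mu <- mean(y[train])
  u <- solve(crossprod(X) + diag(lambda, M), crossprod(X, y[train] - mu))
  gebv <- mu + G[valid, , drop = FALSE] %*% u
  cor(as.vector(gebv), y[valid])
}

# CV method 1: nCV.iter random splits, each with N * frac.train training individuals
frac.train <- 0.6; nCV.iter <- 25
r1 <- replicate(nCV.iter, {
  train <- sample(N, round(N * frac.train))
  cv_r(train, setdiff(seq_len(N), train))
})

# CV method 2: nFold disjoint validation sets, resampled nFold.reps times
nFold <- 5; nFold.reps <- 3
r2 <- replicate(nFold.reps, {
  fold <- sample(rep(seq_len(nFold), length.out = N))  # random fold labels
  sapply(seq_len(nFold), function(k) cv_r(which(fold != k), which(fold == k)))
})

# Reported accuracy: mean r over iterations (method 1) or folds and reps (method 2)
c(method1 = mean(r1), method2 = mean(r2))
```

In this sketch every individual appears in exactly one validation set per replication under method 2, whereas under method 1 an individual may be sampled into the validation set in any number of iterations.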
Value
A list containing:
- CVs: A dataframe of CV results for each trait/model combination specified
If return.estimates is TRUE, the following additional items will be returned:
- models.used: A list of the models chosen to estimate marker effects for each trait
- mkr.effects: A vector of marker effect estimates for each trait generated by the respective prediction model used
- betas: A list of beta values for each trait generated by the respective prediction model used
Examples
## CV using method 1 with 25 iterations
CV.mthd1 <- x.val(G.in = G.in_ex, y.in = y.in_ex, nCV.iter = 25)
CV.mthd1$CVs
## CV using method 2 with 5 folds and 3 replications
x.val(G.in = G.in_ex, y.in = y.in_ex, nFold = 5, nFold.reps = 3)