x.val {PopVar} | R Documentation |
Estimate genome-wide prediction accuracy using cross-validation
Description
x.val performs cross-validation (CV) to estimate the accuracy of genome-wide prediction (otherwise known as genomic selection) for a specific training population (TP), i.e. a set of individuals for which phenotypic and genotypic data are available. Cross-validation can be conducted via one of two methods within x.val; see Details for more information.
NOTE - x.val, specifically BGLR, writes and reads files to disk, so it is highly recommended to set your working directory.
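One way to follow the note above is to switch to a dedicated, writable directory before running the CV and to restore the previous directory afterwards. The snippet below is a minimal sketch using base R only; the directory name xval_run is a hypothetical choice, and the x.val call itself is left as a placeholder comment.

```r
# Hypothetical run directory; any writable location would do
run_dir <- file.path(tempdir(), "xval_run")
dir.create(run_dir, showWarnings = FALSE, recursive = TRUE)

# setwd() invisibly returns the previous working directory
old_wd <- setwd(run_dir)

# ... call x.val() here, so files written to disk land in run_dir ...

# Restore the original working directory when finished
setwd(old_wd)
```

Keeping the previous directory in old_wd means the rest of the session is unaffected once the run is over.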
Usage
x.val(
G.in = NULL,
y.in = NULL,
min.maf = 0.01,
mkr.cutoff = 0.5,
entry.cutoff = 0.5,
remove.dups = TRUE,
impute = "EM",
frac.train = 0.6,
nCV.iter = 100,
nFold = NULL,
nFold.reps = 1,
return.estimates = FALSE,
CV.burnIn = 750,
CV.nIter = 1500,
models = c("rrBLUP", "BayesA", "BayesB", "BayesC", "BL", "BRR"),
saveAt = tempdir()
)
Arguments
G.in |
TIP - Set header= |
y.in |
|
min.maf |
Optional |
mkr.cutoff |
Optional |
entry.cutoff |
Optional |
remove.dups |
Optional |
impute |
Options include |
frac.train |
Optional |
nCV.iter |
Optional |
nFold |
Optional |
nFold.reps |
Optional |
return.estimates |
Optional |
CV.burnIn |
Optional |
CV.nIter |
Optional |
models |
Optional |
saveAt |
When using models other than "rrBLUP" (i.e. Bayesian models), this is a path and prefix for saving temporary files
that are produced by the |
Details
Two CV methods are available within PopVar:
- CV method 1: During each iteration a training (i.e. model training) set of size N*(frac.train) is randomly sampled from the TP, where N is the size of the TP, and the remainder of the TP is assigned to the validation set. The accuracy of each model is expressed as the average Pearson's correlation coefficient (r) between the genome estimated breeding values (GEBVs) and the observed phenotypic values in the validation set across all nCV.iter iterations. Due to its amenability to various TP sizes, CV method 1 is the default CV method in pop.predict.
- CV method 2: nFold independent validation sets are sampled from the TP and predicted by the remainder. For example, if nFold = 10 the TP will be split into 10 equal sets, each containing 1/10-th of the TP, which will be predicted by the remaining 9/10-ths of the TP. The accuracy of each model is expressed as the average r between the GEBVs and the observed phenotypic values in the validation set across all nFold folds. The process can be repeated nFold.reps times with nFold new independent sets being sampled each replication, in which case the reported prediction accuracies are averages across all folds and replications.
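The two sampling schemes described above can be sketched in base R. This is an illustration only, not PopVar's implementation: the data are simulated, and ridge_predict, cv_method1 and cv_method2 are hypothetical helpers, with a plain ridge regression standing in for the prediction models that x.val actually fits.

```r
set.seed(1)

# Simulated stand-in data (hypothetical, not the PopVar example data)
N <- 100                                    # size of the TP
p <- 50                                     # number of markers
G <- matrix(sample(c(-1, 1), N * p, replace = TRUE), nrow = N)
y <- as.vector(G %*% rnorm(p) + rnorm(N))

# Hypothetical helper: ridge regression as a simple stand-in predictor
ridge_predict <- function(G_train, y_train, G_valid, lambda = 1) {
  mu <- mean(y_train)
  beta <- solve(crossprod(G_train) + lambda * diag(ncol(G_train)),
                crossprod(G_train, y_train - mu))
  as.vector(mu + G_valid %*% beta)
}

# CV method 1: repeated random training/validation splits
cv_method1 <- function(G, y, frac.train = 0.6, nCV.iter = 100) {
  N <- nrow(G)
  r <- replicate(nCV.iter, {
    train <- sample(N, size = round(N * frac.train))
    pred <- ridge_predict(G[train, , drop = FALSE], y[train],
                          G[-train, , drop = FALSE])
    cor(pred, y[-train])                    # Pearson's r in the validation set
  })
  mean(r)                                   # average over all iterations
}

# CV method 2: nFold-fold CV, repeated nFold.reps times
cv_method2 <- function(G, y, nFold = 10, nFold.reps = 1) {
  N <- nrow(G)
  r <- replicate(nFold.reps, {
    fold <- sample(rep(seq_len(nFold), length.out = N))  # new folds each rep
    sapply(seq_len(nFold), function(k) {
      valid <- which(fold == k)
      pred <- ridge_predict(G[-valid, , drop = FALSE], y[-valid],
                            G[valid, , drop = FALSE])
      cor(pred, y[valid])
    })
  })
  mean(r)                                   # average over folds and replications
}

acc1 <- cv_method1(G, y, frac.train = 0.6, nCV.iter = 25)
acc2 <- cv_method2(G, y, nFold = 5, nFold.reps = 3)
```

In method 1 the validation sets of different iterations may overlap, whereas in method 2 every individual is predicted exactly once per replication.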
Value
A list containing:
- CVs: A dataframe of CV results for each trait/model combination specified

If return.estimates is TRUE the additional items will be returned:
- models.used: A list of the models chosen to estimate marker effects for each trait
- mkr.effects: A vector of marker effect estimates for each trait generated by the respective prediction model used
- betas: A list of beta values for each trait generated by the respective prediction model used
Examples
## CV using method 1 with 25 iterations
CV.mthd1 <- x.val(G.in = G.in_ex, y.in = y.in_ex, nCV.iter = 25)
CV.mthd1$CVs
## CV using method 2 with 5 folds and 3 replications
x.val(G.in = G.in_ex, y.in = y.in_ex, nFold = 5, nFold.reps = 3)