bwgs.predict {BWGS}  R Documentation 
Computes GEBV predictions for a target population that has only genotypic data, using the chosen options for model selection.
bwgs.predict(
  geno_train,
  pheno_train,
  geno_target,
  FIXED_train = "NULL",
  FIXED_target = "NULL",
  MAXNA = 0.2,
  MAF = 0.05,
  geno.reduct.method = "NULL",
  reduct.size = "NULL",
  r2 = "NULL",
  pval = "NULL",
  MAP = NULL,
  geno.impute.method = "NULL",
  predict.method = "GBLUP"
)
geno_train 
Matrix (n x m) of genotypes for the training population: n lines with m markers. Genotypes should be coded as -1, 0, 1. Missing data are allowed and coded as NA. 
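As an illustration of the expected coding, the sketch below converts a small matrix of allele counts (0/1/2) into the -1/0/1 scheme. The 0/1/2 input format, the object names and the marker names are assumptions made for this example only; they are not part of BWGS.

```r
# Hypothetical input: allele counts coded 0/1/2 (an assumption for this sketch).
geno_counts <- matrix(c(0, 1, 2, NA,
                        2, 2, 0, 1),
                      nrow = 2, byrow = TRUE,
                      dimnames = list(c("line1", "line2"),
                                      paste0("m", 1:4)))

# Subtracting 1 maps 0/1/2 to -1/0/1; NA stays NA.
geno_coded <- geno_counts - 1

# Check that only the expected codes remain.
stopifnot(all(geno_coded %in% c(-1, 0, 1, NA)))
```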
pheno_train 
Vector (n x 1) of phenotypes for the training population. This vector should have no missing values; if it does, lines with a missing phenotype (NA) are omitted from both pheno_train and geno_train. 
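The effect of omitting missing phenotypes can be sketched in base R as follows. This only mimics the behaviour described above on toy data; it is not the package's internal code, and all object names are hypothetical.

```r
# Toy data (hypothetical): three lines, one with a missing phenotype.
pheno <- c(lineA = 5.2, lineB = NA, lineC = 6.1)
geno <- matrix(c(-1,  0, 1,
                  1,  1, NA,
                  0, -1, 1),
               nrow = 3, byrow = TRUE,
               dimnames = list(names(pheno), c("m1", "m2", "m3")))

# Keep only lines with an observed phenotype, in both objects.
keep <- !is.na(pheno)
pheno_clean <- pheno[keep]
geno_clean <- geno[keep, , drop = FALSE]

# Phenotypes and genotypes must still refer to the same lines.
stopifnot(identical(rownames(geno_clean), names(pheno_clean)))
```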
geno_target 
Matrix (z x m) of genotypes for the target population: z lines with the same m markers as in geno_train. Genotypes should be coded as -1, 0, 1. Missing data are allowed and coded as NA. The other arguments are identical to those of bwgs.cv, except that pop_reduct_method, nTimes and nFolds are absent: the prediction is run only once, the model being estimated on the whole training population and then applied to the target population. 
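Because geno_target must carry the same m markers as geno_train, it can help to restrict both matrices to their shared columns before calling the function. The following is a minimal base R sketch on toy data; it assumes markers are identified by column names, and all object names are hypothetical.

```r
# Toy matrices (hypothetical) with partially overlapping marker sets.
geno_train_raw <- matrix(c(-1, 0, 1,
                            1, 1, 0),
                         nrow = 2, byrow = TRUE,
                         dimnames = list(c("tr1", "tr2"),
                                         c("m1", "m2", "m3")))
geno_target_raw <- matrix(c(0, -1, 1,
                            1,  1, 0),
                          nrow = 2, byrow = TRUE,
                          dimnames = list(c("tg1", "tg2"),
                                          c("m3", "m1", "m4")))

# Markers present in both matrices, in the column order of the training set.
common <- intersect(colnames(geno_train_raw), colnames(geno_target_raw))

# Subset and reorder so that both matrices have identical columns.
geno_train_aligned <- geno_train_raw[, common, drop = FALSE]
geno_target_aligned <- geno_target_raw[, common, drop = FALSE]
```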
FIXED_train 
A matrix of fixed effects for the training population, to be used with some methods such as those included in BGLR. It MUST have the same row names as geno_train and be coded (-1, 0, 1). 
FIXED_target 
A matrix of fixed effects for the target population, to be used with some methods such as those included in BGLR. It MUST have the same row names as geno_target and be coded (-1, 0, 1). 
MAXNA 
The maximum proportion of missing values allowed per marker when filtering marker columns in geno. Default value is 0.2. 
MAF 
The minimum allele frequency for filtering marker columns in geno. Default value is 0.05. 
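The two marker filters can be pictured with the base R sketch below. It is an illustration under assumptions, not the package's internal code: the allele frequency is derived from the -1/0/1 coding as (mean + 1) / 2, the less frequent allele is compared with MAF, and the thresholds are treated as inclusive. All object names are hypothetical.

```r
# Toy genotype matrix (hypothetical), coded -1/0/1 with NA for missing data.
geno <- matrix(c(-1, 1, NA,  1,
                  0, 1, NA,  1,
                  1, 1, NA,  1,
                 -1, 1,  0,  1,
                  1, 1,  1, -1),
               nrow = 5, byrow = TRUE,
               dimnames = list(paste0("line", 1:5), paste0("m", 1:4)))

MAXNA <- 0.2
MAF <- 0.05

# Proportion of missing values per marker.
na_rate <- colMeans(is.na(geno))

# Allele frequency from the -1/0/1 coding (assumed formula), then the
# frequency of the less common allele.
p <- (colMeans(geno, na.rm = TRUE) + 1) / 2
maf <- pmin(p, 1 - p)

# Keep markers that pass both filters (inclusive thresholds assumed).
keep <- na_rate <= MAXNA & maf >= MAF
geno_filtered <- geno[, keep, drop = FALSE]
```

In this toy example m2 is monomorphic and m3 has 60% missing values, so only m1 and m4 remain.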
geno.reduct.method 
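Allows selecting a subset of markers to speed up computation and/or to avoid introducing more noise than information. Options are: RMR (used with reduct.size), LD (used with r2 and MAP), ANO (used with pval) and ANO+LD (used with pval and r2).
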
reduct.size 
Specifies the number of markers for the genotypic reduction using RMR (reduct.size < m). 
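A reduction to reduct.size markers can be sketched as below, assuming that RMR draws a random subset of marker columns without replacement. That reading of RMR, and every object name, is an assumption for this example.

```r
# Toy genotype matrix (hypothetical): 4 lines, 6 markers.
set.seed(1)
geno <- matrix(sample(c(-1, 0, 1), 24, replace = TRUE),
               nrow = 4,
               dimnames = list(paste0("line", 1:4), paste0("m", 1:6)))

reduct_size <- 3  # must be smaller than the number of markers m

# Draw marker columns at random, without replacement, and keep map order.
keep_cols <- sort(sample(seq_len(ncol(geno)), size = reduct_size))
geno_reduced <- geno[, keep_cols, drop = FALSE]
```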
r2 
Coefficient of linkage disequilibrium (LD). Set 0 < r2 < 1 when the genotypic reduction method is LD or ANO+LD. 
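To give an idea of what an r2 threshold does, the sketch below computes pairwise r2 as the squared Pearson correlation between marker columns and lists the pairs above a threshold. Using the squared correlation as the LD measure is an assumption of this example; the package may compute LD differently, and all names are hypothetical.

```r
# Toy genotype matrix (hypothetical): m1 and m2 are identical, m3 differs.
geno <- matrix(c(-1, -1,  1,
                  0,  0, -1,
                  1,  1,  0,
                  1,  1, -1),
               nrow = 4, byrow = TRUE,
               dimnames = list(paste0("line", 1:4), c("m1", "m2", "m3")))

r2_threshold <- 0.8

# Squared Pearson correlation between every pair of markers.
r2_mat <- cor(geno, use = "pairwise.complete.obs")^2

# Marker pairs (upper triangle only) whose r2 exceeds the threshold.
high_ld <- which(r2_mat > r2_threshold & upper.tri(r2_mat), arr.ind = TRUE)
```

Here only the pair (m1, m2) exceeds the threshold, so one of the two markers would be redundant under a pruning rule.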
pval 
p value for ANO method, 0 < pval < 1. 
MAP 
A file with markers in rows and at least ONE column named "chrom". Used for computing r2 within linkage groups. 
geno.impute.method 
Allows imputation of missing marker data using the two methods proposed in function A.mat of package rrBLUP, namely mean imputation ("MNI") and imputation by an expectation-maximization algorithm ("EMI").
Default value is "NULL". Note that these imputation methods are only suited to cases with few missing values, typically marker data from SNP chips or KASPar assays. They are NOT suited to imputing marker data from low-density to high-density designs, nor to cases with MANY missing data, as typically produced by GBS. More sophisticated software (e.g. Beagle, Browning & Browning 2016) should be used before BWGS. 
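The simpler of the two approaches, replacing each missing genotype by the mean of its marker column, can be sketched in base R as follows. This is an illustration of the idea only and does not call rrBLUP; all object names are hypothetical.

```r
# Toy genotype matrix (hypothetical) with one missing value per marker.
geno <- matrix(c(-1, 1, NA, 1,
                  0, 1,  1, NA),
               nrow = 4,
               dimnames = list(paste0("line", 1:4), c("m1", "m2")))

# Mean of each marker column, ignoring missing values.
col_means <- colMeans(geno, na.rm = TRUE)

# Positions (row, col) of the missing entries.
na_idx <- which(is.na(geno), arr.ind = TRUE)

# Replace each missing entry by the mean of its own column.
geno_imputed <- geno
geno_imputed[na_idx] <- col_means[na_idx[, "col"]]
```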
predict.method 
The genomic breeding value prediction method. Default value is "GBLUP". The available options include several Bayesian methods that use the BGLR library, described in more detail in Perez & de los Campos 2014 (http://genomics.cimmyt.org/BGLRextdoc.pdf), and three semiparametric methods. The method names mentioned on this page are GBLUP, EGBLUP, BA, BB, BC, BL, RKHS, MKRKHS, RF, RR, LASSO, EN and SVM.

bwgs.predict returns a matrix of dimension z x 3, with one row per line of the target population. Columns are:
Predict BV: the z x 1 vector of GEBVs for the target set (rows of geno_target)
gpredSD: standard deviation of the estimated GEBV
CD: coefficient of determination for each GEBV, estimated as sqrt((1 - stdev(GEBVi))^2 / σ²g)
Note that gpredSD and CD are only available for methods using the BGLR library, namely GBLUP, EGBLUP, BA, BB, BC, BL, RKHS and MKRKHS. These two columns contain NA for methods RF, RR, LASSO, EN and SVM.
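A typical next step is to rank the target lines by their predicted value. The sketch below does this on a mock three-column matrix shaped like the returned value; the numbers, row names and column names are invented for illustration, and columns are addressed by position because the exact column names of the real output are not assumed here.

```r
# Mock result (hypothetical values): GEBV, its SD, and CD for three lines.
# The last line mimics a method for which SD and CD are NA.
pred <- matrix(c( 1.2, 0.3, 0.8,
                 -0.5, 0.4, 0.6,
                  0.7,  NA,  NA),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("t1", "t2", "t3"),
                               c("gebv", "sd", "cd")))

# Rank target lines from highest to lowest predicted value (column 1).
ranked <- pred[order(pred[, 1], decreasing = TRUE), , drop = FALSE]

# Flag lines for which a CD (column 3) is available.
has_cd <- !is.na(ranked[, 3])
```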
data(inra)

# Prediction using GBLUP method
predict_gblup <- bwgs.predict(geno_train = TRAIN47K,
                              pheno_train = YieldBLUE,
                              geno_target = TARGET47K,
                              MAXNA = 0.2,
                              MAF = 0.05,
                              geno.reduct.method = "NULL",
                              reduct.size = "NULL",
                              r2 = "NULL",
                              pval = "NULL",
                              MAP = "NULL",
                              geno.impute.method = "MNI",
                              predict.method = "GBLUP")