kin.blup {rrBLUP} | R Documentation |
Genotypic value prediction based on kinship
Description
Genotypic value prediction by G-BLUP, where the genotypic covariance G can be additive or based on a Gaussian kernel.
Usage
kin.blup(data,geno,pheno,GAUSS=FALSE,K=NULL,fixed=NULL,covariate=NULL,
PEV=FALSE,n.core=1,theta.seq=NULL)
Arguments
data |
Data frame with columns for the phenotype, the genotype identifier, and any environmental variables. |
geno |
Character string for the name of the column in the data frame that contains the genotype identifier. |
pheno |
Character string for the name of the column in the data frame that contains the phenotype. |
GAUSS |
To model genetic covariance with a Gaussian kernel, set GAUSS=TRUE and pass the Euclidean distance matrix for K (see below). |
K |
There are three options for specifying kinship:
(1) If K=NULL, genotypes are assumed to be independent |
fixed |
An array of strings containing the names of columns that should be included as (categorical) fixed effects in the mixed model. |
covariate |
An array of strings containing the names of columns that should be included as covariates in the mixed model. |
PEV |
When PEV=TRUE, the function returns the prediction error variance for the genotypic values ( |
n.core |
Specifies the number of cores to use for parallel execution of the Gaussian kernel method (use only at UNIX command line). |
theta.seq |
The scale parameter for the Gaussian kernel is set by maximizing the restricted log-likelihood over a grid of values. By default, the grid is constructed by dividing the interval (0,max(K)] into 10 points. Passing a numeric array to this argument specifies a different set of grid points (e.g., for large problems you might want fewer than 10). |
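As a rough illustration of the theta.seq grid, the base R sketch below builds a ten-point grid and a coarser alternative from a Euclidean distance matrix. It assumes that "dividing the interval into 10 points" means equally spaced points ending at max(K); the marker data and the names theta.default and theta.coarse are hypothetical.

```r
# Toy marker matrix: 5 lines, 20 markers coded -1/1 (hypothetical data)
set.seed(1)
M <- matrix(sample(c(-1, 1), 5 * 20, replace = TRUE), nrow = 5)
rownames(M) <- paste0("line", 1:5)

# Euclidean distance matrix between lines
D <- as.matrix(dist(M))

# Ten equally spaced points on (0, max(D)], excluding zero
# (assumed to mirror the default grid described above)
theta.default <- seq(0, max(D), length.out = 11)[-1]

# A coarser, user-defined grid with five points, e.g. for a large problem
theta.coarse <- seq(max(D) / 5, max(D), length.out = 5)
```

A vector such as theta.coarse could then be passed as theta.seq together with GAUSS=TRUE and K=D.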
Details
This function is a wrapper for mixed.solve and thus solves mixed models of the form:
y = X \beta + [Z \: 0] g + \varepsilon
where \beta is a vector of fixed effects, g is a vector of random genotypic values with covariance G = Var[g], and the residuals follow Var[\varepsilon_i] = R_i \sigma^2_e, with R_i = 1 by default. The design matrix for the genetic values has been partitioned to illustrate that not all lines need phenotypes (i.e., for genomic selection). Unlike mixed.solve, this function does not return estimates of the fixed effects, only the BLUP solution for the genotypic values. It was designed to replace kinship.BLUP and to relieve the user of having to explicitly construct design matrices. Variance components are estimated by REML and BLUP values are returned for every entry in K, regardless of whether it has been phenotyped. The rownames of K must match the genotype labels in the data frame for phenotyped lines; missing phenotypes (NA) are simply omitted.
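The following sketch illustrates prediction for lines without phenotypes, as in genomic selection. The simulated data, the sample sizes, and the choice to mask the last 20 lines are assumptions made for illustration; the BLUPs are taken to be returned in the row order of K.

```r
library(rrBLUP)

# Simulated population: 100 lines, 500 markers coded -1/1 (hypothetical data)
set.seed(42)
n <- 100
m <- 500
M <- matrix(sample(c(-1, 1), n * m, replace = TRUE), n, m)
rownames(M) <- 1:n
A <- A.mat(M)                          # additive relationship matrix

# True genotypic values and phenotypes with heritability 0.5
u <- rnorm(m)
g <- as.vector(M %*% u)
y <- g + rnorm(n, sd = sqrt(var(g)))

# Mask the phenotypes of the last 20 lines (the prediction set)
y[81:100] <- NA
data <- data.frame(y = y, gid = 1:n)

# All 100 lines are in K, so all 100 receive a BLUP
ans <- kin.blup(data = data, geno = "gid", pheno = "y", K = A)
n.blup <- length(ans$g)

# Correlation between true and predicted values for the unphenotyped lines
acc.unphenotyped <- cor(g[81:100], ans$g[81:100])
```

Rows with NA phenotypes stay in the data frame here; the corresponding lines only need to appear in K.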
Unlike its predecessor, this function does not handle marker data directly. For breeding value prediction, the user must supply a relationship matrix, which can be calculated from markers with A.mat. For Gaussian kernel predictions, pass the Euclidean distance matrix for K, which can be calculated with dist.
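A minimal sketch of a Gaussian kernel prediction, assuming simulated markers and the default grid for the scale parameter. The data and the object names (D, ans.gk) are hypothetical; dist returns a "dist" object, which is converted to a full matrix here so that its rownames carry the genotype labels.

```r
library(rrBLUP)

# Simulated population: 60 lines, 200 markers coded -1/1 (hypothetical data)
set.seed(7)
n <- 60
m <- 200
M <- matrix(sample(c(-1, 1), n * m, replace = TRUE), n, m)
rownames(M) <- 1:n

# Euclidean distance matrix between lines; rownames are inherited from M
D <- as.matrix(dist(M))

# Phenotypes with heritability 0.5
u <- rnorm(m)
g <- as.vector(M %*% u)
y <- g + rnorm(n, sd = sqrt(var(g)))
data <- data.frame(y = y, gid = 1:n)

# Gaussian kernel model: K is the distance matrix, not a relationship matrix
ans.gk <- kin.blup(data = data, geno = "gid", pheno = "y", GAUSS = TRUE, K = D)

# Log-likelihood profile over the grid of scale parameters
ans.gk$profile
```

Inspecting the profile is a simple way to check whether the maximum falls inside the grid or at its boundary.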
In the terminology of mixed models, both the "fixed" and "covariate" variables are fixed effects (\beta in the above equation): the former are treated as factors with distinct levels while the latter are continuous with one coefficient per variable. The population mean is automatically included as a fixed effect.
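The sketch below shows how the "fixed" and "covariate" arguments might be used with a two-environment trial. The column names env and x, the effect sizes, and the simulated data are all hypothetical.

```r
library(rrBLUP)

# Simulated population: 50 lines, 200 markers coded -1/1 (hypothetical data)
set.seed(11)
n <- 50
m <- 200
M <- matrix(sample(c(-1, 1), n * m, replace = TRUE), n, m)
rownames(M) <- 1:n
A <- A.mat(M)
g <- as.vector(M %*% rnorm(m))

# Each line is observed once in each of two environments
data <- data.frame(gid = rep(1:n, times = 2),
                   env = rep(c("E1", "E2"), each = n),
                   x   = rnorm(2 * n))      # continuous covariate

# Phenotype = genotypic value + environment effect + covariate effect + noise
data$y <- g[data$gid] + ifelse(data$env == "E2", 5, 0) + 2 * data$x +
  rnorm(2 * n, sd = sd(g))

# env enters as a categorical fixed effect, x as a continuous covariate
ans.fx <- kin.blup(data = data, geno = "gid", pheno = "y", K = A,
                   fixed = "env", covariate = "x")
```

Although each line has two records, one BLUP per line is returned in ans.fx$g.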
The prediction error variance (PEV) is the square of the SE of the BLUPs (see mixed.solve) and can be used to estimate the expected accuracy of BLUP predictions according to r^2_i = 1 - \frac{PEV_i}{V_g K_{ii}}.
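As an illustration of the accuracy formula, the sketch below computes r^2_i from the PEV for simulated data. It assumes that $PEV is returned in the row order of K, so that it lines up with diag(K); the data and the name r2 are hypothetical.

```r
library(rrBLUP)

# Simulated population: 80 lines, 300 markers coded -1/1 (hypothetical data)
set.seed(3)
n <- 80
m <- 300
M <- matrix(sample(c(-1, 1), n * m, replace = TRUE), n, m)
rownames(M) <- 1:n
A <- A.mat(M)

# Phenotypes with heritability 0.5
g <- as.vector(M %*% rnorm(m))
y <- g + rnorm(n, sd = sqrt(var(g)))
data <- data.frame(y = y, gid = 1:n)

# Request the prediction error variance along with the BLUPs
ans.pev <- kin.blup(data = data, geno = "gid", pheno = "y", K = A, PEV = TRUE)

# Expected squared accuracy per line: r2_i = 1 - PEV_i / (Vg * K_ii)
r2 <- 1 - ans.pev$PEV / (ans.pev$Vg * diag(A))
summary(r2)
```

Because the true values g are known in a simulation, cor(g, ans.pev$g)^2 gives a realized value to compare against mean(r2).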
Value
The function always returns
- $Vg
REML estimate of the genetic variance
- $Ve
REML estimate of the error variance
- $g
BLUP solution for the genetic values
- $resid
residuals
- $pred
predicted genetic values, averaged over the fixed effects
If PEV = TRUE, the list also includes
- $PEV
Prediction error variance for the genetic values
If GAUSS = TRUE, the list also includes
- $profile
the log-likelihood profile for the scale parameter in the Gaussian kernel
References
Endelman, J.B. 2011. Ridge regression and other kernels for genomic selection with R package rrBLUP. Plant Genome 4:250-255. <doi:10.3835/plantgenome2011.08.0024>
Examples
library(rrBLUP)
#random population of 200 lines with 1000 markers
M <- matrix(rep(0,200*1000),200,1000)
for (i in 1:200) {
M[i,] <- ifelse(runif(1000)<0.5,-1,1)
}
rownames(M) <- 1:200
A <- A.mat(M)
#random phenotypes
u <- rnorm(1000)
g <- as.vector(crossprod(t(M),u))
h2 <- 0.5 #heritability
y <- g + rnorm(200,mean=0,sd=sqrt((1-h2)/h2*var(g)))
data <- data.frame(y=y,gid=1:200)
#predict breeding values
ans <- kin.blup(data=data,geno="gid",pheno="y",K=A)
accuracy <- cor(g,ans$g)