R: Genotypic value prediction based on kinship

kin.blup {rrBLUP}

R Documentation

Genotypic value prediction based on kinship

Description

Genotypic value prediction by G-BLUP, where the genotypic covariance G can be additive or based on a Gaussian kernel.

Usage

kin.blup(data,geno,pheno,GAUSS=FALSE,K=NULL,fixed=NULL,covariate=NULL,
         PEV=FALSE,n.core=1,theta.seq=NULL)

Arguments

`data`	Data frame with columns for the phenotype, the genotype identifier, and any environmental variables.
`geno`	Character string for the name of the column in the data frame that contains the genotype identifier.
`pheno`	Character string for the name of the column in the data frame that contains the phenotype.
`GAUSS`	To model genetic covariance with a Gaussian kernel, set GAUSS=TRUE and pass the Euclidean distance for K (see below).
`K`	There are three options for specifying kinship: (1) If K=NULL, genotypes are assumed to be independent `(G=I \: V_g)`. (2) For breeding value prediction, set GAUSS=FALSE and use an additive relationship matrix for K to create the model `(G=K \: V_g)`. (3) For the Gaussian kernel, set GAUSS=TRUE and pass the Euclidean distance matrix for K to create the model `G_{ij}=e^{-(K_{ij}/\theta)^2} \: V_g`.
`fixed`	An array of strings containing the names of columns that should be included as (categorical) fixed effects in the mixed model.
`covariate`	An array of strings containing the names of columns that should be included as covariates in the mixed model.
`PEV`	When PEV=TRUE, the function returns the prediction error variance for the genotypic values (`PEV_i = Var[g^*_i-g_i]`).
`n.core`	Specifies the number of cores to use for parallel execution of the Gaussian kernel method (use only at UNIX command line).
`theta.seq`	The scale parameter for the Gaussian kernel is set by maximizing the restricted log-likelihood over a grid of values. By default, the grid is constructed by dividing the interval (0,max(K)] into 10 points. Passing a numeric array to this variable (theta.seq = "theta sequence") will specify a different set of grid points (e.g., for large problems you might want fewer than 10).

Details

This function is a wrapper for mixed.solve and thus solves mixed models of the form:

y = X \beta + [Z \: 0] g + \varepsilon

where \beta is a vector of fixed effects, g is a vector of random genotypic values with covariance G = Var[g], and the residuals follow Var[\varepsilon_i] = R_i \sigma^2_e, with R_i = 1 by default. The design matrix for the genetic values has been partitioned to illustrate that not all lines need phenotypes (i.e., for genomic selection). Unlike mixed.solve, this function does not return estimates of the fixed effects, only the BLUP solution for the genotypic values. It was designed to replace kinship.BLUP and to relieve the user of having to explicitly construct design matrices. Variance components are estimated by REML and BLUP values are returned for every entry in K, regardless of whether it has been phenotyped. The rownames of K must match the genotype labels in the data frame for phenotyped lines; missing phenotypes (NA) are simply omitted.

Unlike its predecessor, this function does not handle marker data directly. For breeding value prediction, the user must supply a relationship matrix, which can be calculated from markers with A.mat. For Gaussian kernel predictions, pass the Euclidean distance matrix for K, which can be calculated with dist.

In the terminology of mixed models, both the "fixed" and "covariate" variables are fixed effects (\beta in the above equation): the former are treated as factors with distinct levels while the latter are continuous with one coefficient per variable. The population mean is automatically included as a fixed effect.

The prediction error variance (PEV) is the square of the SE of the BLUPs (see mixed.solve) and can be used to estimate the expected accuracy of BLUP predictions according to r^2_i = 1 - \frac{PEV_i}{V_g K_{ii}}.

Value

The function always returns

$Vg: REML estimate of the genetic variance
$Ve: REML estimate of the error variance
$g: BLUP solution for the genetic values
$resid: residuals
$pred: predicted genetic values, averaged over the fixed effects

If PEV = TRUE, the list also includes

$PEV: Prediction error variance for the genetic values

If GAUSS = TRUE, the list also includes

$profile: the log-likelihood profile for the scale parameter in the Gaussian kernel

References

Endelman, J.B. 2011. Ridge regression and other kernels for genomic selection with R package rrBLUP. Plant Genome 4:250-255. <doi:10.3835/plantgenome2011.08.0024>

Examples

#random population of 200 lines with 1000 markers
M <- matrix(rep(0,200*1000),200,1000)
for (i in 1:200) {
  M[i,] <- ifelse(runif(1000)<0.5,-1,1)
}
rownames(M) <- 1:200
A <- A.mat(M)

#random phenotypes
u <- rnorm(1000)
g <- as.vector(crossprod(t(M),u))
h2 <- 0.5  #heritability
y <- g + rnorm(200,mean=0,sd=sqrt((1-h2)/h2*var(g)))

data <- data.frame(y=y,gid=1:200)

#predict breeding values
ans <- kin.blup(data=data,geno="gid",pheno="y",K=A)
accuracy <- cor(g,ans$g)

[Package rrBLUP version 4.6.3 Index]