R: Convert BLUP of individuals into BLUP of variants

ghap.varblup {GHap}

R Documentation

Convert BLUP of individuals into BLUP of variants

Description

Given genomic estimated breeding values (GEBVs), compute Best Linear Unbiased Predictor (BLUP) solutions for variant effects.

Usage

ghap.varblup(object, gebv, covmat, type = 1,
             only.active.variants = TRUE,
             weights = NULL, tol = 1e-12,
             vcp = NULL, errormat = NULL, 
             errorname = "", nlambda = 1000,
             ncores = 1, verbose = TRUE)

Arguments

`object`	A valid GHap object (phase, haplo or plink).
`gebv`	A named vector of genomic estimated breeding values.
`covmat`	An additive genomic relationship matrix, such as obtained with type=1 or type=2 in the `ghap.kinship` function.
`type`	A numeric value indicating the type of relationship matrix (see details in the `ghap.kinship` function).
`only.active.variants`	A logical value specifying whether only active variants should be included in the calculations (default = TRUE).
`weights`	A numeric vector providing variant-specific weights.
`tol`	A numeric value specifying the scalar to add to the diagonal of the relationship matrix it is not inversible (default = 1e-10).
`vcp`	A numeric value for the variance in GEBVs.
`errormat`	A square error matrix for GEBVs. This matrix can be obtained with argument extras = "LHSi" in the `ghap.lmm` function. If provided, calculation of standard errors and test statistics for the variants is activated.
`errorname`	The name used for the random effect representing GEBVs in the `ghap.lmm` function. If the error matrix was imported from somewhere else, this argument can be ignored provided that the names in the error matrix match the ones in the relationship matrix.
`nlambda`	A numeric value for the number of variants to be used in the estimation of the inflation factor (default = 1000).
`ncores`	A numeric value specifying the number of cores to be used in parallel computations (default = 1).
`verbose`	A logical value specifying whether log messages should be printed (default = TRUE).

Details

The function uses the equation:

\mathbf{\hat{a}} = q\mathbf{DM}^T\mathbf{K}^{-1}\mathbf{\hat{u}}

where \mathbf{M} is the n x m matrix of genotypes, where n is the number of individuals and m is the number of variants (i.e, markers or HapAlleles), \mathbf{D} = diag(d_i) with d_i being the weight of variant i (default d_i = 1), q is the inverse weighted sum of variances in the columns of \mathbf{M}, \mathbf{K} is the additive genomic relationship matrix and \hat{u} is the vector of GEBVs.

Value

The function returns a data frame with results from the genome-wide conversion of BLUP of individuals into BLUP of variants. If a GHap.haplo object is used, the first columns of the data frame will be:

`CHR`	Chromosome name.
`BLOCK`	Block alias.
`BP1`	Block start position.
`BP2`	Block end position.

For GHap.phase and GHap.plink objects, the first columns will be:

`CHR`	Chromosome name.
`MARKER`	Block start position.
`BP`	Block end position.

The remaining columns of the data frame will be equal for any class of GHap objects:

`ALLELE`	Identity of the counted (A1 or haplotype) allele.
`FREQ`	Frequency of the allele.
`SCORE`	Estimated BLUP of the allele.
`VAR`	Variance in allele-specific breeding values.
`pVAR`	Proportion of variance explained by the allele.
`CENTER`	Average genotype (meaningful only for predictions with `ghap.profile`).
`SCALE`	A constant set to 1 (meaningful only for predictions with `ghap.profile`).

If an error matrix for GEBVs is provided through the 'errormat' argument, the following additional columns are included in the data frame:

`SE`	Standard error for the BLUP of the allele.
`CHISQ.EXP`	Expected values for the test statistics.
`CHISQ.OBS`	Observed value for the test statistics.
`CHISQ.GC`	Test statistics scaled by the inflation factor (Genomic Control). Inflation is computed through regression of observed quantiles onto expected quantiles. In order to avoid overestimation by variants rejecting the null hypothesis, a random sample of variants (with size controled via the nlambda argument) is taken within three standard deviations from the mean of the distribution of test statistics.
`LOGP`	log10(1/P) or -log10(P) for the BLUP of the allele.
`LOGP.GC`	log10(1/P) or -log10(P) for the BLUP of the allele (scaled by the inflation factor).

Author(s)

Yuri Tani Utsunomiya <ytutsunomiya@gmail.com>

References

I. Stranden and D.J. Garrick. Technical note: derivation of equivalent computing algorithms for genomic predictions and reliabilities of animal merit. J Dairy Sci. 2009. 92:2971-2975.

J.L.G. Duarte et al. Rapid screening for phenotype-genotype associations by linear transformations of genomic evaluations. BMC Bioinformatics. 2014, 15:246.

Examples


# #### DO NOT RUN IF NOT NECESSARY ###
# 
# # Copy plink data in the current working directory
# exfiles <- ghap.makefile(dataset = "example",
#                          format = "plink",
#                          verbose = TRUE)
# file.copy(from = exfiles, to = "./")
# 
# # Copy metadata in the current working directory
# exfiles <- ghap.makefile(dataset = "example",
#                          format = "meta",
#                          verbose = TRUE)
# file.copy(from = exfiles, to = "./")
# 
# # Load plink data
# plink <- ghap.loadplink("example")
# 
# # Load phenotype and pedigree data
# df <- read.table(file = "example.phenotypes", header=T)
# 
# ### RUN ###
# 
# # Subset individuals from the pure1 population
# pure1 <- plink$id[which(plink$pop == "Pure1")]
# plink <- ghap.subset(object = plink, ids = pure1, variants = plink$marker)
# 
# # Subset markers with MAF > 0.05
# freq <- ghap.freq(plink)
# mkr <- names(freq)[which(freq > 0.05)]
# plink <- ghap.subset(object = plink, ids = pure1, variants = mkr)
# 
# # Compute genomic relationship matrix
# # Induce sparsity to help with matrix inversion
# K <- ghap.kinship(plink, sparsity = 0.01)
# 
# # Fit mixed model
# df$rep <- df$id
# model <- ghap.lmm(formula = pheno ~ 1 + (1|id) + (1|rep),
#                   data = df,
#                   covmat = list(id = K, rep = NULL),
#                   extras = "LHSi")
# refblup <- model$random$id$Estimate
# names(refblup) <- rownames(model$random$id)
# 
# # Convert blup of individuals into blup of variants
# mkrblup <- ghap.varblup(object = plink, gebv = refblup,
#                         covmat = K, vcp = model$vcp$Estimate[1],
#                         errormat = model$extras$LHSi, errorname = "id")
# 
# # Build GEBVs from variant effects and compare predictions
# gebv <- ghap.profile(object = plink, score = mkrblup)
# plot(gebv$SCORE, refblup); abline(0,1)
# 
# # Compare variant solutions with regular GWAS
# gwas <- ghap.assoc(object = plink,
#                    formula = pheno ~ 1 + (1|id) + (1|rep),
#                    data = df,
#                    covmat = list(id = K, rep = NULL))
# ghap.manhattan(data = gwas, chr = "CHR", bp = "BP", y = "LOGP")
# ghap.manhattan(data = mkrblup, chr = "CHR", bp = "BP", y = "LOGP")
# plot(mkrblup$LOGP, gwas$LOGP); abline(0,1)

[Package GHap version 3.0.0 Index]