ibrm {hibayes}

R Documentation

Bayes model

Description

Bayesian linear regression model using individual-level data

y = X \beta + R r + M \alpha + e

where \beta is a vector of estimated coefficients for covariates, r is a vector of environmental random effects, M is a matrix of genotype covariates, \alpha is a vector of estimated marker effect sizes, and e is a vector of residuals. X and R are the design matrices for the covariates and the environmental random effects, respectively.
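To make the roles of the terms concrete, the following base R sketch simulates a response from the model above. All sizes, effect values, and variable names (n, m, loc) are made up for illustration; this is not how the package builds its matrices internally.

```r
# Simulated illustration of y = X beta + R r + M alpha + e (all values are made up)
set.seed(1)
n <- 100   # number of individuals
m <- 50    # number of markers

# X: design matrix for covariates (intercept plus one numeric covariate)
X <- cbind(1, rnorm(n))
beta <- c(10, 0.5)

# R: incidence matrix for one environmental random effect with 4 levels
lev <- c("a", "b", "c", "d")
loc <- factor(sample(lev, n, replace = TRUE), levels = lev)
R <- model.matrix(~ loc - 1)
r <- rnorm(ncol(R), sd = 0.3)

# M: genotype matrix coded 0/1/2; alpha: marker effects, most of them zero
M <- matrix(rbinom(n * m, size = 2, prob = 0.3), nrow = n, ncol = m)
alpha <- numeric(m)
alpha[sample(m, 5)] <- rnorm(5, sd = 0.5)

# e: residuals
e <- rnorm(n)

y <- as.vector(X %*% beta + R %*% r + M %*% alpha + e)
```

Each row of R contains a single 1 marking the level that individual belongs to, so R r adds the same random effect to every individual in a group.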

Usage

ibrm(
  formula,
  data = NULL,
  M = NULL,
  M.id = NULL,
  method = c("BayesCpi", "BayesA", "BayesL", "BSLMM", "BayesR", "BayesB", "BayesC",
    "BayesBpi", "BayesRR"),
  map = NULL,
  Pi = NULL,
  fold = NULL,
  niter = NULL,
  nburn = NULL,
  thin = 5,
  windsize = NULL,
  windnum = NULL,
  dfvr = NULL,
  s2vr = NULL,
  vg = NULL,
  dfvg = NULL,
  s2vg = NULL,
  ve = NULL,
  dfve = NULL,
  s2ve = NULL,
  lambda = 0,
  printfreq = 100,
  seed = 666666,
  threads = 4,
  verbose = TRUE
)

Arguments

`formula`	a two-sided linear formula object describing both the fixed-effects and random-effects parts of the model, with the response on the left of a ‘~’ operator and the terms, separated by ‘+’ operators, on the right. Random-effects terms are distinguished by vertical bars (‘|’) separating expressions for design matrices from grouping factors.
`data`	the data frame containing the variables named in 'formula'. NOTE that the first column in 'data' should be the individual id.
`M`	numeric matrix of genotypes with individuals in rows and markers in columns; NAs are not allowed.
`M.id`	vector of ids for the genotyped individuals. NOTE that the order of ids does not need to match between 'data' and 'M'; the package aligns them automatically.
`method`	the Bayesian method to fit, one of: "BayesB", "BayesA", "BayesL", "BayesRR", "BayesBpi", "BayesC", "BayesCpi", "BayesR", "BSLMM". "BayesRR": Bayesian ridge regression; all SNPs have non-zero effects and share the same variance, which is equivalent to RRBLUP or GBLUP. "BayesA": all SNPs have non-zero effects, each with its own variance, which follows an inverse chi-square distribution. "BayesB": only a small proportion of SNPs (1-Pi) have non-zero effects, each with its own variance, which follows an inverse chi-square distribution. "BayesBpi": the same as "BayesB", but 'Pi' is not fixed. "BayesC": only a small proportion of SNPs (1-Pi) have non-zero effects, and they share the same variance. "BayesCpi": the same as "BayesC", but 'Pi' is not fixed. "BayesL": Bayesian LASSO; all SNPs have non-zero effects, each with its own variance, which follows an exponential distribution. "BSLMM": all SNPs have non-zero effects and share the same variance, but a small proportion of SNPs have an additional shared variance. "BayesR": only a small proportion of SNPs have non-zero effects, and the SNPs are allocated to different groups, with a common variance within each group.
`map`	(optional, only for GWAS) the map information of the genotypes; at least 3 columns are required: SNPs, chromosome, physical position.
`Pi`	vector giving the proportions of zero-effect and non-zero-effect SNPs; the first value must be the proportion of zero-effect markers.
`fold`	proportion of variance explained for groups of SNPs, the default is c(0, 0.0001, 0.001, 0.01).
`niter`	the number of MCMC iterations.
`nburn`	the number of iterations to be discarded.
`thin`	the thinning interval applied after burn-in. A smaller interval keeps more samples, which may improve the accuracy of the estimated parameters but requires more memory to store them; a larger interval saves memory but may reduce the accuracy of the estimates.
`windsize`	window size in bp for GWAS, the default is NULL.
`windnum`	fixed number of SNPs in a window for GWAS; if it is specified, 'windsize' is ignored. The default is NULL.
`dfvr`	the number of degrees of freedom for the distribution of environmental variance.
`s2vr`	scale parameter for the distribution of environmental variance.
`vg`	prior value of genetic variance.
`dfvg`	the number of degrees of freedom for the distribution of genetic variance.
`s2vg`	scale parameter for the distribution of genetic variance.
`ve`	prior value of residual variance.
`dfve`	the number of degrees of freedom for the distribution of residual variance.
`s2ve`	scale parameter for the distribution of residual variance.
`lambda`	value of ridge regression for inverting a matrix.
`printfreq`	frequency of printing iterative details on console.
`seed`	seed for random number generation.
`threads`	number of threads used for OpenMP.
`verbose`	whether to print the iteration information on console.
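As a rough guide to how 'niter', 'nburn', and 'thin' interact, the sketch below counts the iterations that would be retained under a common thinning convention (keep every 'thin'-th iteration after burn-in). The exact indexing used inside the package is an assumption here and may differ slightly.

```r
# Assumed convention: discard the first 'nburn' iterations, then keep every 'thin'-th one
niter <- 2000
nburn <- 1200
thin  <- 5

kept <- seq(nburn + thin, niter, by = thin)  # indices of retained iterations
length(kept)                                 # 160, i.e. (niter - nburn) / thin
```

Halving 'thin' would double the number of stored samples, which is the memory trade-off described for the 'thin' argument.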

Details

The fixed effects and covariates in 'formula' must be factors and numeric variables, respectively. If they are not, use 'as.factor' and 'as.numeric' to convert them.
The package automatically takes the intersection of the ids in 'data' and the genotype 'M' and aligns their order, so the first column in 'data' should be the individual id.
If either 'windsize' or 'windnum' is specified, the GWAS results will be returned, and the 'map' information must be provided, with all physical positions given as numeric values.
The 'windsize' and 'windnum' options only work for methods that assume a proportion of markers have zero effect, e.g., BayesB, BayesBpi, BayesC, BayesCpi, BSLMM, and BayesR.
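The preparation steps described above can be sketched in base R. The data frame, ids, and column names below are hypothetical, and the intersect/match step only illustrates the kind of alignment the package is described as doing automatically; it is not the package's own code.

```r
# Hypothetical phenotype table; the first column holds the individual id
pheno <- data.frame(
  id  = c("ind3", "ind1", "ind5", "ind2"),
  sex = c("M", "F", "F", "M"),
  bwt = c("3.1", "2.8", "3.4", "3.0"),  # numeric covariate read in as character
  T1  = c(1.2, 0.7, 1.9, 1.1),
  stringsAsFactors = FALSE
)

# Fixed effects must be factors, covariates must be numeric
pheno$sex <- as.factor(pheno$sex)
pheno$bwt <- as.numeric(pheno$bwt)

# Hypothetical ids of the genotyped individuals (rows of M)
geno_id <- c("ind1", "ind2", "ind3", "ind4")

# Conceptual alignment: keep shared ids and find their rows in each source
common    <- intersect(geno_id, pheno[, 1])  # "ind1" "ind2" "ind3"
pheno_row <- match(common, pheno[, 1])       # 2 4 1
geno_row  <- match(common, geno_id)          # 1 2 3
```

Individuals present in only one of the two sources ("ind5" and "ind4" here) drop out of the intersection.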

Value

the function returns a 'blrMod' object containing

$mu

the regression intercept

$pi

estimated proportion of zero effect and non-zero effect SNPs

$beta

estimated coefficients for all covariates

$r

estimated environmental random effects

$Vr

estimated variance for all environmental random effects

$Vg

estimated genetic variance

$Ve

estimated residual variance

$h2

estimated heritability (h2 = Vg / (Vr + Vg + Ve))

$alpha

estimated effect size of all markers

$g

genomic estimated breeding value

$e

residuals of the model

$pip

the frequency with which each marker is included in the model during MCMC iterations, known as the posterior inclusion probability (PIP)

$gwas

WPPA is defined to be the window posterior probability of association; it is estimated by counting the number of MCMC samples in which \alpha is nonzero for at least one SNP in the window

$MCMCsamples

the collected samples of posterior estimation for all the above parameters across MCMC iterations
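The definitions of PIP, WPPA, and h2 given above can be illustrated with a small hand-made example. The matrix of sampled effects, the window assignment, and the variance values are invented, and the code is a sketch of the definitions rather than the package's internal computation.

```r
# Toy sampled marker effects: 4 retained MCMC samples (rows) x 6 SNPs (columns)
alpha_samples <- rbind(
  c(0.0, 0.2, 0.0, 0.0, 0.0, 0.0),
  c(0.0, 0.0, 0.1, 0.0, 0.0, 0.0),
  c(0.0, 0.3, 0.0, 0.0, 0.0, 0.4),
  c(0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
)
window <- c(1, 1, 1, 2, 2, 2)  # hypothetical window membership of each SNP

nonzero <- alpha_samples != 0

# PIP: share of samples in which each SNP has a nonzero effect
pip <- colMeans(nonzero)  # 0, 0.5, 0.25, 0, 0, 0.25

# WPPA: share of samples in which at least one SNP in the window is nonzero
wppa <- sapply(split(seq_along(window), window), function(idx) {
  mean(rowSums(nonzero[, idx, drop = FALSE]) > 0)
})
# window 1: 0.75, window 2: 0.25

# Heritability from illustrative variance components
Vr <- 0.5
Vg <- 1.5
Ve <- 3.0
h2 <- Vg / (Vr + Vg + Ve)  # 0.3
```

A window can reach a high WPPA even when no single SNP in it has a high PIP, because the signal may move between neighbouring SNPs from one sample to the next.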

References

Meuwissen, Theo HE, Ben J. Hayes, and Michael E. Goddard. "Prediction of total genetic value using genome-wide dense marker maps." Genetics 157.4 (2001): 1819-1829.
de los Campos, Gustavo, et al. "Whole-genome regression and prediction methods applied to plant and animal breeding." Genetics 193.2 (2013): 327-345.
Habier, David, et al. "Extension of the Bayesian alphabet for genomic selection." BMC bioinformatics 12.1 (2011): 1-12.
Yi, Nengjun, and Shizhong Xu. "Bayesian LASSO for quantitative trait loci mapping." Genetics 179.2 (2008): 1045-1055.
Zhou, Xiang, Peter Carbonetto, and Matthew Stephens. "Polygenic modeling with Bayesian sparse linear mixed models." PLoS genetics 9.2 (2013): e1003264.
Moser, Gerhard, et al. "Simultaneous discovery, estimation and prediction analysis of complex traits using a Bayesian mixture model." PLoS genetics 11.4 (2015): e1004969.

Examples

# Load the example data attached in the package
pheno_file_path = system.file("extdata", "demo.phe", package = "hibayes")
pheno = read.table(pheno_file_path, header=TRUE)

bfile_path = system.file("extdata", "demo", package = "hibayes")
bin = read_plink(bfile_path, threads=1)
fam = bin$fam
geno = bin$geno
map = bin$map

# For GS/GP
## no environmental effects:
fit = ibrm(T1~1, data=pheno, M=geno, M.id=fam[,2], method="BayesCpi",
	niter=2000, nburn=1200, thin=5, threads=1)

## overview of the returned results
summary(fit)



## add fixed effects or covariates:
fit = ibrm(T1~sex+season+day+bwt, data=pheno, M=geno, M.id=fam[,2],
	method="BayesCpi")
 
## add environmental random effects:
fit = ibrm(T1~sex+(1|loc)+(1|dam), data=pheno, M=geno, M.id=fam[,2],
	method="BayesCpi")

# For GWAS
fit = ibrm(T1~sex+bwt+(1|dam), data=pheno, M=geno, M.id=fam[,2],
	method="BayesCpi", map=map, windsize=1e6)


# get the SD of estimated SNP effects for markers
summary(fit)$alpha
# get the SD of estimated breeding values
summary(fit)$g

[Package hibayes version 3.0.3 Index]