ssbrm {hibayes} | R Documentation |
Single-step Bayes model
Description
Single-step Bayes linear regression model using individual level data and pedigree information
y = X \beta + R r + M \alpha + U \epsilon + e
where y is the vector of phenotypic values for both genotyped and non-genotyped individuals, \beta is a vector of estimated coefficients for covariates, r is a vector of environmental random effects, M contains the genotype (M_2) for genotyped individuals and the imputed genotype (M_1 = A_{12}A_{22}^{-1}M_2) for non-genotyped individuals, \alpha is the vector of marker effects, \epsilon is the vector of genotype imputation error, and e is a vector of residuals.
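As a minimal sketch of the imputation step M_1 = A_{12}A_{22}^{-1}M_2 described above, the base R snippet below uses a hypothetical three-animal pedigree (two genotyped, unrelated parents and their non-genotyped offspring) with a hand-written relationship matrix A. The toy data and all variable names are assumptions made for illustration; this is not how ssbrm computes the quantity internally.

```r
# Toy numerator relationship matrix A for sire (1), dam (2), offspring (3);
# the parents are assumed to be unrelated and non-inbred.
A = matrix(c(1.0, 0.0, 0.5,
             0.0, 1.0, 0.5,
             0.5, 0.5, 1.0), nrow = 3, byrow = TRUE)

geno_idx   = c(1, 2)  # genotyped individuals (the parents)
nogeno_idx = 3        # non-genotyped individual (the offspring)

# Genotypes of the genotyped individuals, coded 0/1/2 at three markers
M2 = matrix(c(0, 2, 1,
              2, 2, 0), nrow = 2, byrow = TRUE)

A12 = A[nogeno_idx, geno_idx, drop = FALSE]  # non-genotyped x genotyped block
A22 = A[geno_idx, geno_idx, drop = FALSE]    # genotyped x genotyped block

# M1 = A12 %*% A22^{-1} %*% M2; solve(A22, M2) avoids forming the explicit inverse
M1 = A12 %*% solve(A22, M2)
print(M1)
```

In this toy case A_{22} is the identity, so the imputed genotype of the offspring is simply the average of the two parental genotypes.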
Usage
ssbrm(
formula,
data = NULL,
M = NULL,
M.id = NULL,
pedigree = NULL,
method = c("BayesCpi", "BayesA", "BayesL", "BayesR", "BayesB", "BayesC", "BayesBpi",
"BayesRR"),
map = NULL,
Pi = NULL,
fold = NULL,
niter = NULL,
nburn = NULL,
thin = 5,
windsize = NULL,
windnum = NULL,
maf = 0.01,
dfvr = NULL,
s2vr = NULL,
vg = NULL,
dfvg = NULL,
s2vg = NULL,
ve = NULL,
dfve = NULL,
s2ve = NULL,
printfreq = 100,
seed = 666666,
threads = 4,
verbose = TRUE
)
Arguments
formula |
a two-sided linear formula object describing both the fixed-effects and random-effects parts of the model, with the response on the left of a ‘~’ operator and the terms, separated by ‘+’ operators, on the right. Random-effects terms are distinguished by vertical bars (‘|’) separating expressions for design matrices from grouping factors. |
data |
the data frame containing the variables named in 'formula'. NOTE that the first column in 'data' should be the individual id. |
M |
numeric matrix of genotypes with individuals in rows and markers in columns; NAs are not allowed. |
M.id |
vector of individual ids for the genotyped individuals in 'M'. |
pedigree |
matrix of pedigree with exactly 3 columns, in the order "id", "sir" (sire), "dam". |
method |
Bayes methods including: "BayesB", "BayesA", "BayesL", "BayesRR", "BayesBpi", "BayesC", "BayesCpi", "BayesR". |
map |
(optional, only for GWAS) the map information of genotype; at least 3 columns are required: SNPs, chromosome, physical position. |
Pi |
vector of the proportions of zero-effect and non-zero-effect SNPs; the first value must be the proportion of zero-effect markers. |
fold |
proportion of variance explained for groups of SNPs, the default is c(0, 0.0001, 0.001, 0.01). |
niter |
the number of MCMC iterations. |
nburn |
the number of iterations to be discarded. |
thin |
the thinning interval after burn-in. Note that a smaller thinning interval may give more accurate parameter estimates but needs more memory to store the collected samples; conversely, a larger interval may reduce the accuracy of the estimates. |
windsize |
window size in bp for GWAS, the default is NULL. |
windnum |
fixed number of SNPs in a window for GWAS; if it is specified, 'windsize' will be ignored. The default is NULL. |
maf |
the effects of markers whose MAF is lower than the threshold will not be estimated. |
dfvr |
the number of degrees of freedom for the distribution of environmental variance. |
s2vr |
scale parameter for the distribution of environmental variance. |
vg |
prior value of genetic variance. |
dfvg |
the number of degrees of freedom for the distribution of genetic variance. |
s2vg |
scale parameter for the distribution of genetic variance. |
ve |
prior value of residual variance. |
dfve |
the number of degrees of freedom for the distribution of residual variance. |
s2ve |
scale parameter for the distribution of residual variance. |
printfreq |
frequency of printing iterative details on console. |
seed |
seed for random sampling. |
threads |
number of threads used for OpenMP. |
verbose |
whether to print the iteration information on console. |
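How 'niter', 'nburn' and 'thin' interact can be sketched as follows. The snippet assumes that one sample is kept every 'thin' iterations once the burn-in is over; the exact indexing used internally by hibayes is not stated on this page and may differ, so treat the retention rule and the variable names 'kept' and 'n_kept' as illustrative assumptions.

```r
niter = 1000  # total number of MCMC iterations
nburn = 600   # iterations discarded as burn-in
thin  = 5     # keep one sample every 'thin' iterations after burn-in

# Assumed retention rule: iterations nburn + thin, nburn + 2 * thin, ..., niter
kept = seq(from = nburn + thin, to = niter, by = thin)
n_kept = length(kept)
print(n_kept)
```

Under this assumption the number of retained samples is (niter - nburn) / thin, which is why a smaller 'thin' costs more memory.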
Value
the function returns a 'blrMod' object containing
- $J
coefficient for genotype imputation residuals
- $Veps
estimated variance of genotype imputation residuals
- $epsilon
genotype imputation residuals
- $mu
the regression intercept
- $pi
estimated proportion of zero effect and non-zero effect SNPs
- $beta
estimated coefficients for all covariates
- $r
estimated environmental random effects
- $Vr
estimated variance for all environmental random effects
- $Vg
estimated genetic variance
- $Ve
estimated residual variance
- $h2
estimated heritability (h2 = Vg / (Vr + Vg + Ve))
- $g
data.frame, the first column is the list of individual id, the second column is the genomic estimated breeding value for all individuals, including genotyped and non-genotyped.
- $alpha
estimated effect size of all markers
- $e
residuals of the model
- $pip
the frequency with which each marker is included in the model during MCMC iterations, also known as the posterior inclusion probability (PIP)
- $gwas
WPPA is defined to be the window posterior probability of association; it is estimated by counting the number of MCMC samples in which \alpha is nonzero for at least one SNP in the window
- $MCMCsamples
the collected samples of posterior estimation for all the above parameters across MCMC iterations
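To illustrate how $pip, the WPPA reported in $gwas, and $h2 relate to the quantities above, here is a minimal base R sketch on made-up data. The sample matrix, the window membership and the variance values are hypothetical, and the code is an assumption-based illustration of the definitions, not the package's implementation.

```r
# Hypothetical thinned MCMC samples of marker effects:
# 4 samples (rows) x 3 SNPs (columns)
alpha_samples = matrix(c(0.0, 0.2, 0.0,
                         0.1, 0.0, 0.0,
                         0.0, 0.3, 0.0,
                         0.0, 0.0, 0.0), nrow = 4, byrow = TRUE)

# PIP: fraction of samples in which each SNP has a non-zero effect
pip = colMeans(alpha_samples != 0)

# WPPA for one window (assumed here to contain SNPs 1 and 2): fraction of
# samples in which at least one SNP in the window has a non-zero effect
window_snps = c(1, 2)
in_window = alpha_samples[, window_snps, drop = FALSE] != 0
wppa = mean(rowSums(in_window) > 0)

# Heritability from made-up variance components: h2 = Vg / (Vr + Vg + Ve)
Vr = 1; Vg = 3; Ve = 6
h2 = Vg / (Vr + Vg + Ve)

print(pip)
print(wppa)
print(h2)
```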
References
Fernando, R.L., Dekkers, J.C.M., Garrick, D.J.: A class of Bayesian methods to combine large numbers of genotyped and non-genotyped animals for whole-genome analyses. Genetics Selection Evolution 46(1), 1-13 (2014).
Henderson, C.R.: A simple method for computing the inverse of a numerator relationship matrix used in prediction of breeding values. Biometrics 32(1), 69-83 (1976).
Examples
# Load the example data attached in the package
pheno_file_path = system.file("extdata", "demo.phe", package = "hibayes")
pheno = read.table(pheno_file_path, header=TRUE)
bfile_path = system.file("extdata", "demo", package = "hibayes")
bin = read_plink(bfile_path, threads=1)
fam = bin$fam
geno = bin$geno
map = bin$map
pedigree_file_path = system.file("extdata", "demo.ped", package = "hibayes")
ped = read.table(pedigree_file_path, header=TRUE)
# For GS/GP
## no environmental effects:
fit = ssbrm(T1~1, data=pheno, M=geno, M.id=fam[,2], pedigree=ped,
            method="BayesCpi", niter=1000, nburn=600, thin=5, printfreq=100, threads=1)
## overview of the returned results
summary(fit)
## add fixed effects or covariates:
fit = ssbrm(T1~sex+bwt, data=pheno, M=geno, M.id=fam[,2], pedigree=ped,
            method="BayesCpi")
## add environmental random effects:
fit = ssbrm(T1~(1|loc)+(1|dam), data=pheno, M=geno, M.id=fam[,2],
            pedigree=ped, method="BayesCpi")
# For GWAS
fit = ssbrm(T1~sex+bwt+(1|dam), data=pheno, M=geno, M.id=fam[,2],
            pedigree=ped, method="BayesCpi", map=map, windsize=1e6)
# get the SD of estimated SNP effects for markers
summary(fit)$alpha
# get the SD of estimated breeding values
summary(fit)$g