greml {qgg} | R Documentation |
Genomic rescticted maximum likelihood (GREML) analysis
Description
The greml function is used for the estimation of genomic parameters (co-variance, heritability and correlation) for linear mixed models using restricted maximum likelihood estimation (REML) and genomic prediction using best linear unbiased prediction (BLUP).
The linear mixed model can account for multiple genetic factors (fixed and random genetic marker effects), adjust for complex family relationships or population stratification and adjust for other non-genetic factors including lifestyle characteristics. Different genetic architectures (infinitesimal, few large and many small effects) is accounted for by modeling genetic markers in different sets as fixed or random effects and by specifying individual genetic marker weights. Different genetic models (e.g. additive and non-additive) can be specified by providing additive and non-additive genomic relationship matrices (GRMs) (constructed using grm). The GRMs can be accessed from the R environment or from binary files stored on disk facilitating the analyses of large-scale genetic data.
The output contains estimates of variance components, fixed and random effects, first and second derivatives of log-likelihood and the asymptotic standard deviation of parameter estimates.
Assessment of predictive accuracy (including correlation and R2, and AUC for binary phenotypes) can be obtained by providing greml with a data frame, or a list that contains sample IDs used in the validation (see examples for details).
Genomic parameters can also be estimated with DMU (http://www.dmu.agrsci.dk/DMU/) if interface =”DMU”. This option requires DMU to be installed locally, and the path to the DMU binary files has to be specified (see examples below for details).
Usage
greml(
y = NULL,
X = NULL,
GRMlist = NULL,
GRM = NULL,
theta = NULL,
ids = NULL,
validate = NULL,
maxit = 100,
tol = 1e-05,
bin = NULL,
ncores = 1,
wkdir = getwd(),
verbose = FALSE,
interface = "R",
fm = NULL,
data = NULL
)
Arguments
y |
is a vector or matrix of phenotypes |
X |
is a design matrix for factors modeled as fixed effects |
GRMlist |
is a list providing information about GRM matrix stored in binary files on disk |
GRM |
is a list of one or more genomic relationship matrices |
theta |
is a vector of initial values of co-variance for REML estimation |
ids |
is a vector of individuals used in the analysis |
validate |
is a data frame or list of individuals used in cross-validation (one column/row for each validation set) |
maxit |
is the maximum number of iterations used in REML analysis |
tol |
is tolerance, i.e. convergence criteria used in REML |
bin |
is the directory for fortran binaries (e.g. DMU binaries dmu1 and dmuai) |
ncores |
is the number of cores used for the analysis |
wkdir |
is the working directory used for REML |
verbose |
is a logical; if TRUE it prints more details during optimization |
interface |
is used for specifying whether to use R or Fortran implementations of REML |
fm |
is a formula with model statement for the linear mixed model |
data |
is a data frame containing the phenotypic observations and fixed factors specified in the model statements |
Value
returns a list structure including:
llik |
log-likelihood at convergence |
theta |
covariance estimates from REML |
asd |
asymptotic standard deviation |
b |
vector of fixed effect estimates |
varb |
vector of variances of fixed effect estimates |
g |
vector or matrix of random effect estimates |
e |
vector or matrix of residual effects |
accuracy |
matrix of prediction accuracies (only returned if [validate?] is provided) |
Author(s)
Peter Soerensen
References
Lee, S. H., & van der Werf, J. H. (2006). An efficient variance component approach implementing an average information REML suitable for combined LD and linkage mapping with a general complex pedigree. Genetics Selection Evolution, 38(1), 25.
Examples
# Simulate data
W <- matrix(rnorm(1000000), ncol = 1000)
colnames(W) <- as.character(1:ncol(W))
rownames(W) <- as.character(1:nrow(W))
y <- rowSums(W[, 1:10]) + rowSums(W[, 501:510]) + rnorm(nrow(W))
# Create model
data <- data.frame(y = y, mu = 1)
fm <- y ~ 0 + mu
X <- model.matrix(fm, data = data)
# Compute GRM
GRM <- grm(W = W)
# REML analyses
fitG <- greml(y = y, X = X, GRM = list(GRM))
# REML analyses and cross validation
# Create marker sets
setsGB <- list(A = colnames(W)) # gblup model
setsGF <- list(C1 = colnames(W)[1:500], C2 = colnames(W)[501:1000]) # gfblup model
setsGT <- list(C1 = colnames(W)[1:10], C2 = colnames(W)[501:510]) # true model
GB <- lapply(setsGB, function(x) {grm(W = W[, x])})
GF <- lapply(setsGF, function(x) {grm(W = W[, x])})
GT <- lapply(setsGT, function(x) {grm(W = W[, x])})
n <- length(y)
fold <- 10
nvalid <- 5
validate <- replicate(nvalid, sample(1:n, as.integer(n / fold)))
cvGB <- greml(y = y, X = X, GRM = GB, validate = validate)
cvGF <- greml(y = y, X = X, GRM = GF, validate = validate)
cvGT <- greml(y = y, X = X, GRM = GT, validate = validate)
cvGB$accuracy
cvGF$accuracy
cvGT$accuracy