greml {qgg}R Documentation

Genomic rescticted maximum likelihood (GREML) analysis

Description

The greml function is used for the estimation of genomic parameters (co-variance, heritability and correlation) for linear mixed models using restricted maximum likelihood estimation (REML) and genomic prediction using best linear unbiased prediction (BLUP).

The linear mixed model can account for multiple genetic factors (fixed and random genetic marker effects), adjust for complex family relationships or population stratification and adjust for other non-genetic factors including lifestyle characteristics. Different genetic architectures (infinitesimal, few large and many small effects) is accounted for by modeling genetic markers in different sets as fixed or random effects and by specifying individual genetic marker weights. Different genetic models (e.g. additive and non-additive) can be specified by providing additive and non-additive genomic relationship matrices (GRMs) (constructed using grm). The GRMs can be accessed from the R environment or from binary files stored on disk facilitating the analyses of large-scale genetic data.

The output contains estimates of variance components, fixed and random effects, first and second derivatives of log-likelihood and the asymptotic standard deviation of parameter estimates.

Assessment of predictive accuracy (including correlation and R2, and AUC for binary phenotypes) can be obtained by providing greml with a data frame, or a list that contains sample IDs used in the validation (see examples for details).

Genomic parameters can also be estimated with DMU (http://www.dmu.agrsci.dk/DMU/) if interface =”DMU”. This option requires DMU to be installed locally, and the path to the DMU binary files has to be specified (see examples below for details).

Usage

greml(
  y = NULL,
  X = NULL,
  GRMlist = NULL,
  GRM = NULL,
  theta = NULL,
  ids = NULL,
  validate = NULL,
  maxit = 100,
  tol = 1e-05,
  bin = NULL,
  ncores = 1,
  wkdir = getwd(),
  verbose = FALSE,
  interface = "R",
  fm = NULL,
  data = NULL
)

Arguments

y

is a vector or matrix of phenotypes

X

is a design matrix for factors modeled as fixed effects

GRMlist

is a list providing information about GRM matrix stored in binary files on disk

GRM

is a list of one or more genomic relationship matrices

theta

is a vector of initial values of co-variance for REML estimation

ids

is a vector of individuals used in the analysis

validate

is a data frame or list of individuals used in cross-validation (one column/row for each validation set)

maxit

is the maximum number of iterations used in REML analysis

tol

is tolerance, i.e. convergence criteria used in REML

bin

is the directory for fortran binaries (e.g. DMU binaries dmu1 and dmuai)

ncores

is the number of cores used for the analysis

wkdir

is the working directory used for REML

verbose

is a logical; if TRUE it prints more details during optimization

interface

is used for specifying whether to use R or Fortran implementations of REML

fm

is a formula with model statement for the linear mixed model

data

is a data frame containing the phenotypic observations and fixed factors specified in the model statements

Value

returns a list structure including:

llik

log-likelihood at convergence

theta

covariance estimates from REML

asd

asymptotic standard deviation

b

vector of fixed effect estimates

varb

vector of variances of fixed effect estimates

g

vector or matrix of random effect estimates

e

vector or matrix of residual effects

accuracy

matrix of prediction accuracies (only returned if [validate?] is provided)

Author(s)

Peter Soerensen

References

Lee, S. H., & van der Werf, J. H. (2006). An efficient variance component approach implementing an average information REML suitable for combined LD and linkage mapping with a general complex pedigree. Genetics Selection Evolution, 38(1), 25.

Examples




# Simulate data
W <- matrix(rnorm(1000000), ncol = 1000)
	colnames(W) <- as.character(1:ncol(W))
	rownames(W) <- as.character(1:nrow(W))
y <- rowSums(W[, 1:10]) + rowSums(W[, 501:510]) + rnorm(nrow(W))

# Create model
data <- data.frame(y = y, mu = 1)
fm <- y ~ 0 + mu
X <- model.matrix(fm, data = data)

# Compute GRM
GRM <- grm(W = W)

# REML analyses
fitG <- greml(y = y, X = X, GRM = list(GRM))


# REML analyses and cross validation

# Create marker sets
setsGB <- list(A = colnames(W)) # gblup model
setsGF <- list(C1 = colnames(W)[1:500], C2 = colnames(W)[501:1000]) # gfblup model
setsGT <- list(C1 = colnames(W)[1:10], C2 = colnames(W)[501:510]) # true model

GB <- lapply(setsGB, function(x) {grm(W = W[, x])})
GF <- lapply(setsGF, function(x) {grm(W = W[, x])})
GT <- lapply(setsGT, function(x) {grm(W = W[, x])})

n <- length(y)
fold <- 10
nvalid <- 5

validate <- replicate(nvalid, sample(1:n, as.integer(n / fold)))
cvGB <- greml(y = y, X = X, GRM = GB, validate = validate)
cvGF <- greml(y = y, X = X, GRM = GF, validate = validate)
cvGT <- greml(y = y, X = X, GRM = GT, validate = validate)

cvGB$accuracy
cvGF$accuracy
cvGT$accuracy




[Package qgg version 1.1.2 Index]