EMM.cpp {RAINBOWR} | R Documentation |
Equation of mixed model for one kernel, a wrapper of two methods
Description
This function estimates maximum-likelihood (ML/REML; resticted maximum likelihood) solutions for the following mixed model.
y = X \beta + Z u + \epsilon
where \beta
is a vector of fixed effects and u
is a vector of random effects with
Var[u] = K \sigma^2_u
. The residual variance is Var[\epsilon] = I \sigma^2_e
.
Usage
EMM.cpp(
y,
X = NULL,
ZETA,
eigen.G = NULL,
eigen.SGS = NULL,
n.thres = 450,
reestimation = FALSE,
n.core = NA,
lam.len = 4,
init.range = c(1e-06, 100),
init.one = 0.5,
conv.param = 1e-06,
count.max = 20,
bounds = c(1e-06, 1e+06),
tol = NULL,
optimizer = "nlminb",
traceInside = 0,
REML = TRUE,
silent = TRUE,
plot.l = FALSE,
SE = FALSE,
return.Hinv = TRUE
)
Arguments
y |
A |
X |
A |
ZETA |
A list of variance (relationship) matrix (K; |
eigen.G |
A list with
The result of the eigen decompsition of |
eigen.SGS |
A list with
The result of the eigen decompsition of |
n.thres |
If |
reestimation |
If TRUE, EMM2.cpp is performed when the estimation by EMM1.cpp may not be accurate. |
n.core |
Setting n.core > 1 will enable parallel execution on a machine with multiple cores. |
lam.len |
The number of initial values you set. If this number is large, the estimation will be more accurate, but computational cost will be large. We recommend setting this value 3 <= lam.len <= 6. |
init.range |
The range of the initial parameters. For example, if lam.len = 5 and init.range = c(1e-06, 1e02), corresponding initial heritabilities will be calculated as seq(1e-06, 1 - 1e-02, length = 5), and then initial lambdas will be set. |
init.one |
The initial parameter if lam.len = 1. |
conv.param |
The convergence parameter. If the diffrence of log-likelihood by updating the parameter "lambda" is smaller than this conv.param, the iteration steps will be stopped. |
count.max |
Sometimes algorithms won't converge for some initial parameters. So if the iteration steps reache to this argument, you can stop the calculation even if algorithm doesn't converge. |
bounds |
Lower and Upper bounds of the parameter lambda. If the updated parameter goes out of this range, the parameter is reset to the value in this range. |
tol |
The tolerance for detecting linear dependencies in the columns of G = ZKZ'. Eigen vectors whose eigen values are less than "tol" argument will be omitted from results. If tol is NULL, top 'n' eigen values will be effective. |
optimizer |
The function used in the optimization process. We offer "optim", "optimx", and "nlminb" functions. |
traceInside |
Perform trace for the optimzation if traceInside >= 1, and this argument shows the frequency of reports. |
REML |
You can choose which method you will use, "REML" or "ML". If REML = TRUE, you will perform "REML", and if REML = FALSE, you will perform "ML". |
silent |
If this argument is TRUE, warning messages will be shown when estimation is not accurate. |
plot.l |
If you want to plot log-likelihood, please set plot.l = TRUE. We don't recommend plot.l = TRUE when lam.len >= 2. |
SE |
If TRUE, standard errors are calculated. |
return.Hinv |
If TRUE, the function returns the inverse of |
Value
- $Vu
Estimator for
\sigma^2_u
- $Ve
Estimator for
\sigma^2_e
- $beta
BLUE(
\beta
)- $u
BLUP(
u
)- $LL
Maximized log-likelihood (full or restricted, depending on method)
- $beta.SE
Standard error for
\beta
(If SE = TRUE)- $u.SE
Standard error for
u^*-u
(If SE = TRUE)- $Hinv
The inverse of
H = ZKZ' + \lambda I
(If return.Hinv = TRUE)- $Hinv2
The inverse of
H2 = ZKZ'/\lambda + I
(If return.Hinv = TRUE)- $lambda
Estimators for
\lambda = \sigma^2_e / \sigma^2_u
(Ifn >= n.thres
)- $lambdas
Lambdas for each initial values (If
n >= n.thres
)- $reest
If parameter estimation may not be accurate, reest = 1, else reest = 0 (If
n >= n.thres
)- $counts
The number of iterations until convergence for each initial values (If
n >= n.thres
)
References
Kang, H.M. et al. (2008) Efficient Control of Population Structure in Model Organism Association Mapping. Genetics. 178(3): 1709-1723.
Zhou, X. and Stephens, M. (2012) Genome-wide efficient mixed-model analysis for association studies. Nat Genet. 44(7): 821-824.
Examples
### Perform genomic prediction with 10-fold cross validation
### Import RAINBOWR
require(RAINBOWR)
### Load example datasets
data("Rice_Zhao_etal")
Rice_geno_score <- Rice_Zhao_etal$genoScore
Rice_geno_map <- Rice_Zhao_etal$genoMap
Rice_pheno <- Rice_Zhao_etal$pheno
### View each dataset
See(Rice_geno_score)
See(Rice_geno_map)
See(Rice_pheno)
### Select one trait for example
trait.name <- "Flowering.time.at.Arkansas"
y <- as.matrix(Rice_pheno[, trait.name, drop = FALSE])
### Remove SNPs whose MAF <= 0.05
x.0 <- t(Rice_geno_score)
MAF.cut.res <- MAF.cut(x.0 = x.0, map.0 = Rice_geno_map)
x <- MAF.cut.res$x
map <- MAF.cut.res$map
### Estimate genomic relationship matrix (GRM)
K.A <- calcGRM(genoMat = x)
### Modify data
modify.res <- modify.data(pheno.mat = y, geno.mat = x, return.ZETA = TRUE)
pheno.mat <- modify.res$pheno.modi
ZETA <- modify.res$ZETA
### Solve linear mixed effects model
EMM.res <- EMM.cpp(y = pheno.mat, X = NULL, ZETA = ZETA)
(Vu <- EMM.res$Vu) ### estimated genetic variance
(Ve <- EMM.res$Ve) ### estimated residual variance
(herit <- Vu / (Vu + Ve)) ### genomic heritability
(beta <- EMM.res$beta) ### Here, this is an intercept.
u <- EMM.res$u ### estimated genotypic values
See(u)
### Estimate marker effects from estimated genotypic values
x.modi <- modify.res$geno.modi
WMat <- calcGRM(genoMat = x.modi, methodGRM = "addNOIA",
returnWMat = TRUE)
K.A <- ZETA$A$K
if (min(eigen(K.A)$values) < 1e-08) {
diag(K.A) <- diag(K.A) + 1e-06
}
mrkEffectsForW <- crossprod(x = WMat,
y = solve(K.A)) %*% as.matrix(u)
mrkEffects <- mrkEffectsForW / mean(scale(x.modi %*% mrkEffectsForW, scale = FALSE) / u)
#### Cross-validation for genomic prediction
noNA <- !is.na(c(pheno.mat)) ### NA (missing) in the phenotype data
phenoNoNA <- pheno.mat[noNA, , drop = FALSE] ### remove NA
ZETANoNA <- ZETA
ZETANoNA$A$Z <- ZETA$A$Z[noNA, ] ### remove NA
nFold <- 10 ### # of folds
nLine <- nrow(phenoNoNA)
idCV <- sample(1:nLine %% nFold) ### assign random ids for cross-validation
idCV[idCV == 0] <- nFold
yPred <- rep(NA, nLine)
for (noCV in 1:nFold) {
yTrain <- phenoNoNA
yTrain[idCV == noCV, ] <- NA ### prepare test data
EMM.resCV <- EMM.cpp(y = yTrain, X = NULL, ZETA = ZETANoNA) ### prediction
yTest <- EMM.resCV$beta + EMM.resCV$u ### predicted values
yPred[idCV == noCV] <- (yTest[noNA])[idCV == noCV]
}
### Plot the results
plotRange <- range(phenoNoNA, yPred)
plot(x = phenoNoNA, y = yPred,xlim = plotRange, ylim = plotRange,
xlab = "Observed values", ylab = "Predicted values",
main = "Results of Genomic Prediction",
cex.lab = 1.5, cex.main = 1.5, cex.axis = 1.3)
abline(a = 0, b = 1, col = 2, lwd = 2, lty = 2)
R2 <- cor(x = phenoNoNA[, 1], y = yPred) ^ 2
text(x = plotRange[2] - 10,
y = plotRange[1] + 10,
paste0("R2 = ", round(R2, 3)),
cex = 1.5)