mvOU {mvMORPH} | R Documentation |
Multivariate Ornstein-Uhlenbeck model of continuous traits evolution
Description
This function allows the fitting of a multivariate Ornstein-Uhlenbeck (OU1) model with possibly multiple optima (OUM) for different "selective regimes". A "phylo" object with SIMMAP-like mapping of ancestral states is required to subdivise the tree (or branches) into "selective regimes". Species measurement errors or dispersions can also be included in the model.
Usage
mvOU(tree, data, error = NULL, model = c("OUM", "OU1"), param = list(sigma = NULL,
alpha = NULL, vcv = "fixedRoot", decomp = c("cholesky","spherical","eigen","qr",
"diagonal","upper","lower")), method = c("rpf", "sparse", "inverse",
"pseudoinverse", "univarpf"), scale.height = FALSE, optimization = c("L-BFGS-B",
"Nelder-Mead", "subplex"), control = list(maxit = 20000), precalc = NULL,
diagnostic = TRUE, echo = TRUE)
Arguments
tree |
Phylogenetic tree with mapped ancestral states in phytools' SIMMAP format for "OUM" model. (See make.simmap, paintBranches, paintSubTree, and make.era.map functions from the phytools package). A "phylo" object can be used with model "OU1". |
data |
Matrix or data frame with species in rows and continuous traits in columns. NA values are allowed with the "rpf", "inverse", and "pseudoinverse" methods. |
error |
Matrix or data frame with species in rows and continuous trait sampling variance (squared standard errors) in columns. |
model |
Choose between "OUM" for a multiple selective regime model, or "OU1" for a unique selective regime for the whole tree. |
param |
List of arguments to be passed to the function. See details below. |
method |
Choose between "rpf", "sparse", "inverse", "pseudoinverse", or "univarpf" for computing the log-likelihood during the fitting process. See details below. |
scale.height |
Whether the tree should be scaled to unit length or not. |
optimization |
Methods used by the optimization routines (see ?optim and ?subplex for details). The "fixed" method returns the log-likelihood function only. |
control |
Max. bound for the number of iteration of the optimizer; other options can be fixed in the list. (See ?optim or ?subplex for details). |
precalc |
Optional. Precalculation of fixed parameters. See ?mvmorph.Precalc for details. |
diagnostic |
Whether the convergence diagnostics should be returned or not. |
echo |
Whether the results must be returned or not. |
Details
The mvOU function fits a multivariate model of evolution according to an Ornstein-Uhlenbeck process:
dX(t) = A(\Theta - X(t))dt + \Sigma^{1/2}dW(t)
The user can incorporate measurement errors and uses SIMMAP-like mapping of ancestral states (phytools objects of class "simmap"). SIMMAP mapping allows one to assign parts of branches to different selective regimes for estimating regime-specific evolutionary optima. See the package vignette: browseVignettes("mvMORPH").
Mapping of ancestral states can be done using the "make.simmap", "make.era.map" or "paintSubTree" functions from the "phytools" package.
The "method" argument allows the user to try different algorithms for computing the log-likelihood. The "rpf"
, "univarpf"
(for univariate analysis) and "sparse"
methods use fast GLS algorithms based on factorization to avoid explicit computation of the inverse of the variance-covariance matrix and its determinant during log-likelihood estimation. The "inverse"
approach uses a "stable" (based on QR decomposition) explicit computation of the inverse and determinant of the matrix and is therefore slower. The "pseudoinverse"
method uses a generalized inverse that is safer for matrix near singularity but highly time consuming. See ?mvLL for details.
Arguments in the "param" list
are:
"sigma" or "alpha" - Starting values for the likelihood search can be specified through the "alpha" and "sigma" arguments in the param list. It is also possible to test for the significance of the off-diagonal sigma (scatter) and alpha (drift) matrix in the full model by making comparison with a constrained model (using sigma="constraint", or alpha="constraint") in the "param" argument list. You can also provide starting values for the constrained model. For instance, for two traits use sigma=list("constraint", c(0.5,0.5)) (or alpha=list("constraint", c(0.5,0.5))).
"decomp" - You can further constrain the alpha matrix by specifying the decomposition of the matrix through the "decomp" argument in the "param" list. The multivariate Ornstein-Uhlenbeck model is described by the spectral decomposition of the alpha matrix. It is possible to parameterize the alpha matrix to be decomposable using various parameterizations (e.g., on its eigenvalues with different biological interpretations; Sy et al. 1997, Bartoszek et al. 2012; Clavel et al. 2015). For a symmetric matrix parameterization, the user can choose the "cholesky"
, "eigen"
, or "spherical"
option.
For general square (non-symmetric) matrices the "svd"
, "qr"
and "schur"
parameterizations can be used. The "schur"
parameterization constrains the eigenvalues of the alpha matrix to be real numbers. The "svd+"
, "qr+"
or "eigen+"
options force the eigenvalues to be positives by taking their logarithm. It is also possible to specify "diagonal"
which is similar to the use of the "constraint" argument for "alpha" argument, or to use "equal"
and "equaldiagonal"
. Finally, one can specify that the alpha matrix is "upper"
or "lower"
triangular (i.e., one process affect the other unilateraly). Details can be found in the package vignette: browseVignettes("mvMORPH") and on Clavel et al. 2015.
"decompSigma" - The sigma matrix is parameterized by various methods to ensure its positive definiteness (Pinheiro and Bates, 1996). These methods can be accessed through the "decompSigma" argument and are the "cholesky"
, "eigen+"
, and "spherical"
parameterization. The sigma matrix can also be forced to be diagonal using "diagonal"
or "equaldiagonal"
and forced to have the same variances using "equal"
. Details can be found in the package vignette: browseVignettes("mvMORPH").
"vcv" - It is possible to specify in the "param" list what kind of variance-covariance matrix to use with the "vcv" argument, depending on how the root is treated.
The vcv="randomRoot"
option assumes that the value at the root is a random variable with the stationary distribution of the process. It cannot be used with the "sparse" method to speed up the computations. The vcv="fixedRoot"
option assumes that the root is a fixed parameter. On ultrametric trees both approaches should converge on the same results when the OU process is stationary.
"root" - This argument allows the user to specify if the ancestral state at the root (theta 0) should be estimated (root=TRUE
), or assumed to be at the oldest regime state stationary distribution (root=FALSE
). An alternative is to follow Beaulieu et al. (2012) and explicitly drop the root state influence (root="stationary"
). The first option should be used with non-ultrametric trees (i.e., with fossil species; e.g., Hansen 1997) where information on the ancestral state is directly available from the data. Note that estimating shifts from the ancestral state to the optimum(s) from extant species can be problematic and it can be preferable to assume each regime optimum(s) to be at the stationary distribution.
For the "decomp" and "decompSigma arguments, a user-defined matrix with integer values taken as indices of the parameters to be estimated can be provided. See ?mvBM and ?mvRWTS.
Note on the returned Hessian matrix in the result list (param$opt$hessian):
The hessian is the matrix of second order partial derivatives of the likelihood function with respect to the maximum likelihood parameter values. This matrix provides a measure of the steepness of the likelihood surface in the vicinity of the optimum. The eigen-decomposition of the hessian matrix allows assessing the reliability of the model fit (even if the optimizer has converged). When the optimization function does not converge on a stable result, the user may consider increasing the "maxit" argument in the "control" option, or try a simpler model with fewer parameters to estimate. Changing the starting values ("alpha" and "sigma" options in the param list) as well as the optimizing method ("optimization" option) may helps sometimes (e.g., alpha=runif(3) for a two-trait analysis with random starting values - i.e., the lower triangular alpha matrix). Note that the number of starting values to provide depends on the matrix decomposition chosen for the alpha parameter (p*(p+1)/2 values for symmetric alpha matrix, but p*p values for non-symmetric ones - with p the number of traits).
Value
LogLik |
The log-likelihood of the optimal model. |
AIC |
Akaike Information Criterion for the optimal model. |
AICc |
Sample size-corrected AIC. |
theta |
Estimated ancestral states. |
alpha |
Matrix of estimated alpha values (strength of selection). |
sigma |
Evolutionary rate matrix (drift). |
convergence |
Convergence status of the optimizing function; "0" indicates convergence. (see ?optim for details). |
hess.values |
Reliability of the likelihood estimates calculated through the eigen-decomposition of the hessian matrix. "0" means that a reliable estimate has been reached. See details above. |
param |
List of model fit parameters (optimization, method, model, number of parameters...). |
llik |
The log-likelihood function evaluated in the model fit "$llik(par, root.mle=TRUE)". |
Note
This function partly uses a modified version of the C code from the "OUCH" package built by Aaron King, as well as a C code which is part of the "ape" package by Emmanuel Paradis. I kindly thank those authors for sharing their sources. Note that Bartoszek et al. (2012) proposed the mvSLOUCH package also dedicated to multivariate Ornstein-Uhlenbeck processes, which allows fitting regression models with randomly evolving predictor variables.
The "symmetric", "nsymmetric", "symmetricPositive", and "nsymPositive" options for the "decomp" argument are deprecated.
Author(s)
Julien Clavel
References
Bartoszek K., Pienaar J., Mostad P., Andersson S., Hansen T.F. 2012. A phylogenetic comparative method for studying multivariate adaptation. J. Theor. Biol. 314:204-215.
Beaulieu J.M., Jhwueng D.-C., Boettiger C., O'Meara B.C. 2012. Modeling stabilizing selection: Expanding the Ornstein-Uhlenbeck model of adaptive evolution. Evolution. 66:2369-2389.
Butler M.A., King A.A. 2004. Phylogenetic comparative analysis: a modeling approach for adaptive evolution. Am. Nat. 164:683-695.
Clavel J., Escarguel G., Merceron G. 2015. mvMORPH: an R package for fitting multivariate evolutionary models to morphometric data. Methods Ecol. Evol. 6(11):1311-1319.
Hansen T.F. 1997. Stabilizing selection and the comparative analysis of adaptation. Evolution. 51:1341-1351.
Pinheiro J.C., Bates D.M. 1996. Unconstrained parameterizations for variance-covariance matrices. Stat. Comput. 6:289-296.
Sy J.P., Taylor J.M.G., Cumberland W.G. 1997. A stochastic model for the analysis of bivariate longitudinal AIDS data. Biometrics. 53:542-555.
See Also
mvMORPH
mvgls
halflife
stationary
mvBM
mvEB
mvSHIFT
mvOUTS
mvRWTS
mvSIM
LRT
optim
make.simmap
make.era.map
paintSubTree
Examples
# Simulated dataset
set.seed(14)
# Generating a random tree
tree<-pbtree(n=50)
# Setting the regime states of tip species
sta<-as.vector(c(rep("Forest",20),rep("Savannah",30))); names(sta)<-tree$tip.label
# Making the simmap tree with mapped states
tree<-make.simmap(tree,sta , model="ER", nsim=1)
col<-c("blue","orange"); names(col)<-c("Forest","Savannah")
# Plot of the phylogeny for illustration
plotSimmap(tree,col,fsize=0.6,node.numbers=FALSE,lwd=3, pts=FALSE)
# Simulate the traits
alpha<-matrix(c(2,0.5,0.5,1),2)
sigma<-matrix(c(0.1,0.05,0.05,0.1),2)
theta<-c(2,3,1,1.3)
data<-mvSIM(tree, param=list(sigma=sigma, alpha=alpha, ntraits=2, theta=theta,
names_traits=c("head.size","mouth.size")), model="OUM", nsim=1)
## Fitting the models
# OUM - Analysis with multiple optima
mvOU(tree, data)
# OU1 - Analysis with a unique optimum
mvOU(tree, data, model="OU1", method="sparse")
# various options
mvOU(tree, data, model="OUM", method="sparse", scale.height=FALSE,
param=list(decomp="svd", root="stationary"))# non-symmetric alpha
mvOU(tree, data, model="OUM", method="sparse", scale.height=FALSE,
param=list(decomp="qr", root=TRUE)) # non-symmetric alpha
mvOU(tree, data, model="OUM", method="sparse", scale.height=FALSE,
param=list(decomp="cholesky", root=TRUE)) # symmetric-positive
# OUCH setting
mvOU(tree, data, model="OUM", method="rpf", scale.height=FALSE,
param=list(decomp="cholesky", root=FALSE, vcv="ouch"))
## Univariate case - FAST with RPF
set.seed(14)
tree<-pbtree(n=500)
# Setting the regime states of tip species
sta<-as.vector(c(rep("Forest",200),rep("Savannah",300))); names(sta)<-tree$tip.label
# Making the simmap tree with mapped states
tree<-make.simmap(tree,sta , model="ER", nsim=1)
col<-c("blue","orange"); names(col)<-c("Forest","Savannah")
# Plot of the phylogeny for illustration
plotSimmap(tree,col,fsize=0.6,node.numbers=FALSE,lwd=3, pts=FALSE)
# Parameters
alpha<-2.5
sigma<-0.1
theta<-c(0,2)
data<-mvSIM(tree, param=list(sigma=sigma, alpha=alpha, ntraits=1, theta=theta,
names_traits=c("body_size")), model="OUM", nsim=1)
# Fit the model
system.time(mvOU(tree, data, model="OUM", method="univarpf",
param=list(root="stationary")))
system.time(mvOU(tree, data, model="OU1", method="univarpf",
param=list(root="stationary")))
# Add measurement error
error=rnorm(500,sd=0.1)
mvOU(tree, data+error, error=rep(0.1^2,500), model="OUM", method="univarpf",
param=list(root="stationary"))