fitDiscrete {geiger} | R Documentation |
Model fitting for discrete comparative data
Description
fitting macroevolutionary models to phylogenetic trees
Usage
fitDiscrete(phy, dat,
model = c("ER","SYM","ARD","meristic"),
transform = c("none", "EB", "lambda", "kappa", "delta", "white"),
bounds = list(), control = list(method = c("subplex", "L-BFGS-B"),
niter = 100, FAIL = 1e+200, hessian = FALSE, CI = 0.95), ncores=NULL,
...)
## S3 method for class 'gfit'
as.Qmatrix(x, ...)
Arguments
phy |
a phylogenetic tree of class phylo |
dat |
data vector for a single trait, with names matching tips in |
model |
an Mkn model to fit to comparative data (see Details) |
transform |
an evolutionary model used to transform the tree (see Details) |
bounds |
range to constrain parameter estimates (see Details) |
control |
settings used for optimization of the model likelihood |
ncores |
Number of cores. If |
x |
Object of class |
... |
if |
Details
This function fits various likelihood models for discrete character evolution. The function returns parameter estimates and the likelihood for univariate datasets. All of the models are continuous-time Markov models of trait evolution (see Yang 2006 for a good general discussion of this type of model).
The model likelihood is maximized using methods available in optim
as well as subplex
. Optimization methods to be used within optim
can be specified through the control
object.
A number of random starting points are used in optimization and are given through the niter
element within the control
object (e.g., control$niter
). Finding the maximum likelihood fit is sometimes tricky, especially as the number of parameters in the model increases. Even in the example below, a slightly suboptimal fit is occasionally returned with the default settings fitting the general (ARD
) model. There is no rule of thumb for the number of iterations that will be appropriate for a given dataset and model, but one use the variance in fitted likelihoods across iterations as an indication of the difficulty of the likelihood space (see details of the res
object in Value). Twenty optimization iterations per parameter seems to be a decent starting point for fitting these models.
The FAIL
value within the control
object should be a large value that will be considerably far from -lnL of the maximum model likelihood. In most cases, the default setting for control$FAIL
will be appropriate. The Hessian may be used to compute confidence intervals (CI
) for the parameter estimates if the hessian
element in control
is TRUE.
The function can handle traits with any number of character states, under a range of models. The character model is specified by the model
argument:
ER is an
equal-rates
model of where a single parameter governs all transition ratesSYM is a
symmetric
model where forward and reverse transitions share the same parameterARD is an
all-rates-different
model where each rate is a unique parametermeristic is a model wherein transitions occur in a stepwise fashion (e.g., 1 to 2 to 3 to 2) without skipping intermediate steps; this requires a sensible coding of the character states as consecutive integers are assumed to be neighboring states
matrix is a user supplied model (given as a dummy matrix representing transition classes between states); elements that are zero signify rates that are also zero (see Examples)
The transform
argument allows one to test models where rates vary across the tree. Bounds for the relevant parameters of the tree transform
may be given through the bounds
argument. Several bounds can be given at a time. Default bounds under the different models are given below.
Options for transform
are as follows:
none is a model of rate constancy through time
EB is the Early-burst model (Harmon et al. 2010) and also called the
ACDC
model (accelerating-decelerating; Blomberg et al. 2003). Set by thea
rate parameter,EB
fits a model where the rate of evolution increases or decreases exponentially through time, under the model r[t] = r[0] * exp(a * t), wherer[0]
is the initial rate,a
is the rate change parameter, andt
is time. Default bounds area = c(min = -10, max = 10)
lambda is one of the Pagel (1999) models that fits the extent to which the phylogeny predicts covariance among trait values for species. The model effectively transforms the tree: values of
lambda
near 0 cause the phylogeny to become more star-like, and alambda
value of 1 recovers thenone
model. Default bounds arelambda = c(min = 0, max = 1)
kappa is a punctuational (speciational) model of trait evolution (Pagel 1999), where character divergence is related to the number of speciation events between two species. Note that if there are speciation events in the given phylogeny (due to extinction or incomplete sampling), interpretation under the
kappa
model may be difficult. Considered as a tree transformation, the model raises all branch lengths to an estimated power (kappa
). Default bounds arekappa = c(min = 0, max = 1)
delta is a time-dependent model of trait evolution (Pagel 1999). The
delta
model is similar toACDC
insofar as thedelta
model fits the relative contributions of early versus late evolution in the tree to the covariance of species trait values. Wheredelta
is greater than 1, recent evolution has been relatively fast; ifdelta
is less than 1, recent evolution has been comparatively slow. Intrepreted as a tree transformation, the model raises all node depths to an estimated power (delta
). Default bounds aredelta = c(min = 0, max = 3)
white is a
white
-noise (non-phylogenetic) model, which converts the tree into a star phylogeny
Value
fitDiscrete
returns a list with the following four elements:
\bold{lik} |
is the function used to compute the model likelihood. The returned function ( |
\bold{bnd} |
is a matrix of the used bounds for the relevant parameters estimated in the model. Warnings will be issued if any parameter estimates occur at the supplied (or default) parameter bounds |
\bold{res} |
is a matrix of results from optimization. Rownames of the |
\bold{opt} |
is a list of the primary results: estimates of the parameters, the maximum-likelihood estimate ( |
Note
To speed the likelihood search, one may set an environment variable to make use of parallel processing, used by mclapply
. To set the environment variable, use options(mc.cores=INTEGER)
, where INTEGER
is the number of available cores. Alternatively, the mc.cores
variable may be preset upon the initiation of an R session (see Startup
for details).
Author(s)
LJ Harmon, RE Glor, RG FitzJohn, and JM Eastman
References
Yang Z. 2006. Computational Molecular Evolution. Oxford University Press: Oxford. FitzJohn RG, WP Maddison, and SP Otto. 2009. Estimating trait-dependent speciation and extinction rates from incompletely resolved molecular phylogenies. Systematic Biology 58:595-611.
Examples
## Not run:
## match data and tree
tmp=get(data(geospiza))
td=treedata(tmp$phy, tmp$dat)
geo=list(phy=td$phy, dat=td$data)
gb=round(geo$dat[,5]) ## create discrete data
names(gb)=rownames(geo$dat)
tmp=fitDiscrete(geo$phy, gb, model="ER", control=list(niter=5), ncores=2) #-7.119792
## using the returned likelihood function
lik=tmp$lik
lik(0.3336772, root="obs") #-7.119792
lik(0.3336772, root="flat") #-8.125354
lik(0.3336772, root="given", root.p=rep(1/3,3)) #-8.125354
lik(0.3336772, root="given", root.p=c(0, 1, 0)) #-7.074039
lik(c(0.3640363), root="given", root.p=rep(1,3)) #-7.020569 & comparable to ape:::ace solution
## End(Not run)
# general model (ARD)
## match data and tree
tmp=get(data(geospiza))
td=treedata(tmp$phy, tmp$dat)
geo=list(phy=td$phy, dat=td$data)
gb=round(geo$dat[,5]) ## create discrete data
names(gb)=rownames(geo$dat)
fitDiscrete(geo$phy, gb, model="ARD", ncores=1) #-6.064573
# user-specified rate classes
mm=rbind(c(NA, 0, 0), c(1, NA, 2), c(0, 2, NA))
fitDiscrete(geo$phy, gb, model=mm, ncores=1) #-7.037944
# symmetric-rates model
fitDiscrete(geo$phy, gb, model="SYM", ncores=1)#-6.822943