fit_musse {castor}  R Documentation 
The Binary State Speciation and Extinction (BiSSE) model (Maddison et al. 2007) and its extension to Multiple State Speciation Extinction (MuSSE) models (FitzJohn et al. 2009, 2012), Hidden State Speciation Extinction (HiSSE) models (Beaulieu and O'meara, 2016) or Several Examined and Concealed Statesdependent Speciation and Extinction (SecSSE) models (van Els et al. 2018), describe a Poissonian cladogenic process whose birth/death (speciation/extinction) rates depend on the states of an evolving discrete trait. Specifically, extant tips either go extinct or split continuously in time at Poissonian rates, and birth/death rates at each extant tip depend on the current state of the tip; lineages tansition stochastically between states acccording to a continuoustime Markov process with fixed transition rates. At the end of the simulation (i.e., at "presentday"), extant lineages are sampled according to some statedependent probability ("sampling_fraction"), which may depend on proxy state. Optionally, tips may also be sampled continuously over time according to some Poissonian rate (which may depend on proxy state), in which case the resulting tree may not be ultrametric.
This function takes as main input a phylogenetic tree (ultrametric unless Poissonian sampling is included) and a list of tip proxy states, and fits the parameters of a BiSSE/MuSSE/HiSSE/SecSSE model to the data via maximumlikelihood. Tips can have missing (unknown) proxy states, and the function can account for biases in species sampling and biases in the identification of proxy states. The likelihood is calculated using a mathematically equivalent, but computationally more efficient variant, of the classical postordertraversal BiSSE/MuSSE/HiSSE/SecSSE algorithm, as described by Louca (2019). This function has been optimized for large phylogenetic trees, with a relatively small number of states (i.e. Nstates<<Ntips); its time complexity scales roughly linearly with Ntips.
fit_musse(tree, Nstates, NPstates = NULL, proxy_map = NULL, state_names = NULL, tip_pstates = NULL, tip_priors = NULL, sampling_fractions = 1, reveal_fractions = 1, sampling_rates = 0, transition_rate_model = "ARD", birth_rate_model = "ARD", death_rate_model = "ARD", transition_matrix = NULL, birth_rates = NULL, death_rates = NULL, first_guess = NULL, lower = NULL, upper = NULL, root_prior = "auto", root_conditioning = "auto", oldest_age = NULL, Ntrials = 1, max_start_attempts = 10, optim_algorithm = "subplex", optim_max_iterations = 10000, optim_max_evaluations = NULL, optim_rel_tol = 1e6, check_input = TRUE, include_ancestral_likelihoods = FALSE, Nthreads = 1, Nbootstraps = 0, Ntrials_per_bootstrap = NULL, max_condition_number = 1e4, relative_ODE_step = 0.1, E_value_step = 1e4, D_temporal_resolution = 100, max_model_runtime = NULL, verbose = TRUE, diagnostics = FALSE, verbose_prefix = "")
tree 
Phylogenetic tree of class "phylo", representing the evolutionary relationships between sampled species/lineages. Unless Poissonian sampling is included in the model (option 
Nstates 
Integer, specifying the number of possible discrete states a tip can have, influencing speciation/extinction rates. For example, if 
NPstates 
Integer, optionally specifying a number of "proxystates" that are observed instead of the underlying speciation/extinctionmodulating states. To fit a HiSSE/SecSSE model, 
proxy_map 
Integer vector of size 
state_names 
Optional character vector of size 
tip_pstates 
Integer vector of size Ntips, listing the proxy state at each tip, in the same order as tips are indexed in the tree. The vector may (but need not) include names; if it does, these are checked for consistency with the tree (if 
tip_priors 
Numeric matrix of size Ntips x 
sampling_fractions 
Numeric vector of size 
reveal_fractions 
Numeric vector of size 
sampling_rates 
Numeric vector of size 
transition_rate_model 
Either a character or a 2D integer matrix of size Nstates x Nstates, specifying the model for the transition rates between states. This option controls the parametric complexity of the state transition model, i.e. the number of independent rates and the correspondence between independent and dependent rates. If a character, then it must be one of "ER", "SYM", "ARD", "SUEDE" or "SRD", as used for Mk models (see the function If an integer matrix, then it defines a custom parametric structure for the transition rates, by mapping entries of the transition matrix to a set of independent transitionrate parameters (numbered 1,2, and so on), similarly to the option 
birth_rate_model 
Either a character or an integer vector of length Nstates, specifying the model for the various birth (speciation) rates. This option controls the parametric complexity of the possible birth rates, i.e. the number of independent birth rates and the correspondence between independent and dependent birth rates. If a character, then it must be either "ER" (equal rates) or "ARD" (all rates different). If an integer vector, it must map each state to an indepedent birthrate parameter (indexed 1,2,..). For example, the vector 
death_rate_model 
Either a character or an integer vector of length Nstates, specifying the model for the various death (extinction) rates. Similar to 
transition_matrix 
Either 
birth_rates 
Either 
death_rates 
Either 
first_guess 
Either
Start values are only relevant for fitted (i.e., nonfixed) parameters. 
lower 
Either 
upper 
Either 
root_prior 
Either a character or a numeric vector of size Nstates, specifying the prior probabilities of states for the root, i.e. the weights for obtaining a single model likelihood by averaging the root's state likelihoods. If a character, then it must be one of "flat", "empirical", "likelihoods", "max_likelihood" or "auto". "empirical" means the root's prior is set to the proportions of (estimated) extant species in each state (correcting for sampling fractions and reveal fractions, if applicable). "likelihoods" means that the computed statelikelihoods of the root are used, after normalizing to obtain a probability distribution; this is the approach used in the package 
root_conditioning 
Character, specifying an optional modification to be applied to the root's state likelihoods prior to averaging. Can be "none" (no modification), "madfitz", "herr_als", "crown" or "stem". "madfitz" and "herr_als" (after van Els, Etiene and HerreraAlsina 2018) are the options implemented in the package 
oldest_age 
Strictly positive numeric, specifying the oldest age (time before present) to consider for fitting. If this is smaller than the tree's root age, then the tree is split into multiple subtrees at 
Ntrials 
Nonnegative integer, specifying the number of trials for fitting the model, using alternative (randomized) starting parameters at each trial. A larger 
max_start_attempts 
Integer, specifying the number of times to attempt finding a valid start point (per trial) before giving up on that trial. Randomly chosen extreme start parameters may occasionally result in Inf/undefined likelihoods, so this option allows the algorithm to keep looking for valid starting points. For complex models (e.g., with >5 states), setting this to 10, 100 or even 1000 may be beneficial. 
optim_algorithm 
Character, specifying the optimization algorithm for fitting. Must be one of either "optim", "nlminb" or "subplex". 
optim_max_iterations 
Integer, maximum number of iterations allowed for fitting. Only relevant for "optim" and "nlminb". 
optim_max_evaluations 
Integer, maximum number of function evaluations allowed for fitting. Only relevant for "nlminb" and "subplex". 
optim_rel_tol 
Numeric, relative tolerance for the fitted loglikelihood. 
check_input 
Logical, specifying whether to check the validity of input variables. If you are certain that all input variables are valid, you can set this to 
include_ancestral_likelihoods 
Logical, specifying whether to include the state likelihoods for each node, in the returned variables. These are the “D” variables calculated as part of the likelihood based on the subtree descending from each node, and may be used for "local" ancestral state reconstructions. 
Nthreads 
Integer, specifying the number of threads for running multiple fitting trials in parallel. Only relevant if 
Nbootstraps 
Integer, specifying an optional number of bootstrap samplings to perform, for estimating standard errors and confidence intervals of maximumlikelihood fitted parameters. If 0, no bootstrapping is performed. Typical values are 10100. At each bootstrap sampling, a simulation of the fitted MuSSE/HiSSE model is performed, the parameters are estimated anew based on the simulation, and subsequently compared to the original fitted parameters. Each bootstrap sampling will thus use roughly as many computational resources as the original maximumlikelihood fit (e.g., same number of trials, same optimization parameters etc). 
Ntrials_per_bootstrap 
Integer, specifying the number of fitting trials to perform for each bootstrap sampling. If 
max_condition_number 
Positive unitless number, specifying the maximum permissible condition number for the "G" matrix computed for the loglikelihood. A higher condition number leads to faster computation (roughly on a logscale) especially for large trees, at the potential expense of lower accuracy. Typical values are 1e21e5. See Louca (2019) for further details on the condition number of the G matrix. 
relative_ODE_step 
Positive unitless number, specifying the default relative time step for the ordinary differential equation solvers. 
E_value_step 
Positive unitless number, specifying the relative difference between subsequent recorded and interpolated Evalues, in the ODE solver for the extinction probabilities E (Louca 2019). Typical values are 1e2 to 1e5. A smaller 
D_temporal_resolution 
Positive unitless number, specifying the relative resolution for interpolating Gmap over time (Louca 2019). This is relative to the typical time scales at which Gmap varies. For example, a resolution of 10 means that within a typical time scale there will be 10 interpolation points. Typical values are 11000. A greater resolution increases interpolation accuracy, but also increases memory requirements and adds runtime (scaling with the tree's age span, not Ntips). 
max_model_runtime 
Numeric, optional maximum number of seconds for evaluating the likelihood of a model, prior to cancelling the calculation and returning Inf. This may be useful if extreme model parameters (e.g., reached transiently during fitting) require excessive calculation time. Parameters for which the calculation of the likelihood exceed this threshold, will be considered invalid and thus avoided during fitting. For example, for trees with 1000 tips a time limit of 10 seconds may be reasonable. If 0, no time limit is imposed. 
verbose 
Logical, specifying whether to print progress reports and warnings to the screen. In any case, fatal errors are always reported. 
diagnostics 
Logical, specifying whether to print detailed information (such as model likelihoods) at every iteration of the fitting routine. For debugging purposes mainly. 
verbose_prefix 
Character, specifying the line prefix for printing progress reports, warnings and errors to the screen. 
HiSSE/SecSSE models include two discrete traits, one trait that defines the rate categories of diversification rates (as in BiSSE/MuSSE), and one trait that does not itself influence diversification but whose states (here called "proxy states") each represent one or more of the diversitymodulating states. HiSSE models (Beaulieu and O'meara, 2016) and SecSSE models (van Els et al., 2018) are closely related to BiSSE/MuSSE models, the main difference being the fact that the actual diversificationmodulating states are not directly observed. In essence, a HiSSE/SecSSE model is a BiSSE/MuSSE model, where the final tip states are replaced by their proxy states, thus "masking" the underlying diversitymodulating trait. This function is able to fit HiSSE/SecSSE models with appropriate choice of the input variables Nstates
, NPstates
and proxy_map
.
Note that the terminology and setup of HiSSE/SecSSE models followed here differs from their description in the original papers by Beaulieu and O'meara (2016) and van Els et al. (2018), in order to achieve what we think is a more intuitive unification of BiSSE/MuSSE/HiSSE/SecSSE. For ease of terminology, when considering a BiSSE/MuSSE model, here we use the terms "states" and "proxystates" interchangeably, since under BiSSE/MuSSE the proxy trait can be considered identical to the diversificationmodulating trait. A distinction between "states" and "proxystates" is only relevant for HiSSE/SecSSE models.
As an example of a HiSSE model, Nstates=4
, NPstates=2
and proxy_map=c(1,2,1,2)
specifies that states 1 and 3 are represented by proxystate 1, and states 2 and 4 are represented by proxystate 2. This is the original case described by Beaulieu and O'Meara (2016); in their terminology, there would be 2 "hidden"" states ("0" and "1") and 2 "observed" states ("A" and "B"), and the 4 diversification rate categories (Nstates=4
) would be called "0A", "1A", "0B" and "1B". The somewhat different terminology used here allows for easier generalization to an arbitrary number of diversificationmodulating states and an arbitrary number of proxy states. For example, if there are 6 diversification modulating states, represented by 3 proxystates as 1>A, 2>A, 3>B, 4>C, 5>C, 6>C, then one would set Nstates=6
, NPstates=3
and proxy_map=c(1,1,2,3,3,3)
.
The run time of this function scales asymptotically linearly with tree size (Ntips), although run times can vary substantially depending on model parameters. As a rule of thumb, the higher the birth/death/transition rates are compared to the tree's overall time span, the slower the calculation becomes.
The following arguments control the tradeoff between accuracy and computational efficiency:
max_condition_number
: A smaller value means greater accuracy, at longer runtime and more memory.
relative_ODE_step
: A smaller value means greater accuracy, at longer runtime.
E_value_step
: A smaller value means greater accuracy, at longer runtime and more memory.
D_temporal_resolution
: A greater value means greater accuracy, at longer runtime and more memory.
Typically, the default values for these arguments should be fine. For smaller trees, where cladogenic and sampling stochasticity is the main source of uncertainty, these parameters can probably be made less stringent (i.e., leading to lower accuracy and faster computation), but then again for small trees computational efficiency may not be an issue anyway.
A named list with the following elements:
success 
Logical, indicating whether the fitting was successful. If 
Nstates 
Integer, the number of states assumed for the model. 
NPstates 
Integer, the number of proxy states assumed for the model. Note that in the case of a BiSSE/MuSSE model, this will be the same as 
root_prior 
Character, or numeric vector of length Nstates, specifying the root prior used. 
parameters 
Named list containing the final maximumlikelihood fitted model parameters. If

start_parameters 
Named list containing the default start parameter values for the fitting. Structured similarly to 
loglikelihood 
Numeric, the maximized loglikelihood of the model, if fitting succeeded. 
AIC 
Numeric, the Akaike Information Criterion for the fitted model, defined as 2k2\log(L), where k is the number of fitted parameters and L is the maximized likelihood. 
Niterations 
The number of iterations needed for the best fit. Only relevant if the optimization method was "optim" or "nlminb". 
Nevaluations 
Integer, the number of function evaluations needed for the best fit. Only relevant if the optimization method was "nlminb" or "subplex". 
converged 
Logical, indicating whether convergence was successful during fitting. If convergence was not achieved, and the fitting was stopped due to one of the stopping criteria 
warnings 
Character vector, listing any warnings encountered during evaluation of the likelihood function at the fitted parameter values. For example, this vector may contain warnings regarding the differential equation solvers or regarding the rank of the Gmatrix (Louca, 2019). 
subroots 
Integer vector, listing indices of tips/nodes in the tree that were considered as starting points of independent MuSSE processes. If 
ML_subroot_states 
Integer vector, with values between 1 and Nstates, giving the maximumlikelihood estimate of each subroot's state. 
ML_substem_states 
Integer vector, with values between 1 and Nstates, giving the maximumlikelihood estimate of the state at each subroot's stem (i.e., exactly at 
trial_start_loglikelihoods 
Numeric vector of length 
trial_loglikelihoods 
Numeric vector of length 
trial_Nstart_attempts 
Integer vector of length 
trial_Niterations 
Integer vector of length 
trial_Nevaluations 
Integer vector of length 
standard_errors 
Named list containing the elements "transition_matrix" (numeric matrix of size Nstates x Nstates), "birth_rates" (numeric vector of size Nstates) and "death_rates" (numeric vector of size Nstates), listing standard errors of all model parameters estimated using parametric bootstrapping. Only included if 
CI50lower 
Named list containing the elements "transition_matrix" (numeric matrix of size Nstates x Nstates), "birth_rates" (numeric vector of size Nstates) and "death_rates" (numeric vector of size Nstates), listing the lower end of the 50% confidence interval (i.e. the 25% quantile) for each model parameter, estimated using parametric bootstrapping. Only included if 
CI50upper 
Similar to 
CI95lower 
Similar to 
CI95upper 
Similar to 
CI 
2D numeric matrix, listing maximumlikelihood estimates, standard errors and confidence intervals for all model parameters (one row per parameter, one column for MLestimates, one column for standard errors, two columns per confidence interval). Standard errors and confidence intervals are as estimated using parametric bootstrapping. This matrix contains the same information as 
ancestral_likelihoods 
2D matrix of size Nnodes x Nstates, listing the computed statelikelihoods for each node in the tree. These may be used for "local" ancestral state reconstructions, based on the information contained in the subtree descending from each node. Note that for each node the ancestral likelihoods have been normalized for numerical reasons, however they should not be interpreted as actual probabilities. For each node n and state s, 
Stilianos Louca
W. P. Maddison, P. E. Midford, S. P. Otto (2007). Estimating a binary character's effect on speciation and extinction. Systematic Biology. 56:701710.
R. G. FitzJohn, W. P. Maddison, S. P. Otto (2009). Estimating traitdependent speciation and extinction rates from incompletely resolved phylogenies. Systematic Biology. 58:595611
R. G. FitzJohn (2012). Diversitree: comparative phylogenetic analyses of diversification in R. Methods in Ecology and Evolution. 3:10841092
J. M. Beaulieu and B. C. O'Meara (2016). Detecting hidden diversification shifts in models of traitdependent speciation and extinction. Systematic Biology. 65:583601.
D. Kuehnert, T. Stadler, T. G. Vaughan, A. J. Drummond (2016). Phylodynamics with migration: A computational framework to quantify population structure from genomic data. Molecular Biology and Evolution. 33:21022116.
P. van Els, R. S. Etiene, L. HerreraAlsina (2018). Detecting the dependence of diversification on multiple traits from phylogenetic trees and trait data. Systematic Biology. syy057.
S. Louca and M. W. Pennell (2020). A general and efficient algorithm for the likelihood of diversification and discretetrait evolutionary models. Systematic Biology. 69:545556.
simulate_dsse
,
asr_mk_model
,
fit_tree_model
# EXAMPLE 1: BiSSE model #               # Choose random BiSSE model parameters Nstates = 2 Q = get_random_mk_transition_matrix(Nstates, rate_model="ARD", max_rate=0.1) parameters = list(birth_rates = runif(Nstates,5,10), death_rates = runif(Nstates,0,5), transition_matrix = Q) rarefaction = 0.5 # randomly omit half of the tips # Simulate a tree under the BiSSE model simulation = simulate_musse(Nstates, parameters = parameters, max_tips = 1000, sampling_fractions = rarefaction) tree = simulation$tree tip_states = simulation$tip_states ## Not run: # fit BiSSE model to tree & tip data fit = fit_musse(tree, Nstates = Nstates, tip_pstates = tip_states, sampling_fractions = rarefaction) if(!fit$success){ cat(sprintf("ERROR: Fitting failed")) }else{ # compare fitted birth rates to true values errors = (fit$parameters$birth_rates  parameters$birth_rates) relative_errors = errors/parameters$birth_rates cat(sprintf("BiSSE relative birthrate errors:\n")) print(relative_errors) } ## End(Not run) # EXAMPLE 2: HiSSE model, with bootstrapping #               # Choose random HiSSE model parameters Nstates = 4 NPstates = 2 Q = get_random_mk_transition_matrix(Nstates, rate_model="ARD", max_rate=0.1) rarefaction = 0.5 # randomly omit half of the tips parameters = list(birth_rates = runif(Nstates,5,10), death_rates = runif(Nstates,0,5), transition_matrix = Q) # reveal the state of 30% & 60% of tips (in state 1 & 2, respectively) reveal_fractions = c(0.3,0.6) # use proxy map corresponding to Beaulieu and O'Meara (2016) proxy_map = c(1,2,1,2) # Simulate a tree under the HiSSE model simulation = simulate_musse(Nstates, NPstates = NPstates, proxy_map = proxy_map, parameters = parameters, max_tips = 1000, sampling_fractions = rarefaction, reveal_fractions = reveal_fractions) tree = simulation$tree tip_states = simulation$tip_proxy_states ## Not run: # fit HiSSE model to tree & tip data # run multiple trials to ensure global optimum # also estimate confidence intervals via bootstrapping fit = fit_musse(tree, Nstates = Nstates, NPstates = NPstates, proxy_map = proxy_map, tip_pstates = tip_states, sampling_fractions = rarefaction, reveal_fractions = reveal_fractions, Ntrials = 5, Nbootstraps = 10, max_model_runtime = 0.1) if(!fit$success){ cat(sprintf("ERROR: Fitting failed")) }else{ # compare fitted birth rates to true values errors = (fit$parameters$birth_rates  parameters$birth_rates) relative_errors = errors/parameters$birth_rates cat(sprintf("HiSSE relative birthrate errors:\n")) print(relative_errors) # print 95%confidence interval for first birth rate cat(sprintf("CI95 for lambda1: %g%g", fit$CI95lower$birth_rates[1], fit$CI95upper$birth_rates[1])) } ## End(Not run)