asr_mk_model {castor} | R Documentation |

Ancestral state reconstruction of a discrete trait using a fixed-rates continuous-time Markov model (a.k.a. "Mk model"). This function can estimate the (instantaneous) transition matrix using maximum likelihood, or take a specified transition matrix. The function can optionally calculate marginal ancestral state likelihoods for each node in the tree, using the rerooting method by Yang et al. (1995).

asr_mk_model( tree, tip_states, Nstates = NULL, tip_priors = NULL, rate_model = "ER", transition_matrix = NULL, include_ancestral_likelihoods = TRUE, reroot = TRUE, root_prior = "auto", Ntrials = 1, optim_algorithm = "nlminb", optim_max_iterations = 200, optim_rel_tol = 1e-8, store_exponentials = TRUE, check_input = TRUE, Nthreads = 1)

`tree` |
A rooted tree of class "phylo". The root is assumed to be the unique node with no incoming edge. |

`tip_states` |
An integer vector of size Ntips, specifying the state of each tip in the tree in terms of an integer from 1 to Nstates, where Ntips is the number of tips and Nstates is the number of possible states (see below). Can also be |

`Nstates` |
Either NULL, or an integer specifying the number of possible states of the trait. If |

`tip_priors` |
A 2D numeric matrix of size Ntips x Nstates, where Nstates is the possible number of states for the character modelled. Hence, |

`rate_model` |
Rate model to be used for fitting the transition rate matrix. Can be "ER" (all rates equal), "SYM" (transition rate i–>j is equal to transition rate j–>i), "ARD" (all rates can be different), "SUEDE" (only stepwise transitions i–>i+1 and i–>i-1 allowed, all 'up' transitions are equal, all 'down' transitions are equal) or "SRD" (only stepwise transitions i–>i+1 and i–>i-1 allowed, and each rate can be different). Can also be an index matrix that maps entries of the transition matrix to the corresponding independent rate parameter to be fitted. Diagonal entries should map to 0, since diagonal entries are not treated as independent rate parameters but are calculated from the remaining entries in the transition matrix. All other entries that map to 0 represent a transition rate of zero. The format of this index matrix is similar to the format used by the |

`transition_matrix` |
Either a numeric quadratic matrix of size Nstates x Nstates containing fixed transition rates, or |

`root_prior` |
Prior probability distribution of the root's states, used to calculate the model's overall likelihood from the root's marginal ancestral state likelihoods. Can be " |

`include_ancestral_likelihoods` |
Include the marginal ancestral likelihoods for each node (conditional scaled state likelihoods) in the return values. Note that this may increase the computation time and memory needed, so you may set this to |

`reroot` |
Reroot tree at each node when computing marginal ancestral likelihoods, according to Yang et al. (1995). This is the default and recommended behavior, but leads to increased computation time. If |

`Ntrials` |
Number of trials (starting points) for fitting the transition matrix. Only relevant if |

`optim_algorithm` |
Either "optim" or "nlminb", specifying which optimization algorithm to use for maximum-likelihood estimation of the transition matrix. Only relevant if |

`optim_max_iterations` |
Maximum number of iterations (per fitting trial) allowed for optimizing the likelihood function. |

`optim_rel_tol` |
Relative tolerance (stop criterion) for optimizing the likelihood function. |

`store_exponentials` |
Logical, specifying whether to pre-calculate and store exponentials of the transition matrix during calculation of ancestral likelihoods. This may reduce computation time because each exponential is only calculated once, but requires more memory since all exponentials are stored. Only relevant if |

`check_input` |
Logical, specifying whether to perform some basic checks on the validity of the input data. If you are certain that your input data are valid, you can set this to |

`Nthreads` |
Number of parallel threads to use for running multiple fitting trials simultaneously. This only makes sense if your computer has multiple cores/CPUs and if |

For this function, the trait's states must be represented by integers within 1,..,Nstates, where Nstates is the total number of possible states. If the states are originally in some other format (e.g. characters or factors), you should map them to a set of integers 1,..,Nstates. The order of states (if relevant) should be reflected in their integer representation. For example, if your original states are "small", "medium" and "large" and `rate_model=="SUEDE"`

, it is advised to represent these states as integers 1,2,3. You can easily map any set of discrete states to integers using the function `map_to_state_space`

.

This function allows the specification of the precise tip states (if these are known) using the vector `tip_states`

. Alternatively, if some tip states are only known in terms of a probability distribution, you can pass these probability distributions using the matrix `tip_priors`

. Note that exactly one of the two arguments, `tip_states`

or `tip_priors`

, must be non-`NULL`

.

Tips must be represented in `tip_states`

or `tip_priors`

in the same order as in `tree$tip.label`

. None of the input vectors or matrixes need include row or column names; if they do, however, they are checked for consistency (if `check_input==TRUE`

).

The tree is either assumed to be complete (i.e. include all possible species), or to represent a random subset of species chosen independently of their states. The rerooting method by Yang et al (1995) is used to calculate the marginal ancestral state likelihoods for each node by treating the node as a root and calculating its conditional scaled likelihoods. Note that the re-rooting algorithm is strictly speaking only valid for reversible Mk models, that is, satisfying the criterion

*
π_i Q_{ij}=π_j Q_{ji},\quad\forall i,j,
*

where *Q* is the transition rate matrix and *π* is the stationary distribution of the model. The rate models “ER”, 'SYM”, “SUEDE” and “SRD” are reversible. For example, for “SUEDE” or “SRD” choose *π_{i+1}=π_{i}Q_{i,i+1}/Q_{i+1,i}*. In contrast, “ARD” models are generally not reversible.

If `tree$edge.length`

is missing, each edge in the tree is assumed to have length 1. The tree may include multi-furcations (i.e. nodes with more than 2 children) as well as mono-furcations (i.e. nodes with only one child). This function is similar to `rerootingMethod`

in the `phytools`

package (v0.5-64) and similar to `ape::ace`

(v4.1) with options `method="ML", type="discrete"`

and `marginal=FALSE`

, but tends to be much faster than `rerootingMethod`

and `ace`

for large trees.

Internally, this function uses `fit_mk`

to estimate the transition matrix if the latter is not provided. If you only care about estimating the transition matrix but not the ancestral states, consider using the more versatile function `fit_mk`

.

A list with the following elements:

`success` |
Logical, indicating whether ASR was successful. If |

`Nstates` |
Integer, specifying the number of modeled trait states. |

`transition_matrix` |
A numeric quadratic matrix of size Nstates x Nstates, containing the transition rates of the Markov model. The [r,c]-th entry is the transition rate from state r to state c. Will be the same as the input |

`loglikelihood` |
Log-likelihood of the observed tip states under the fitted (or provided) Mk model. If |

`AIC` |
Numeric, the Akaike Information Criterion for the fitted Mk model (if applicable), defined as |

`ancestral_likelihoods` |
Optional, only returned if |

Stilianos Louca

Z. Yang, S. Kumar and M. Nei (1995). A new method for inference of ancestral nucleotide and amino acid sequences. Genetics. 141:1641-1650.

`hsp_mk_model`

,
`asr_max_parsimony`

,
`asr_squared_change_parsimony`

,
`hsp_max_parsimony`

,
`map_to_state_space`

,
`fit_mk`

## Not run: # generate random tree Ntips = 1000 tree = generate_random_tree(list(birth_rate_intercept=1),max_tips=Ntips)$tree # create random transition matrix Nstates = 5 Q = get_random_mk_transition_matrix(Nstates, rate_model="ER", max_rate=0.01) cat(sprintf("Simulated ER transition rate=%g\n",Q[1,2])) # simulate the trait's evolution simulation = simulate_mk_model(tree, Q) tip_states = simulation$tip_states cat(sprintf("Simulated states for last 20 nodes:\n")) print(tail(simulation$node_states,20)) # reconstruct node states from simulated tip states # at each node, pick state with highest marginal likelihood results = asr_mk_model(tree, tip_states, Nstates, rate_model="ER", Ntrials=2) node_states = max.col(results$ancestral_likelihoods) # print Mk model fitting summary cat(sprintf("Mk model: log-likelihood=%g\n",results$loglikelihood)) cat(sprintf("Fitted ER transition rate=%g\n",results$transition_matrix[1,2])) # print reconstructed node states for last 20 nodes print(tail(node_states,20)) ## End(Not run)

[Package *castor* version 1.7.0 Index]