hsp_max_parsimony {castor} | R Documentation |
Hidden state prediction via maximum parsimony.
Description
Reconstruct ancestral discrete states of nodes and predict unknown (hidden) states of tips on a tree using maximum parsimony. Transition costs can vary between transitions, and can optionally be weighted by edge length.
Usage
hsp_max_parsimony(tree, tip_states, Nstates=NULL,
transition_costs="all_equal",
edge_exponent=0.0, weight_by_scenarios=TRUE,
check_input=TRUE)
Arguments
tree |
A rooted tree of class "phylo". The root is assumed to be the unique node with no incoming edge. |
tip_states |
An integer vector of size Ntips, specifying the state of each tip in the tree as an integer from 1 to Nstates, where Nstates is the possible number of states (see below). |
Nstates |
Either |
transition_costs |
Same as for the function |
edge_exponent |
Same as for the function |
weight_by_scenarios |
Logical, indicating whether to weight each optimal state of a node by the number of optimal maximum-parsimony scenarios in which the node is in that state. If FALSE, then all possible states of a node are weighted equally (i.e. are assigned equal probabilities). |
check_input |
Logical, specifying whether to perform some basic checks on the validity of the input data. If you are certain that your input data are valid, you can set this to |
Details
For this function, the trait's states must be represented by integers within 1,..,Nstates, where Nstates is the total number of possible states. If the states are originally in some other format (e.g. characters or factors), you should map them to a set of integers 1,..,Nstates. The order of states (if relevant) should be reflected in their integer representation. For example, if your original states are "small", "medium" and "large" and transition_costs=="sequential"
, it is advised to represent these states as integers 1,2,3. You can easily map any set of discrete states to integers using the function map_to_state_space
.
Any NA
entries in tip_states
are interpreted as unknown states. Prior to ancestral state reconstruction, the tree is temporarily prunned, keeping only tips with known state. The function then applies Sankoff's (1975) dynamic programming algorithm for ancestral state reconstruction, which determines the smallest number (or least costly if transition costs are uneven) of state changes along edges needed to reproduce the known tip states. The state probabilities of tips with unknown state are set to those of the most recent ancestor with reconstructed states, as described by Zaneveld and Thurber (2014). This function has asymptotic time complexity O(Ntips+Nnodes x Nstates).
If tree$edge.length
is missing, each edge in the tree is assumed to have length 1. If edge_exponent
is 0, then edge lengths do not influence the result. If edge_exponent!=0
, then all edges must have non-zero length. The tree may include multi-furcations (i.e. nodes with more than 2 children) as well as mono-furcations (i.e. nodes with only one child).
Tips must be represented in tip_states
in the same order as in tree$tip.label
. None of the input vectors or matrixes need include row or column names; if they do, however, they are checked for consistency (if check_input==TRUE
).
This function is meant for reconstructing ancestral states in all nodes of a tree as well as predicting the states of tips with an a priory unknown state. If the state of all tips is known and only ancestral state reconstruction is needed, consider using the function asr_max_parsimony
for improved efficiency.
Value
A list with the following elements:
success |
Logical, indicating whether HSP was successful. If |
likelihoods |
A 2D numeric matrix, listing the probability of each tip and node being in each state. This matrix will have (Ntips+Nnodes) rows and Nstates columns, where Nstates was either explicitly provided as an argument or inferred based on the number of unique values in |
states |
Integer vector of length Ntips+Nnodes, with values in {1,..,Nstates}, specifying the maximum-likelihood estimate of the state of each tip & node. |
Author(s)
Stilianos Louca
References
D. Sankoff (1975). Minimal mutation trees of sequences. SIAM Journal of Applied Mathematics. 28:35-42.
J. Felsenstein (2004). Inferring Phylogenies. Sinauer Associates, Sunderland, Massachusetts.
J. R. Zaneveld and R. L. V. Thurber (2014). Hidden state prediction: A modification of classic ancestral state reconstruction algorithms helps unravel complex symbioses. Frontiers in Microbiology. 5:431.
See Also
asr_max_parsimony
,
asr_mk_model
,
hsp_mk_model
,
map_to_state_space
Examples
## Not run:
# generate random tree
Ntips = 10
tree = generate_random_tree(list(birth_rate_intercept=1),max_tips=Ntips)$tree
# simulate a discrete trait
Nstates = 5
Q = get_random_mk_transition_matrix(Nstates, rate_model="ER")
tip_states = simulate_mk_model(tree, Q)$tip_states
# print tip states
print(tip_states)
# set half of the tips to unknown state
tip_states[sample.int(Ntips,size=as.integer(Ntips/2),replace=FALSE)] = NA
# reconstruct all tip states via MPR
likelihoods = hsp_max_parsimony(tree, tip_states, Nstates)$likelihoods
estimated_tip_states = max.col(likelihoods[1:Ntips,])
# print estimated tip states
print(estimated_tip_states)
## End(Not run)