hsp_nearest_neighbor {castor}    R Documentation

Hidden state prediction based on nearest neighbor.


Description

Predict unknown (hidden) character states of tips on a tree using nearest neighbor matching.


Usage

hsp_nearest_neighbor(tree, tip_states, check_input=TRUE)



Arguments

tree
A rooted tree of class "phylo".


tip_states
A vector of length Ntips, specifying the state of each tip in the tree. Tip states can be of any valid data type (e.g., characters, integers, or continuous numbers). NA values denote unknown (hidden) tip states to be predicted.


check_input
Logical, specifying whether to perform some basic checks on the validity of the input data. If you are certain that your input data are valid, you can set this to FALSE to reduce computation.


Details

For each tip with unknown state, this function seeks the closest tip with known state, in terms of patristic distance. The state of that closest tip is then used as the prediction for the unknown state. If several tips with known state are equally close, the precise outcome is unpredictable (such ties are unlikely if edge lengths are continuous numbers, but may occur frequently if, for example, all edges have unit length). This algorithm is arguably one of the crudest methods for predicting character states, so use it at your own discretion.
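The matching rule described above can be illustrated with a small base-R sketch. This is not castor's implementation: the helper `predict_from_distances` is hypothetical, it assumes a precomputed matrix `D` of patristic distances between tips and at least one tip with known state, and it resolves ties by taking the first minimum, whereas the outcome of ties in hsp_nearest_neighbor is unspecified.

```r
# Hypothetical helper (not part of castor): nearest-neighbor prediction from a
# precomputed matrix D of pairwise patristic distances between tips.
# Assumes at least one entry of tip_states is not NA.
predict_from_distances = function(D, tip_states){
  known   = which(!is.na(tip_states))
  states  = tip_states
  nearest = seq_along(tip_states)  # tips with known state point to themselves
  for(n in which(is.na(tip_states))){
    # closest tip with known state; which.min() picks the first minimum on ties
    nearest[n] = known[which.min(D[n, known])]
    states[n]  = tip_states[nearest[n]]
  }
  list(states            = states,
       nearest_neighbors = nearest,
       nearest_distances = D[cbind(seq_along(tip_states), nearest)])
}

# toy example: distances worked out by hand for the 4-tip tree
# ((A:1,B:1):1,(C:1,D:2):1), with tips ordered A, B, C, D
D = matrix(c(0,2,4,5,
             2,0,4,5,
             4,4,0,3,
             5,5,3,0), nrow=4, byrow=TRUE)
tip_states = c("x", NA, "y", NA)
result = predict_from_distances(D, tip_states)
print(result$states)
```

In this toy case tip B takes its state from tip A (distance 2) and tip D from tip C (distance 3).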

Any NA entries in tip_states are interpreted as unknown states. If tree$edge.length is missing, each edge in the tree is assumed to have length 1. The tree may include multifurcations (i.e. nodes with more than 2 children) as well as monofurcations (i.e. nodes with only one child). Tips must be represented in tip_states in the same order as in tree$tip.label. tip_states need not include names; if names are included, however, they are checked for consistency with the tree's tip labels (if check_input==TRUE).
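Because tip states must follow the order of tree$tip.label, a named vector in arbitrary order may need to be rearranged before it is passed to the function. The following is a minimal base-R sketch of one way to do this; the list `tree` is a stand-in for a real "phylo" object and `raw_states` is a made-up input.

```r
# stand-in for a "phylo" object; only the tip.label element matters here
tree = list(tip.label=c("A", "B", "C", "D"))

# named states in arbitrary order, with tip "D" missing entirely
raw_states = c(C="y", A="x", B=NA)

# indexing by tip label reorders the states to match tree$tip.label;
# tips absent from raw_states become NA, i.e. unknown states to be predicted
tip_states = unname(raw_states[tree$tip.label])
names(tip_states) = tree$tip.label
print(tip_states)
```

Tips without an entry end up as NA and are therefore treated as hidden states.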


Value

A list with the following elements:


success
Logical, indicating whether HSP was successful. If FALSE, some return values may be NULL.


states
Vector of length Ntips, listing the known or predicted state of each tip.


nearest_neighbors
Integer vector of length Ntips, listing for each tip the index of the nearest tip with known state. Hence, nearest_neighbors[n] specifies the tip from which the unknown state of tip n was inferred. If tip n had known state, nearest_neighbors[n] will be n.


nearest_distances
Numeric vector of length Ntips, listing for each tip the patristic distance to the nearest tip with known state. For tips with known state, distances will be zero.


Author(s)

Stilianos Louca


References

J. R. Zaneveld and R. L. V. Thurber (2014). Hidden state prediction: A modification of classic ancestral state reconstruction algorithms helps unravel complex symbioses. Frontiers in Microbiology. 5:431.

See Also

hsp_max_parsimony, hsp_mk_model


Examples

## Not run: 
library(castor)

# generate random tree
Ntips = 20
tree = generate_random_tree(list(birth_rate_intercept=1),max_tips=Ntips)$tree

# simulate a binary trait
Q = get_random_mk_transition_matrix(2, rate_model="ER")
tip_states = simulate_mk_model(tree, Q)$tip_states

# print tip states
print(tip_states)

# set half of the tips to unknown state
tip_states[sample.int(Ntips,size=as.integer(Ntips/2),replace=FALSE)] = NA

# reconstruct all tip states via nearest neighbor
predicted_states = hsp_nearest_neighbor(tree, tip_states)$states

# print predicted tip states
print(predicted_states)

## End(Not run)

[Package castor version 1.7.0 Index]