hsp_nearest_neighbor {castor}R Documentation

Hidden state prediction based on nearest neighbor.

Description

Predict unknown (hidden) character states of tips on a tree using nearest neighbor matching.

Usage

hsp_nearest_neighbor(tree, tip_states, check_input=TRUE)

Arguments

tree

A rooted tree of class "phylo".

tip_states

A vector of length Ntips, specifying the state of each tip in the tree. Tip states can be any valid data type (e.g., characters, integers, continuous numbers, and so on). NA values denote unknown (hidden) tip states to be predicted.

check_input

Logical, specifying whether to perform some basic checks on the validity of the input data. If you are certain that your input data are valid, you can set this to FALSE to reduce computation.

Details

For each tip with unknown state, this function seeks the closest tip with known state, in terms of patristic distance. The state of the closest tip is then used as a prediction of the unknown state. In the case of multiple equal matches, the precise outcome is unpredictable (this is unlikely to occur if edge lengths are continuous numbers, but may happen frequently if e.g. edge lengths are all of unit length). This algorithm is arguably one of the crudest methods for predicting character states, so use at your own discretion.

Any NA entries in tip_states are interpreted as unknown states. If tree$edge.length is missing, each edge in the tree is assumed to have length 1. The tree may include multifurcations (i.e. nodes with more than 2 children) as well as monofurcations (i.e. nodes with only one child). Tips must be represented in tip_states in the same order as in tree$tip.label. tip_states need not include names; if names are included, however, they are checked for consistency with the tree's tip labels (if check_input==TRUE).

Value

A list with the following elements:

success

Logical, indicating whether HSP was successful. If FALSE, some return values may be NULL.

states

Vector of length Ntips, listing the known and predicted state for each tip.

nearest_neighbors

Integer vector of length Ntips, listing for each tip the index of the nearest tip with known state. Hence, nearest_neighbors[n] specifies the tip from which the unknown state of tip n was inferred. If tip n had known state, nearest_neighbors[n] will be n.

nearest_distances

Numeric vector of length Ntips, listing for each tip the patristic distance to the nearest tip with known state. For tips with known state, distances will be zero.

Author(s)

Stilianos Louca

References

J. R. Zaneveld and R. L. V. Thurber (2014). Hidden state prediction: A modification of classic ancestral state reconstruction algorithms helps unravel complex symbioses. Frontiers in Microbiology. 5:431.

See Also

hsp_max_parsimony, hsp_mk_model,

Examples

## Not run: 
# generate random tree
Ntips = 20
tree = generate_random_tree(list(birth_rate_intercept=1),max_tips=Ntips)$tree

# simulate a binary trait
Q = get_random_mk_transition_matrix(2, rate_model="ER")
tip_states = simulate_mk_model(tree, Q)$tip_states

# print tip states
print(tip_states)

# set half of the tips to unknown state
tip_states[sample.int(Ntips,size=as.integer(Ntips/2),replace=FALSE)] = NA

# reconstruct all tip states via nearest neighbor
predicted_states = hsp_nearest_neighbor(tree, tip_states)$states

# print predicted tip states
print(predicted_states)

## End(Not run)

[Package castor version 1.8.0 Index]