R: Map states of a discrete trait to integers.

map_to_state_space {castor}

R Documentation

Map states of a discrete trait to integers.

Description

Given a vector of states (e.g., one for each tip in a tree), map the unique states to integers 1,..,Nstates, where Nstates is the number of possible states. This function can be used to translate states that are originally represented by characters or factors into the integer states required by ancestral state reconstruction and hidden state prediction functions in this package.

Usage

map_to_state_space(raw_states, fill_gaps=FALSE, 
                   sort_order="natural")

Arguments

`raw_states`	A vector of values (states), each of which can be converted to a character. This vector can include the same value multiple times, for example if values represent the trait's states for tips in a tree. The vector may also include `NA`, for example if they represent unknown states for some tree tips. NAs are omitted from the state space.
`fill_gaps`	Logical. If `TRUE`, then states are converted to integers using `as.integer(as.character())`, and then all missing intermediate integer values are included as additional possible states. For example, if `raw_states` contained the values 2,4,6, then 3 and 5 are assumed to also be possible states.
`sort_order`	Character, specifying the order in which raw_states should be mapped to ascending integers. Either "natural" or "alphabetical". If "natural", numerical parts of characters are sorted numerically, e.g. as in "3"<"a2"<"a12"<"b1".
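
The effect of `fill_gaps` and `sort_order` can be illustrated without castor installed. The following base-R sketch only mimics the behaviour described above and is not castor's implementation. The helper `natural_order_sketch` is hypothetical and only handles names made of an optional non-digit prefix followed by a trailing integer.

```r
# fill_gaps: states 2, 4, 6 are read as integers, and the missing
# intermediate values 3 and 5 are added as possible states.
raw_states <- c("2", "4", "6", "4", NA)
observed   <- as.integer(as.character(raw_states))
observed   <- observed[!is.na(observed)]          # NAs are not part of the state space
filled     <- seq(min(observed), max(observed))   # 2 3 4 5 6
print(filled)

# sort_order: hypothetical helper imitating "natural" ordering for simple
# names of the form <non-digit prefix><integer>, e.g. "a12".
natural_order_sketch <- function(x) {
  prefix <- sub("[0-9]+$", "", x)                 # text before the trailing digits
  number <- as.numeric(sub("^[^0-9]*", "", x))    # trailing digits as a number
  x[order(prefix, number, method = "radix")]
}

state_names  <- c("b1", "3", "a12", "a2")
natural      <- natural_order_sketch(state_names)     # "3" "a2" "a12" "b1"
alphabetical <- sort(state_names, method = "radix")   # "3" "a12" "a2" "b1"
print(natural)
print(alphabetical)
```

The two orderings differ only in where "a12" lands: numerically after "a2" under the natural order, but before it when compared character by character.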

Details

Several ancestral state reconstruction and hidden state prediction algorithms in the castor package (e.g., asr_max_parsimony) require that the focal trait's states are represented by integer indices within 1,..,Nstates. These indices are then associated, for example, with column and row indices in the transition cost matrix (in the case of maximum parsimony reconstruction) or with column indices in the returned matrix containing marginal ancestral state probabilities (e.g., in asr_mk_model). The function map_to_state_space can be used to conveniently convert a set of discrete states into integers, for use with the aforementioned algorithms.
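
To make the role of the integer indices concrete, the sketch below builds a hypothetical transition cost matrix in base R and looks up a cost by integer state. The state names and costs are made up for illustration, and `match()` stands in for the mapping that map_to_state_space would provide.

```r
# Hypothetical trait with three named states (illustrative values only).
state_names <- c("aquatic", "amphibious", "terrestrial")
Nstates     <- length(state_names)

# Tip states as recorded by the user, including one unknown state.
raw_states    <- c("terrestrial", "aquatic", NA, "amphibious", "aquatic")
mapped_states <- match(raw_states, state_names)   # integers in 1..Nstates, NA preserved

# Cost matrix whose row r and column c refer to states r and c.
# Here the cost is the absolute difference between state indices.
transition_costs <- abs(outer(seq_len(Nstates), seq_len(Nstates), "-"))
dimnames(transition_costs) <- list(state_names, state_names)

# Cost of a transition from the state of tip 1 to the state of tip 2.
cost_1_to_2 <- transition_costs[mapped_states[1], mapped_states[2]]
print(mapped_states)
print(cost_1_to_2)
```

Because the matrix is indexed by position, the order of `state_names` determines which row and column each state occupies, which is why a consistent mapping matters.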

Value

A list with the following elements:

`Nstates`	Integer. Number of possible states for the trait, based on the unique values encountered in `raw_states` (after conversion to characters). This may be larger than the number of unique values in `raw_states`, if `fill_gaps` was set to `TRUE`.
`state_names`	Character vector of size Nstates, storing the original name (character version) of each unique state. For example, if `raw_states` was `c("b1","3","a12","a2","b1","a2", NA)` and `sort_order=="natural"`, then `Nstates` will be 4 and `state_names` will be `c("3","a2","a12","b1")`.
`state_values`	A numeric vector of size `Nstates`, providing the numerical value for each unique state. For example, the states "3","a2","4.5" will be mapped to the numeric values 3, NA, 4.5. Note that this may not always be meaningful, depending on the biological interpretation of the states.
`mapped_states`	Integer vector of size equal to `length(raw_states)`, listing the integer representation of each value in `raw_states`. May also include `NA`, at those locations where `raw_states` was `NA`.
`name2index`	An integer vector of size Nstates, with `names(name2index)` set to `state_names`. This vector can be used to map any new list of states (in character format) to their integer representation. In particular, `name2index[as.character(raw_states)]` is equal to `mapped_states`.
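
The relationships between the returned elements can be checked with a small base-R sketch. It reuses the example values given above, hard-codes `state_names` in the stated natural order instead of computing it, and does not call castor, so it illustrates the documented structure rather than the package's implementation.

```r
# Inputs taken from the documented example; state_names is hard-coded in the
# "natural" order stated above rather than computed.
raw_states  <- c("b1", "3", "a12", "a2", "b1", "a2", NA)
state_names <- c("3", "a2", "a12", "b1")
Nstates     <- length(state_names)

# Integer representation of every entry; NA stays NA.
mapped_states <- match(as.character(raw_states), state_names)

# Named lookup vector from state name to integer index.
name2index <- setNames(seq_len(Nstates), state_names)

# Numeric value of each state name; non-numeric names become NA.
state_values <- suppressWarnings(as.numeric(state_names))

# The lookup vector reproduces mapped_states (names dropped for comparison).
stopifnot(identical(unname(name2index[as.character(raw_states)]), mapped_states))

# New observations can be mapped with the same lookup vector.
new_states  <- c("a2", "b1", "3")
new_indices <- unname(name2index[new_states])

# The mapping is reversible, with NA entries staying NA.
recovered <- state_names[mapped_states]
print(mapped_states)
print(new_indices)
print(recovered)
```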

Author(s)

Stilianos Louca

Examples

# generate a sequence of random states
unique_states = c("b","c","a")
raw_states = unique_states[sample.int(3,size=10,replace=TRUE)]

# map to integer state space
mapping = map_to_state_space(raw_states)

cat(sprintf("Comparing the original unique states with the inferred state names (order may differ):\n"))
print(unique_states)
print(mapping$state_names)

cat(sprintf("Checking reversibility of mapping:\n"))
print(raw_states)
print(mapping$state_names[mapping$mapped_states])

[Package castor version 1.8.2 Index]