RadixForest {seqtrie} | R Documentation
RadixForest
Description
Radix Forest class implementation
Details
The RadixForest class is a specialization of the RadixTree implementation. Instead of putting sequences into a single tree, the RadixForest class puts sequences into separate trees based on sequence length. This allows for faster searching of similar sequences based on Hamming or Levenshtein distance metrics. Unlike the RadixTree class, the RadixForest class does not support anchored searches or a custom cost matrix. See RadixTree for additional details.
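The benefit of grouping by length can be illustrated in base R alone. The sketch below is not the package's internal code; it only mimics the idea. It relies on two facts: Hamming distance is defined only between sequences of equal length, and the Levenshtein distance between two sequences is never smaller than the difference of their lengths. The helper name candidate_lengths is hypothetical.

```r
# Base R only: an illustration of length bucketing, not seqtrie internals.
seqs <- c("ACGT", "AAAA", "ACG", "ACGTT", "TT")

# Group sequences by length, analogous to keeping one tree per length.
buckets <- split(seqs, nchar(seqs))

# Hypothetical helper: which lengths can contain a match for a query of
# length n within distance d?
candidate_lengths <- function(n, d, mode = c("levenshtein", "hamming")) {
  mode <- match.arg(mode)
  if (mode == "hamming") n else seq(max(n - d, 0), n + d)
}

# Collect the sequences from the buckets that could hold a match.
candidates <- function(query, d, mode) {
  lens <- as.character(candidate_lengths(nchar(query), d, mode))
  keep <- intersect(names(buckets), lens)
  unlist(buckets[keep], use.names = FALSE)
}

cand_lv <- candidates("ACG", 1, "levenshtein")  # lengths 2, 3 and 4
cand_hm <- candidates("ACG", 1, "hamming")      # length 3 only
```

Under these assumptions, every other bucket can be skipped without computing a single distance.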
Public fields
forest_pointer
Map of sequence length to RadixTree
char_counter_pointer
Character count data for the purpose of validating input
Methods
Public methods
Method new()
Create a new RadixForest object
Usage
RadixForest$new(sequences = NULL)
Arguments
sequences
A character vector of sequences to insert into the forest
Method show()
Print the forest to screen
Usage
RadixForest$show()
Method to_string()
Print the forest to a string
Usage
RadixForest$to_string()
Method graph()
Plot of the forest using igraph
Usage
RadixForest$graph(depth = -1, root_label = "root", plot = TRUE)
Arguments
depth
The tree depth to plot for each tree in the forest.
root_label
The label of the root node(s) in the plot.
plot
Whether to create a plot or return the data used to generate the plot.
Returns
A data frame of parent-child relationships used to generate the igraph plot OR a ggplot2 object
Method to_vector()
Output all sequences held by the forest as a character vector
Usage
RadixForest$to_vector()
Returns
A character vector of all sequences contained in the forest.
Method size()
Output the size of the forest (i.e. how many sequences are contained)
Usage
RadixForest$size()
Returns
The size of the forest
Method insert()
Insert new sequences into the forest
Usage
RadixForest$insert(sequences)
Arguments
sequences
A character vector of sequences to insert into the forest
Returns
A logical vector indicating, for each sequence, whether it was inserted (TRUE) or already existed in the forest (FALSE)
Method erase()
Erase sequences from the forest
Usage
RadixForest$erase(sequences)
Arguments
sequences
A character vector of sequences to erase from the forest
Returns
A logical vector indicating whether the sequence was erased (TRUE) or not found in the forest (FALSE)
Method find()
Find sequences in the forest
Usage
RadixForest$find(query)
Arguments
query
A character vector of sequences to find in the forest
Returns
A logical vector indicating whether the sequence was found (TRUE) or not found in the forest (FALSE)
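A minimal sketch of how insert(), find(), erase(), size() and to_vector() fit together, using only the signatures documented above. The sequences are made up for illustration, and the comments restate the documented return semantics rather than captured output.

```r
library(seqtrie)

# Start with two sequences, using the documented constructor argument.
forest <- RadixForest$new(sequences = c("ACGT", "AAAA"))

# Re-inserting an existing sequence is documented to return FALSE.
dup <- forest$insert("ACGT")
# A sequence of a new length is also accepted.
new <- forest$insert("ACG")

# One logical per query.
found <- forest$find(c("ACG", "TTTT"))

# Erase one sequence, then inspect what remains.
erased <- forest$erase("AAAA")
n <- forest$size()
remaining <- sort(forest$to_vector())
```

to_vector() is sorted here because this page does not state the order in which sequences are returned.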
Method prefix_search()
Search for sequences in the forest that start with a specified prefix. For example, a query of "CAR" will find "CART", "CARBON" and "CARROT", but not "CATS".
Usage
RadixForest$prefix_search(query)
Arguments
query
A character vector of sequences to search for in the forest
Returns
A data frame of all matches with columns "query" and "target".
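The "CAR" example from the description can be reproduced with a short sketch. It assumes only the documented prefix_search(query) signature and the "query" and "target" columns; the row order is not documented here, so the targets are sorted before use.

```r
library(seqtrie)

words <- RadixForest$new(c("CART", "CARBON", "CARROT", "CATS"))

# One row per (query, target) pair whose target starts with the query.
hits <- words$prefix_search("CAR")

# Sort because the row order is not specified on this page.
targets <- sort(as.character(hits$target))
```

Here the matching words have different lengths, so they are held in different trees of the forest.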
Method search()
Search for sequences in the forest that are within a specified distance of each query, measured with the chosen distance metric.
Usage
RadixForest$search( query, max_distance = NULL, max_fraction = NULL, mode = "levenshtein", nthreads = 1, show_progress = FALSE )
Arguments
query
A character vector of query sequences.
max_distance
The maximum absolute distance to search. Can be a single value or a vector with one value per query. Mutually exclusive with max_fraction.
max_fraction
The maximum distance to search, expressed as a fraction of each query sequence's length. Can be a single value or a vector with one value per query. Mutually exclusive with max_distance.
mode
The distance metric to use. One of "hamming" (hm) or "levenshtein" (the default). Anchored searches are not supported by RadixForest (see Details).
nthreads
The number of threads to use for parallel computation.
show_progress
Whether to show a progress bar.
Returns
A data.frame of all matches with columns "query", "target" and "distance".
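The sketch below contrasts the two documented ways of bounding a search. The sequences are invented for illustration. How max_fraction is rounded to a whole distance is not stated on this page, so no particular result is claimed for that call.

```r
library(seqtrie)

forest <- RadixForest$new(c("ACGT", "ACG", "AAAA", "TTTT"))

# Absolute threshold: targets at most one edit away from the query.
lv_hits <- forest$search("ACGA", max_distance = 1, mode = "levenshtein")

# Hamming distance only compares sequences of the same length as the query.
hm_hits <- forest$search("ACGA", max_distance = 1, mode = "hamming")

# Relative threshold: the allowed distance scales with each query's length.
fr_hits <- forest$search(c("ACGA", "ACGTACGT"), max_fraction = 0.25,
                         mode = "levenshtein")
```

Pass either max_distance or max_fraction, never both, since the two arguments are documented as mutually exclusive.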
Method validate()
Validate the forest
Usage
RadixForest$validate()
Returns
A logical indicating whether the forest is valid (TRUE) or not (FALSE). This is mostly an internal function for debugging purposes and should always return TRUE.
Examples
forest <- RadixForest$new()
forest$insert(c("ACGT", "AAAA"))
forest$erase("AAAA")
forest$search("ACG", max_distance = 1, mode = "levenshtein")
#   query target distance
# 1   ACG   ACGT        1
forest$search("ACG", max_distance = 1, mode = "hamming")
# query target distance
# <0 rows> (or 0-length row.names)
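The two results above can be cross-checked in base R without the package. This is an independent check, not how seqtrie computes distances. It uses utils::adist for the Levenshtein distance and a small hypothetical helper, hamming(), that returns NA when the lengths differ.

```r
# Levenshtein distance between the query and the remaining target:
# one insertion ("T") turns "ACG" into "ACGT".
lv <- as.integer(utils::adist("ACG", "ACGT"))

# Hypothetical helper: Hamming distance, defined only for equal lengths.
hamming <- function(a, b) {
  if (nchar(a) != nchar(b)) return(NA_integer_)
  sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

hm_diff_len <- hamming("ACG", "ACGT")   # lengths differ, so NA
hm_same_len <- hamming("ACGA", "ACGT")  # one mismatching position
```

Because "ACG" and "ACGT" differ in length, no Hamming distance exists between them, which is consistent with the empty result of the second search.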