split_search {seqtrie} | R Documentation |
split_search
Description
Search for similar sequences by splitting each sequence into left and right sides and searching for matches on each side using a bi-directional anchored alignment.
Usage
split_search(
query,
target,
query_split,
target_split,
edge_trim = 0L,
max_distance = 0L,
...
)
Arguments
query |
A character vector of query sequences. |
target |
A character vector of target sequences. |
query_split |
index to split query sequence. Should be within (edge_trim, nchar(query)-edge_trim] or -1 to indicate no split. |
target_split |
index to split target sequence. Should be within (edge_trim, nchar(target)-edge_trim] or -1 to indicate no split. |
edge_trim |
number of bases to trim from each side of the sequence (default value: 0). |
max_distance |
how far to search in units of absolute distance. Can be a single value or a vector. Mutually exclusive with max_fraction. |
... |
additional arguments passed to |
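The range constraint stated for query_split and target_split can be checked before calling the function. The following is a minimal sketch in base R; valid_split is a hypothetical helper written for illustration and is not part of seqtrie.

```r
# Hypothetical helper (not part of seqtrie): returns TRUE where a split index
# is either -1 (no split) or lies in (edge_trim, nchar(sequence) - edge_trim].
valid_split <- function(sequence, split, edge_trim = 0L) {
  split == -1L | (split > edge_trim & split <= nchar(sequence) - edge_trim)
}

# Vectorised over sequences and split indices
valid_split(c("AGACCTAACCC", "GGGTGTAACCACCC"), c(8L, 8L))
```

The check is vectorised, so one call covers a whole character vector of sequences and its matching vector of split indices.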
Details
This function is useful for finding similar sequences that were sequenced over different windows (e.g. with different 5' and 3' primers) but contain the same core sequence or position. The two split parameters partition the query and target sequences into left and right sides, where left = stri_sub(sequence, edge_trim+1, split) and right = stri_sub(sequence, split+1, -edge_trim-1).
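The partition described above can be illustrated with a short sketch. It assumes base R substr() as a stand-in for stringi's stri_sub(), with the negative end index -edge_trim-1 rewritten as nchar(sequence) - edge_trim. split_sides is a hypothetical helper, not seqtrie's internal code, and it does not handle the -1 "no split" case.

```r
# Hypothetical helper (not part of seqtrie) illustrating the left/right partition.
# Base R substr() is used in place of stringi::stri_sub().
split_sides <- function(sequence, split, edge_trim = 0L) {
  n <- nchar(sequence)
  list(
    left  = substr(sequence, edge_trim + 1L, split),     # up to and including the split index
    right = substr(sequence, split + 1L, n - edge_trim)  # from just after the split index
  )
}

# Query and target from the Examples section, split after the TAA codon
q <- split_sides("AGACCTAACCC", 8L)
t <- split_sides("AAGACCTAACC", 9L)
q$left   # "AGACCTAA"
q$right  # "CCC"
t$left   # "AAGACCTAA"
t$right  # "CC"
```

With edge_trim = 1 the same query would give left = "GACCTAA" and right = "CC", since one base is dropped from each end before the sides are compared.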
Value
data.frame with columns query, target, and distance.
Examples
# Consider two sets of sequences
# query1   AGACCTAA CCC
# target1 AAGACCTAA CC
# query2   GGGTGTAA CCACCC
# target2   GGTGTAA CCAC
# Despite having different frames, query1 and query2 clearly
# match target1 and target2, respectively.
# One could consider splitting based on a common core sequence,
# e.g. a common TAA stop codon.
split_search(query=c("AGACCTAACCC", "GGGTGTAACCACCC"),
             target=c("AAGACCTAACC", "GGTGTAACCAC"),
             query_split=c(8, 8),
             target_split=c(9, 7),
             edge_trim=0,
             max_distance=0)