split_search {seqtrie}

R Documentation

split_search

Description

Search for similar sequences based on splitting sequences into left and right sides and searching for matches in each side using a bi-directional anchored alignment.

Usage

split_search(
  query,
  target,
  query_split,
  target_split,
  edge_trim = 0L,
  max_distance = 0L,
  ...
)

Arguments

`query`	A character vector of query sequences.
`target`	A character vector of target sequences.
`query_split`	Index at which to split each query sequence. Must lie within (edge_trim, nchar(query)-edge_trim], or be -1 to indicate no split.
`target_split`	Index at which to split each target sequence. Must lie within (edge_trim, nchar(target)-edge_trim], or be -1 to indicate no split.
`edge_trim`	Number of bases to trim from each end of the sequence (default: 0).
`max_distance`	Maximum absolute distance to search within (default: 0). Can be a single value or a vector. Mutually exclusive with max_fraction.
`...`	Additional arguments passed to `RadixTree$search`.
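
The range stated for `query_split` and `target_split` can be checked before calling the function. The following is a minimal sketch in base R; `is_valid_split` is a hypothetical helper written for illustration and is not part of seqtrie.

```r
# Hypothetical helper (not part of seqtrie): TRUE where a split index is
# either -1 (no split) or inside (edge_trim, nchar(seq) - edge_trim].
is_valid_split <- function(seq, split, edge_trim = 0L) {
  split == -1L | (split > edge_trim & split <= nchar(seq) - edge_trim)
}

query <- c("AGACCTAACCC", "GGGTGTAACCACCC")   # lengths 11 and 14

ok_example  <- is_valid_split(query, c(8L, 8L))    # both inside the range
ok_outside  <- is_valid_split(query, c(0L, 20L))   # 0 is too small, 20 too large
ok_no_split <- is_valid_split(query, c(-1L, 11L))  # -1 is accepted as "no split"
```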

Details

This function is useful for finding similar sequences that were sequenced over different windows (e.g. with different 5' and 3' primers) but share the same core sequence or position. The two split parameters partition each query and target sequence into a left and a right side, where left = stri_sub(sequence, edge_trim+1, split) and right = stri_sub(sequence, split+1, -edge_trim-1).
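
To make the partition concrete, the sketch below reproduces the left and right sides with base R `substr` in place of `stri_sub`. `split_sides` is a hypothetical helper, not a seqtrie function, and it assumes a real split index (it does not handle the -1 "no split" case).

```r
# Hypothetical helper (not part of seqtrie): compute the left and right
# sides as described above, using base R substr() instead of stri_sub().
# The negative end index -edge_trim-1 in stri_sub() corresponds to
# position nchar(sequence) - edge_trim.
split_sides <- function(sequence, split, edge_trim = 0L) {
  n <- nchar(sequence)
  data.frame(
    left  = substr(sequence, edge_trim + 1L, split),
    right = substr(sequence, split + 1L, n - edge_trim),
    stringsAsFactors = FALSE
  )
}

# query1 and target1 from the Examples, split after the TAA codon
sides <- split_sides(c("AGACCTAACCC", "AAGACCTAACC"), split = c(8L, 9L))

# The same query with one base trimmed from each end
trimmed <- split_sides("AGACCTAACCC", split = 8L, edge_trim = 1L)
```

Here `sides$left` holds the sequence up to and including the split position and `sides$right` holds the remainder, which is how the sequences are laid out in the comments of the Examples section.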

Value

data.frame with columns query, target, and distance.

Examples

# Consider two sets of sequences
# query1   AGACCTAA CCC
# target1 AAGACCTAA CC
# query2   GGGTGTAA CCACCC
# target2   GGTGTAA CCAC
# Despite having different frames, query1 and query2 clearly
# match target1 and target2, respectively.
# One could consider splitting based on a common core sequence, 
# e.g. a common TAA stop codon. 
split_search(query=c(  "AGACCTAACCC", "GGGTGTAACCACCC"),
             target=c("AAGACCTAACC",   "GGTGTAACCAC"),
             query_split=c(8, 8),
             target_split=c(9, 7),
             edge_trim=0,
             max_distance=0)
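
The split indices used above can be derived rather than typed by hand. The following is a sketch in base R, assuming the split should fall on the last base of the first occurrence of the motif; `split_after_motif` is a hypothetical helper and not part of seqtrie.

```r
# Hypothetical helper (not part of seqtrie): index of the last base of the
# first occurrence of `motif`, or -1 (no split) when the motif is absent.
split_after_motif <- function(sequence, motif = "TAA") {
  start <- as.integer(regexpr(motif, sequence, fixed = TRUE))
  ifelse(start == -1L, -1L, start + nchar(motif) - 1L)
}

query  <- c("AGACCTAACCC", "GGGTGTAACCACCC")
target <- c("AAGACCTAACC", "GGTGTAACCAC")

query_split  <- split_after_motif(query)   # 8 8, as in the call above
target_split <- split_after_motif(target)  # 9 7, as in the call above
```

The first occurrence is a simplification: a sequence containing an earlier, out-of-frame TAA would need a frame-aware search instead.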

[Package seqtrie version 0.2.8 Index]