find_motifs {tidysq} | R Documentation
Find given motifs
Description
Finds all given motifs in sequences and returns their positions.
Usage
find_motifs(x, name, motifs, ...)
## S3 method for class 'sq'
find_motifs(x, name, motifs, ..., NA_letter = getOption("tidysq_NA_letter"))
Arguments
x |
[ |
name |
[ |
motifs |
[ |
... |
further arguments to be passed from or to other methods. |
NA_letter |
[ |
Details
This function searches the sq object for one or more given motifs. It
returns every match found, together with its start and end positions within
the sequence.
Value
A tibble with the following columns:
name |
name of the sequence in which a motif was found |
sought |
sought motif |
found |
found subsequence, may differ from sought if the motif contained ambiguous letters |
start |
position of first element of found motif |
end |
position of last element of found motif |
Motif capabilities and restrictions
A motif does not have to be a plain string representation of the sought subsequence. When this function is used with any of the standard types, i.e. ami, dna or rna, a motif may contain ambiguous letters. In this case the engine tries to match any of the possible meanings of such a letter. For example, take "B" from the extended DNA alphabet. It means "not A", so it can be matched with "C", "G" and "T", but also with "B", "Y" (either "C" or "T"), "K" (either "G" or "T") and "S" (either "C" or "G").
The full list of ambiguous letters and their meanings can be found on the IUPAC site.
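The "B" example above can be reproduced with a small base R sketch. It assumes that a sequence letter matches a motif letter whenever every meaning of the sequence letter is also a meaning of the motif letter. This rule is inferred from the example and is not necessarily how the tidysq engine is implemented. The names iupac_dna and matching_letters are hypothetical and not part of the package.

```r
# IUPAC nucleotide codes and the basic letters each one stands for.
iupac_dna <- list(
  A = "A", C = "C", G = "G", T = "T",
  R = c("A", "G"), Y = c("C", "T"), S = c("C", "G"), W = c("A", "T"),
  K = c("G", "T"), M = c("A", "C"),
  B = c("C", "G", "T"), D = c("A", "G", "T"),
  H = c("A", "C", "T"), V = c("A", "C", "G"),
  N = c("A", "C", "G", "T")
)

# Hypothetical helper (assumption, not tidysq API): returns the letters whose
# meanings are all covered by the meanings of `motif_letter`.
matching_letters <- function(motif_letter, codes = iupac_dna) {
  allowed <- codes[[motif_letter]]
  covered <- vapply(codes,
                    function(meaning) all(meaning %in% allowed),
                    logical(1))
  names(codes)[covered]
}

# Letters that a "B" in a motif can be matched with under this assumption.
matching_letters("B")
```

Under this rule "B" is matched by "C", "G", "T", "Y", "S", "K" and "B" itself, which is the same set as listed above.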
There is also a restriction: the alphabets of sq objects on which search
operations are conducted cannot contain the "^" and "$" symbols. These two
have a special meaning in motifs: they indicate the beginning and the end of
a sequence, respectively, and can be used to limit the position of matched
subsequences.
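The effect of "$" resembles that of an anchor in a regular expression. The following base R sketch illustrates the idea on plain character vectors rather than sq objects. It is only an analogy and makes no claim about the tidysq implementation. The "." in the pattern stands in for the ambiguous amino acid letter "X", and seqs is a hypothetical stand-in for an sq object with names.

```r
# Plain character vectors standing in for sequences (not sq objects).
seqs <- c(sq1 = "AGNTYIKFGGAYTI",
          sq2 = "MATEGILIAADGYTWIL",
          sq3 = "MIPADHICAANGIENAGIK")

# "I" followed by exactly one more letter at the very end of the sequence.
hits <- regexpr("I.$", seqs)

# Start position of each match, or -1 where the anchored pattern does not fit.
starts <- as.integer(hits)

# End position follows from the match length; NA where there is no match.
ends <- ifelse(starts > 0,
               starts + attr(hits, "match.length") - 1L,
               NA_integer_)

data.frame(name = names(seqs), start = starts, end = ends)
```

Without the anchor, the same pattern would also match every earlier "I" that is followed by another letter, so the anchor is what restricts the match to the second-to-last position.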
See Also
Functions interpreting sq in biological context:
%has%(), complement(), translate()
Examples
# Creating objects to work on:
sq_dna <- sq(c("ATGCAGGA", "GACCGNBAACGAN", "TGACGAGCTTAG"),
             alphabet = "dna_bsc")
sq_ami <- sq(c("AGNTYIKFGGAYTI", "MATEGILIAADGYTWIL", "MIPADHICAANGIENAGIK"),
             alphabet = "ami_bsc")
sq_atp <- sq(c("mAmYmY", "nbAnsAmA", ""),
             alphabet = c("mA", "mY", "nbA", "nsA"))
sq_names <- c("sq1", "sq2", "sq3")
# Finding motif of two alanines followed by aspartic acid or asparagine
# ("AAB" motif matches "AAB", "AAD" and "AAN"):
find_motifs(sq_ami, sq_names, "AAB")
# Finding "C" at fourth position:
find_motifs(sq_dna, sq_names, "^NNNC")
# Finding motif "I" at second-to-last position:
find_motifs(sq_ami, sq_names, "IX$")
# Finding multiple motifs:
find_motifs(sq_dna, sq_names, c("^ABN", "ANCBY", "BAN$"))
# Finding multicharacter motifs:
find_motifs(sq_atp, sq_names, c("nsA", "mYmY$"))