words.pos {seqinr} | R Documentation |
Positions of possibly degenerate motifs within sequences
Description
words.pos searches for all occurrences of the motif pattern within the sequence text and returns their positions. This function is based on regexp, thus allowing complex motif searches. The main difference with gregexpr is that non-disjoint (overlapping) matches are reported here.
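The distinction between overlapping and disjoint matches can be illustrated with a minimal base R sketch. The helper `overlapping_pos` below is hypothetical and is not the seqinr implementation; it only assumes that restarting `regexpr` one character after each match start is enough to recover overlapping matches in a single string.

```r
# Hypothetical helper, not part of seqinr: report every start position of
# `pattern` in the single string `text`, including overlapping matches.
overlapping_pos <- function(pattern, text, perl = TRUE, useBytes = TRUE) {
  positions <- integer(0)
  offset <- 0L                        # characters of `text` already skipped
  while (offset < nchar(text)) {
    rest <- substr(text, offset + 1L, nchar(text))
    hit <- regexpr(pattern, rest, perl = perl, useBytes = useBytes)[1]
    if (hit == -1L) break             # no further match
    positions <- c(positions, offset + hit)
    offset <- offset + hit            # resume one character after the match start
  }
  positions
}

overlapping_pos("toto", "totototo")   # 1 3 5
unlist(gregexpr("toto", "totototo"))  # 1 5
```

Advancing by a single character, rather than by the full match length, is what lets the second and third occurrences of the motif be seen.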
Usage
words.pos(pattern, text, ignore.case = FALSE,
perl = TRUE, fixed = FALSE, useBytes = TRUE, ...)
Arguments
pattern |
character string containing a regular expression (or character string for |
text |
a character vector where matches are sought. |
ignore.case |
if |
perl |
logical. Should perl-compatible regexps be used if available?
Has priority over |
fixed |
logical. If |
useBytes |
logical. If |
... |
arguments passed to |
Details
Default parameter values have been tuned for speed when working with biological sequences.
Value
a vector of positions for which the motif pattern was found in the sequence text.
Author(s)
J.R. Lobry
References
citation("seqinr")
See Also
Examples
myseq <- "tatagaga"
words.pos("t", myseq) # Should be 1 3
words.pos("tag", myseq) # Should be 3
words.pos("ga", myseq) # Should be 5 7
# How to specify an ambiguous base? Look for YpR motifs with:
words.pos("[ct][ag]", myseq) # Should be 1 3
#
# Show the difference with gregexpr:
#
words.pos("toto", "totototo") # 1 3 5 (three overlapping matches)
unlist(gregexpr("toto", "totototo")) # 1 5 (two disjoint matches)
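As a complementary sketch in base R, overlapping positions can also be obtained from `gregexpr` itself by wrapping the motif in a zero-width lookahead, which requires `perl = TRUE`. This is an alternative technique shown for comparison only; it is not a description of how `words.pos` is implemented.

```r
# Zero-width lookahead: each match consumes no characters, so the search
# resumes at the next position and overlapping motifs are all reported.
lookahead_hits <- gregexpr("(?=toto)", "totototo", perl = TRUE)[[1]]
as.integer(lookahead_hits)  # start positions, match.length attribute dropped
```

Because every match has length zero, the `match.length` attribute of the result carries no information about the motif; only the start positions are meaningful here.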