R: Positions of possibly degenerated motifs within sequences

words.pos {seqinr}

R Documentation

Positions of possibly degenerated motifs within sequences

Description

words.pos searches for all occurrences of the motif pattern within the sequence text and returns their positions. The function is based on regular expressions (regexp), which allows for complex motif searches. The main difference from gregexpr is that overlapping (non-disjoint) matches are also reported here.
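
To make the idea of reporting overlapping matches concrete, here is a minimal base-R sketch. The helper name `overlapping_pos` is hypothetical and is not claimed to be how seqinr implements `words.pos`; it only illustrates one way to obtain non-disjoint matches, by restarting the search one character after each hit.

```r
# Hypothetical helper (not part of seqinr): collect every start position,
# overlapping ones included, by searching again one character after each hit.
overlapping_pos <- function(pattern, text, ...) {
  hits <- integer(0)
  offset <- 0L                        # characters already cut from the front
  while (nchar(text) > 0L) {
    pos <- regexpr(pattern, text, ...)[1]
    if (pos == -1L) break             # no further match
    hits <- c(hits, offset + pos)     # position relative to the original text
    offset <- offset + pos
    text <- substr(text, pos + 1L, nchar(text))
  }
  hits
}

overlapping_pos("toto", "totototo", perl = TRUE)      # 1 3 5
overlapping_pos("[ct][ag]", "tatagaga", perl = TRUE)  # 1 3
```

Because the text is shortened at each step, anchored patterns such as `^` would behave differently in this sketch than in a single pass over the full sequence.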

Usage

words.pos(pattern, text, ignore.case = FALSE,
                      perl = TRUE, fixed = FALSE, useBytes = TRUE, ...)

Arguments

`pattern`	character string containing a regular expression (or character string for `fixed = TRUE`) to be matched in the given character vector.
`text`	a character vector where matches are sought.
`ignore.case`	if `FALSE`, the pattern matching is case sensitive and if `TRUE`, case is ignored during matching.
`perl`	logical. Should perl-compatible regexps be used if available? Has priority over `extended`.
`fixed`	logical. If `TRUE`, pattern is a string to be matched as is. Overrides all conflicting arguments.
`useBytes`	logical. If `TRUE` the matching is done byte-by-byte rather than character-by-character.
`...`	arguments passed to `regexpr`.
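
As a rough illustration of what `fixed` and `ignore.case` change, the sketch below calls base R's `regexpr` directly, since these arguments carry the same names there. It is an assumption that `words.pos` forwards them unchanged; the variable names are only illustrative.

```r
myseq <- "tatagaga"

# As a regular expression, "[ct][ag]" means: c or t, followed by a or g.
as_regex <- regexpr("[ct][ag]", myseq, perl = TRUE)[1]     # 1

# With fixed = TRUE the same string is searched literally, so nothing matches.
as_literal <- regexpr("[ct][ag]", myseq, fixed = TRUE)[1]  # -1

# Matching is case sensitive unless ignore.case = TRUE.
case_sensitive   <- regexpr("TAG", myseq, perl = TRUE)[1]                      # -1
case_insensitive <- regexpr("TAG", myseq, ignore.case = TRUE, perl = TRUE)[1]  # 3

print(c(as_regex, as_literal, case_sensitive, case_insensitive))
```

A result of -1 is how `regexpr` signals that no match was found.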

Details

Default parameter values have been tuned for speed when working with biological sequences.

Value

A vector of the positions at which the motif pattern was found in the sequence text.

Author(s)

J.R. Lobry

References

citation("seqinr")

Examples

myseq <- "tatagaga"
words.pos("t", myseq)   # Should be 1 3
words.pos("tag", myseq) # Should be 3
words.pos("ga", myseq)  # Should be 5 7
# How to specify ambiguous bases? Look for YpR motifs with character classes:
words.pos("[ct][ag]", myseq) # Should be 1 3
#
# Show the difference with gregexpr:
#
words.pos("toto", "totototo")           # 1 3 5 (three overlapping matches)
unlist(gregexpr("toto",  "totototo")) # 1 5    (two disjoint matches)
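
For comparison, overlapping positions can also be obtained from `gregexpr` itself by wrapping the motif in a zero-width lookahead, which requires `perl = TRUE`. This is a base-R alternative offered as a sketch, not a description of how `words.pos` works internally; the variable names are illustrative.

```r
# A lookahead matches at a position without consuming characters,
# so successive matches are allowed to overlap.
overlap_toto <- unlist(gregexpr("(?=toto)", "totototo", perl = TRUE))
overlap_ypr  <- unlist(gregexpr("(?=[ct][ag])", "tatagaga", perl = TRUE))

overlap_toto  # 1 3 5
overlap_ypr   # 1 3
```

When the motif is absent, `gregexpr` returns -1 rather than an empty result, so that case needs separate handling.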

[Package seqinr version 4.2-36 Index]