countPatterns {alakazam}    R Documentation

Count sequence patterns

Description

countPatterns calculates the fraction of times each of a set of character patterns occurs in each of a set of sequences.

Usage

countPatterns(seq, patterns, nt = TRUE, trim = FALSE, label = "region")

Arguments

seq

character vector of either DNA or amino acid sequences.

patterns

list of sequence patterns to count in each sequence. If the list is named, then the names will be assigned as the column names of the output data.frame.

nt

if TRUE then seq are DNA sequences and will be translated before performing the pattern search.

trim

if TRUE remove the first and last codon or amino acid from each sequence before the pattern search. If FALSE do not modify the input sequences.

label

string defining a label to add as a prefix to the output column names.

Value

A data.frame containing the fraction of times each sequence pattern was found.
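
The fraction reported above can be illustrated with a minimal base R sketch. It is not the alakazam implementation: patternFraction is a hypothetical helper, it takes amino acid sequences directly (no translation or trimming), and it assumes the fraction is the number of regular expression matches divided by the sequence length.

```r
# Hypothetical helper, not part of alakazam. Assumes the fraction is
# (number of pattern matches) / (sequence length in amino acids).
patternFraction <- function(aa, patterns, label = "region") {
    # Unnamed patterns get placeholder names X1..Xn
    if (is.null(names(patterns))) {
        names(patterns) <- paste0("X", seq_along(patterns))
    }
    fractions <- lapply(patterns, function(p) {
        # gregexpr returns -1 for a sequence with no match
        hits <- gregexpr(p, aa)
        n <- vapply(hits, function(h) sum(h > 0), integer(1))
        n / nchar(aa)
    })
    # Prefix each column name with the label
    names(fractions) <- paste(label, names(patterns), sep = "_")
    as.data.frame(fractions)
}

# Made-up peptides used only for illustration
aa <- c("CAVLA", "CLIV")
patterns <- c(ala = "A", val = "V", iso_leu = "[LI]")
result <- patternFraction(aa, patterns, label = "cdr3")
# result$cdr3_ala is c(0.4, 0): two "A" in five residues, none in four
print(result)
```

Each row corresponds to one input sequence and each column to one pattern, so a pattern such as "[LI]" counts every position matching either residue.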

Examples

seq <- c("TGTCAACAGGCTAACAGTTTCCGGACGTTC",
         "TGTCAGCAATATTATATTGCTCCCTTCACTTTC",
         "TGTCAAAAGTATAACAGTGCCCCCTGGACGTTC")
patterns <- c("A", "V", "[LI]")
names(patterns) <- c("ala", "val", "iso_leu")
countPatterns(seq, patterns, nt=TRUE, trim=TRUE, label="cdr3")
            

[Package alakazam version 1.3.0 Index]