slideWindowDb {shazam}

R Documentation

Sliding window approach towards filtering sequences in a `data.frame`

Description

slideWindowDb determines, for each input sequence in a data.frame, whether it contains at least a given number of mutations within a given length of consecutive nucleotides (a "window") when compared to its respective germline sequence.
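
The windowing idea can be illustrated with a short base R sketch. This is not the shazam implementation: `slide_window_sketch` is a hypothetical helper, and it assumes a mutation is any position where both sequences carry A, C, G or T and the characters differ. How shazam itself treats gaps, `N`s and other special characters is not covered here.

```r
# Simplified sketch of the sliding window test; NOT the shazam implementation.
# Hypothetical helper: returns TRUE if any window of `windowSize` consecutive
# positions contains at least `mutThresh` mismatches against the germline.
slide_window_sketch <- function(inputSeq, germlineSeq,
                                mutThresh = 6, windowSize = 10) {
  stopifnot(windowSize >= 2, mutThresh >= 1, mutThresh <= windowSize)

  s <- strsplit(toupper(inputSeq), "")[[1]]
  g <- strsplit(toupper(germlineSeq), "")[[1]]
  n <- min(length(s), length(g))
  if (n < windowSize) return(FALSE)  # no full window fits
  s <- s[seq_len(n)]
  g <- g[seq_len(n)]

  # Assumption: only positions where both characters are A/C/G/T can count
  bases <- c("A", "C", "G", "T")
  mut <- as.integer(s != g & s %in% bases & g %in% bases)

  # Mutation count per window, computed from cumulative sums
  cs <- c(0L, cumsum(mut))
  windowSums <- cs[(windowSize + 1):(n + 1)] - cs[1:(n - windowSize + 1)]
  any(windowSums >= mutThresh)
}

germ      <- strrep("A", 20)
clustered <- paste0(strrep("A", 4), strrep("C", 6), strrep("A", 10))
scattered <- paste(rep("CAAA", 5), collapse = "")

slide_window_sketch(clustered, germ)  # six mutations fall inside one window
slide_window_sketch(scattered, germ)  # five mutations, spread out
```

In this toy input the clustered sequence is flagged and the scattered one is not, even though they carry a similar number of mutations overall. That is the point of a window-based threshold.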

Usage

slideWindowDb(
  db,
  sequenceColumn = "sequence_alignment",
  germlineColumn = "germline_alignment_d_mask",
  mutThresh = 6,
  windowSize = 10,
  nproc = 1
)

Arguments

`db`	`data.frame` containing sequence data.
`sequenceColumn`	name of the column containing IMGT-gapped sample sequences.
`germlineColumn`	name of the column containing IMGT-gapped germline sequences.
`mutThresh`	threshold on the number of mutations in `windowSize` consecutive nucleotides. Must be between 1 and `windowSize` inclusive.
`windowSize`	length of consecutive nucleotides. Must be at least 2.
`nproc`	number of cores to distribute the operation over. If a `cluster` has already been set up earlier, pass the `cluster` instead. This ensures that it is not reset.

Value

a logical vector. The length of the vector matches the number of input sequences in db. Each entry in the vector indicates whether the corresponding input sequence should be filtered based on the given parameters.
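
Because the result has one entry per row of db, it can be used directly as a row index. The sketch below uses a toy data.frame and a hand-written logical vector standing in for the output of slideWindowDb. The column `sequence_id` and the names `toFilter` and `dbKept` are illustrative assumptions, not part of the package.

```r
# Toy stand-in for `db`; the column and its values are illustrative only
db <- data.frame(sequence_id = c("s1", "s2", "s3"), stringsAsFactors = FALSE)

# Hypothetical result, standing in for slideWindowDb(db, ...):
# TRUE marks a sequence that should be filtered
toFilter <- c(FALSE, TRUE, FALSE)

# One entry per input sequence
stopifnot(length(toFilter) == nrow(db))

# Keep only the rows that were not flagged
dbKept <- db[!toFilter, , drop = FALSE]
dbKept$sequence_id
```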

Examples

# Load the example data, which provides the input and germline sequence columns
data(ExampleDb, package="alakazam")

# Apply the sliding window approach on a subset of ExampleDb
slideWindowDb(db=ExampleDb[1:10, ], sequenceColumn="sequence_alignment", 
              germlineColumn="germline_alignment_d_mask", 
              mutThresh=6, windowSize=10, nproc=1)

[Package shazam version 1.2.0 Index]