slideWindowDb {shazam}    R Documentation

Sliding window approach towards filtering sequences in a data.frame

Description

slideWindowDb determines whether each input sequence in a data.frame contains at least a given number of mutations within a given length of consecutive nucleotides (a "window") when compared to its respective germline sequence.
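
The window test described above can be illustrated with a minimal base R sketch for a single sequence and germline pair. The helper name slide_window_sketch is hypothetical and not part of shazam, and the rule that only A/C/G/T positions are compared (gaps and N never count as mutations) is an assumption made for this illustration. The package's own mutation calling may differ.

```r
# Hypothetical helper, not part of shazam: returns TRUE when any window of
# `windowSize` consecutive positions holds at least `mutThresh` mismatches.
slide_window_sketch <- function(seq, germ, mutThresh = 6, windowSize = 10) {
  stopifnot(nchar(seq) == nchar(germ),
            windowSize >= 2,
            mutThresh >= 1, mutThresh <= windowSize)
  s <- strsplit(toupper(seq), "")[[1]]
  g <- strsplit(toupper(germ), "")[[1]]
  # Assumption: a position is comparable only if both characters are A/C/G/T;
  # gap characters (. or -) and N are never counted as mutations.
  bases <- c("A", "C", "G", "T")
  comparable <- s %in% bases & g %in% bases
  mismatch <- as.integer(comparable & s != g)
  n <- length(mismatch)
  # A sequence shorter than one window cannot be flagged
  if (n < windowSize) return(FALSE)
  # Mismatch count for every window, computed from cumulative sums
  cs <- c(0, cumsum(mismatch))
  window_counts <- cs[(windowSize + 1):(n + 1)] - cs[1:(n - windowSize + 1)]
  any(window_counts >= mutThresh)
}

germ <- strrep("A", 20)
dense <- paste0(strrep("A", 5), strrep("C", 6), strrep("A", 9))   # 6 adjacent mismatches
sparse <- paste0(strrep("A", 5), strrep("C", 3), strrep("A", 12)) # 3 mismatches in total
slide_window_sketch(dense, germ)
slide_window_sketch(sparse, germ)
```

With the default mutThresh of 6 and windowSize of 10, the first call finds a window that meets the threshold and the second does not.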

Usage

slideWindowDb(
  db,
  sequenceColumn = "sequence_alignment",
  germlineColumn = "germline_alignment_d_mask",
  mutThresh = 6,
  windowSize = 10,
  nproc = 1
)

Arguments

db

data.frame containing sequence data.

sequenceColumn

name of the column containing IMGT-gapped sample sequences.

germlineColumn

name of the column containing IMGT-gapped germline sequences.

mutThresh

threshold on the number of mutations in windowSize consecutive nucleotides. Must be between 1 and windowSize inclusive.

windowSize

length of consecutive nucleotides. Must be at least 2.

nproc

number of cores to distribute the operation over. If a cluster has already been set up, pass the cluster instead so that it is not reset.

Value

a logical vector with one entry per input sequence in db. An entry is TRUE if the corresponding input sequence should be filtered based on the given parameters.
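
As a rough illustration of how such a vector might be used, the sketch below drops flagged rows from a data.frame. The toy data.frame and the flagged vector are made-up stand-ins for db and for the result of slideWindowDb, and it assumes that TRUE marks a sequence to be removed.

```r
# Toy stand-in for db; the column values are invented for illustration
db <- data.frame(sequence_id = c("s1", "s2", "s3"),
                 stringsAsFactors = FALSE)

# Stand-in for the logical vector returned by slideWindowDb()
# (assumption: TRUE marks a sequence that should be filtered out)
flagged <- c(FALSE, TRUE, FALSE)

# Keep only the rows that were not flagged
db_clean <- db[!flagged, , drop = FALSE]
db_clean$sequence_id
```

Because the vector has one entry per row of db, it can be used directly as a row index without any joining or matching step.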

See Also

See slideWindowSeq for applying the sliding window approach on a single sequence. See slideWindowTune for parameter tuning for mutThresh and windowSize.

Examples

# Load the example data, which provides the input and germline sequence columns
data(ExampleDb, package="alakazam")

# Apply the sliding window approach on a subset of ExampleDb
slideWindowDb(db=ExampleDb[1:10, ], sequenceColumn="sequence_alignment", 
              germlineColumn="germline_alignment_d_mask", 
              mutThresh=6, windowSize=10, nproc=1)


[Package shazam version 1.2.0 Index]