slideWindowTune {shazam}    R Documentation

Parameter tuning for sliding window approach

Description

Apply slideWindowDb over a search grid of mutThresh and windowSize combinations to help choose a pair of values for these parameters. Parameter tuning can be performed by choosing a combination that gives a reasonable number of filtered/remaining sequences.

Usage

slideWindowTune(
  db,
  sequenceColumn = "sequence_alignment",
  germlineColumn = "germline_alignment_d_mask",
  dbMutList = NULL,
  mutThreshRange,
  windowSizeRange,
  verbose = TRUE,
  nproc = 1
)

Arguments

db

data.frame containing sequence data.

sequenceColumn

name of the column containing IMGT-gapped sample sequences.

germlineColumn

name of the column containing IMGT-gapped germline sequences.

dbMutList

if supplied, this should be a list consisting of data.frames returned as $pos in the nested list produced by calcObservedMutations with returnRaw=TRUE; otherwise, calcObservedMutations is called on columns sequenceColumn and germlineColumn of db. Default is NULL.

mutThreshRange

range of thresholds to try for the number of mutations in windowSize consecutive nucleotides. Each value must be between 1 and the maximum of windowSizeRange, inclusive.

windowSizeRange

range of window lengths (numbers of consecutive nucleotides) to try. The lower end must be at least 2.

verbose

whether to print out messages indicating current progress. Default is TRUE.

nproc

number of cores to distribute the operation over. If a cluster has already been set up earlier, pass the cluster instead; this ensures that it is not reset.

Details

If, in a given combination of mutThresh and windowSize, mutThresh is greater than windowSize, NAs will be returned for that particular combination. A message indicating that the combination has been "skipped" will be printed if verbose=TRUE.
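The skipping rule can be previewed before running the tuning. The following is a minimal base R sketch, not part of shazam itself; the ranges are hypothetical values chosen for illustration, and `grid` and `skipped` are names introduced here.

```r
# Hypothetical ranges, chosen only for illustration
mutThreshRange <- 2:4
windowSizeRange <- 2:4

# Every (windowSize, mutThresh) pair in the search grid
grid <- expand.grid(windowSize = windowSizeRange,
                    mutThresh = mutThreshRange)

# Per the rule above, a pair is skipped (NAs returned)
# when mutThresh is greater than windowSize
grid$skipped <- grid$mutThresh > grid$windowSize

print(grid)
```

With these ranges, the pairs in which mutThresh exceeds windowSize are flagged, which indicates in advance which columns of the result would hold NAs.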

If calcObservedMutations was previously run on db and saved, supplying $pos from the saved result as dbMutList could save time by skipping a second call of calcObservedMutations. This could be helpful especially when db is large.

Value

a list of logical matrices. Each matrix corresponds to a windowSize in windowSizeRange. Each column in a matrix corresponds to a mutThresh in mutThreshRange. Each row corresponds to a sequence. A TRUE value means that the sequence has at least the number of mutations specified in the column name, for that windowSize.
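One way to compare combinations is to count the TRUE values per column in each matrix. The sketch below uses a mock list with the structure described above; the values are made up and are not real output, and naming the list elements by windowSize is an assumption made here for readability.

```r
# Mock result: one logical matrix per windowSize (list names assumed),
# rows are sequences, columns are named by mutThresh.
# Values are invented for illustration only.
tuneList <- list(
  "7" = matrix(c(TRUE,  FALSE, TRUE,
                 FALSE, FALSE, TRUE,
                 FALSE, FALSE, FALSE),
               nrow = 3, dimnames = list(NULL, c("2", "3", "4"))),
  "9" = matrix(c(TRUE,  TRUE,  TRUE,
                 TRUE,  FALSE, TRUE,
                 FALSE, FALSE, TRUE),
               nrow = 3, dimnames = list(NULL, c("2", "3", "4")))
)

# Number of sequences flagged TRUE for each mutThresh (rows)
# and each windowSize (columns)
filteredCounts <- sapply(tuneList, colSums)

print(filteredCounts)
```

Because colSums is called without na.rm, a skipped combination (a column of NAs) would show up as NA in the summary rather than as a count of zero.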

See Also

slideWindowDb is called on db for tuning. See slideWindowTunePlot for visualization. See calcObservedMutations for generating dbMutList.

Examples

# Load and subset example data
data(ExampleDb, package="alakazam")
db <- ExampleDb[1:5, ]

# Try out thresholds of 2-4 mutations in window sizes of 7-9 nucleotides. 
# In this case, all combinations are legal.
slideWindowTune(db, mutThreshRange=2:4, windowSizeRange=7:9)

# Illegal combinations are skipped, returning NAs.
slideWindowTune(db, mutThreshRange=2:4, windowSizeRange=2:4, 
                verbose=FALSE)
                                                            
# Run calcObservedMutations separately.
# lapply is used because it always returns a list, as dbMutList requires.
exDbMutList <- lapply(seq_len(nrow(db)), function(i) {
    calcObservedMutations(inputSeq=db[["sequence_alignment"]][i],
                          germlineSeq=db[["germline_alignment_d_mask"]][i],
                          returnRaw=TRUE)$pos })
slideWindowTune(db, dbMutList=exDbMutList, 
                mutThreshRange=2:4, windowSizeRange=2:4)

[Package shazam version 1.2.0 Index]