createSubstitutionMatrix {shazam} | R Documentation |
Builds a substitution model
Description
createSubstitutionMatrix builds a 5-mer nucleotide substitution model by counting
the number of substitution mutations occurring in the center position for all 5-mer
motifs.
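The counting idea described above can be sketched in base R. This is an illustration only, not shazam's implementation: the helper `count_center_subs` is a hypothetical name, it works on one ungapped germline/observed pair, and it does not distinguish silent from replacement mutations (which the "s" model requires).

```r
# Hypothetical helper: tally substitutions at the center of each germline 5-mer.
# Assumes two equal-length, ungapped sequences; not part of shazam.
count_center_subs <- function(germline, observed) {
  stopifnot(nchar(germline) == nchar(observed))
  g <- strsplit(germline, "")[[1]]
  o <- strsplit(observed, "")[[1]]
  nt <- c("A", "C", "G", "T")
  counts <- list()
  if (length(g) < 5) return(counts)
  for (i in 3:(length(g) - 2)) {
    # Skip unmutated positions and observed bases that are not A/C/G/T
    if (g[i] == o[i] || !(o[i] %in% nt)) next
    # The 5-mer context is taken from the germline, centered on position i
    motif <- paste(g[(i - 2):(i + 2)], collapse = "")
    if (grepl("[^ACGT]", motif)) next
    if (is.null(counts[[motif]])) {
      counts[[motif]] <- setNames(integer(4), nt)
    }
    # Increment the count for the base the center mutated to
    counts[[motif]][o[i]] <- counts[[motif]][o[i]] + 1L
  }
  counts
}

germ <- "ACGTACGTAC"
obs  <- "ACTTACGAAC"
res <- count_center_subs(germ, obs)
```

Here `res` holds one named count vector per mutated germline 5-mer, which is the raw material a substitution matrix would be normalized from.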
Usage
createSubstitutionMatrix(
db,
model = c("s", "rs"),
sequenceColumn = "sequence_alignment",
germlineColumn = "germline_alignment_d_mask",
vCallColumn = "v_call",
multipleMutation = c("independent", "ignore"),
returnModel = c("5mer", "1mer", "1mer_raw"),
minNumMutations = 50,
numMutationsOnly = FALSE
)
Arguments
db |
data.frame containing sequence data. |
model |
type of model to create. The default model, "s",
builds a model by counting only silent mutations. |
sequenceColumn |
name of the column containing IMGT-gapped sample sequences. |
germlineColumn |
name of the column containing IMGT-gapped germline sequences. |
vCallColumn |
name of the column containing the V-segment allele call. |
multipleMutation |
string specifying how to handle multiple mutations occurring
within the same 5-mer. If |
returnModel |
string specifying what type of model to return; one of
"5mer", "1mer", or "1mer_raw". |
minNumMutations |
minimum number of mutations required to compute the 5-mer
substitution rates. If the number of mutations for a 5-mer
is below this threshold, its substitution rates will be
estimated from neighboring 5-mers. Default is 50.
Not required if |
numMutationsOnly |
when |
Details
Caution: The targeting model functions do NOT support ambiguous
characters in their inputs. You MUST make sure that your input and germline
sequences do NOT contain ambiguous characters (especially if they are
clonal consensuses returned from collapseClones).
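One way to screen sequences before building a model is a simple pattern check. The sketch below is an assumption-laden example: `has_ambiguous` is a hypothetical helper, and it treats only the IUPAC codes R, Y, S, W, K, M, B, D, H, and V as ambiguous, assuming that N and the gap characters "." and "-" are acceptable in IMGT-gapped input.

```r
# Hypothetical helper: flag sequences containing IUPAC ambiguity codes.
# Assumption: N and gap characters ("." and "-") are not treated as ambiguous.
has_ambiguous <- function(seqs) {
  grepl("[RYSWKMBDHV]", toupper(seqs))
}

seqs <- c("ACGTNN..ACGT", "ACGRTACG", "acgt-acgt")
flags <- has_ambiguous(seqs)
```

In practice the same check could be applied to both the sequenceColumn and germlineColumn vectors of db, dropping or repairing any flagged rows before the call.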
Value
For returnModel = "5mer":
When numMutationsOnly is FALSE, a 4x1024 matrix of column-normalized
substitution rates for each 5-mer motif, with row names defining
the center nucleotide, one of c("A", "C", "G", "T"), and column names
defining the 5-mer nucleotide sequence.
When numMutationsOnly is TRUE, a 1024x4 data frame with each row providing information on
counting the number of mutations for a 5-mer. Columns are named
fivemer.total, fivemer.every, inner3.total, and inner3.every,
corresponding to, respectively,
the total number of mutations when counted as a 5-mer,
whether there is mutation to every other base when counted as a 5-mer,
the total number of mutations when counted as an inner 3-mer, and
whether there is mutation to every other base when counted as an inner 3-mer.
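A count table of this shape can help when choosing minNumMutations. The sketch below uses a small made-up data frame in place of real output (the values and the use of 5-mers as row names are assumptions for illustration) and counts how many 5-mers would meet each candidate threshold; minNumMutationsTune, listed under See Also, is the package's own tool for this kind of tuning.

```r
# Mock stand-in for the numMutationsOnly=TRUE output; values are invented.
subCount <- data.frame(
  fivemer.total = c(120, 35, 50, 8),
  row.names = c("AAAAA", "AACGT", "GGCTA", "TTTTT")
)

# For each candidate threshold, count 5-mers with enough mutations
# to have their rates computed directly rather than from neighbors.
candidates <- c(10, 50, 100)
n_pass <- sapply(candidates, function(th) sum(subCount$fivemer.total >= th))
names(n_pass) <- candidates
```

A lower threshold lets more 5-mers be estimated from their own counts, at the cost of noisier rates for sparsely observed motifs.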
For returnModel = "1mer" or "1mer_raw":
a 4x4 normalized or un-normalized 1-mer substitution matrix, respectively.
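To make "column normalized" concrete for the 5-mer matrix, the following base R sketch divides each column of a small invented count matrix by its column total, so that the rates for each 5-mer sum to 1. The counts and the two motifs are made up, and this is not the package's internal code.

```r
nt <- c("A", "C", "G", "T")

# Invented raw counts: rows are the substituted base, columns are 5-mers.
# The row matching each motif's own center base is zero (no self-substitution).
raw <- matrix(c(0, 6, 3, 1,
                2, 0, 2, 4),
              nrow = 4,
              dimnames = list(nt, c("AAAAA", "ACCGT")))

# Column normalization: each column is divided by its own total.
rates <- sweep(raw, 2, colSums(raw), "/")
```

After normalization, each column of `rates` can be read as the relative frequency of each substitution at the center of that 5-mer.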
References
Yaari G, et al. Models of somatic hypermutation targeting and substitution based on synonymous mutations from high-throughput immunoglobulin sequencing data. Front Immunol. 2013 4(November):358.
See Also
extendSubstitutionMatrix, createMutabilityMatrix, createTargetingMatrix, createTargetingModel, minNumMutationsTune.
Examples
# Subset example data to one isotype and sample as a demo
data(ExampleDb, package="alakazam")
db <- subset(ExampleDb, c_call == "IGHA" & sample_id == "-1h")[1:25,]
# Count the number of mutations per 5-mer
subCount <- createSubstitutionMatrix(db, sequenceColumn="sequence_alignment",
                                     germlineColumn="germline_alignment_d_mask",
                                     vCallColumn="v_call",
                                     model="s", multipleMutation="independent",
                                     returnModel="5mer", numMutationsOnly=TRUE)
# Create model using only silent mutations
sub <- createSubstitutionMatrix(db, sequenceColumn="sequence_alignment",
                                germlineColumn="germline_alignment_d_mask",
                                vCallColumn="v_call",
                                model="s", multipleMutation="independent",
                                returnModel="5mer", numMutationsOnly=FALSE,
                                minNumMutations=20)