calcObservedMutations {shazam}R Documentation

Count the number of observed mutations in a sequence.

Description

calcObservedMutations determines all the mutations in a given input sequence compared to its germline sequence.

Usage

calcObservedMutations(
  inputSeq,
  germlineSeq,
  regionDefinition = NULL,
  mutationDefinition = NULL,
  ambiguousMode = c("eitherOr", "and"),
  returnRaw = FALSE,
  frequency = FALSE
)

Arguments

inputSeq

input sequence. IUPAC ambiguous characters for DNA are supported.

germlineSeq

germline sequence. IUPAC ambiguous characters for DNA are supported.

regionDefinition

RegionDefinition object defining the regions and boundaries of the Ig sequences. Note, only the part of sequences defined in regionDefinition are analyzed. If NULL, mutations are counted for entire sequence.

mutationDefinition

MutationDefinition object defining replacement and silent mutation criteria. If NULL then replacement and silent are determined by exact amino acid identity.

ambiguousMode

whether to consider ambiguous characters as "either or" or "and" when determining and counting the type(s) of mutations. Applicable only if inputSeq and/or germlineSeq contain(s) ambiguous characters. One of c("eitherOr", "and"). Default is "eitherOr".

returnRaw

return the positions of point mutations and their corresponding mutation types, as opposed to counts of mutations across positions. Also returns the number of bases used as the denominator when calculating frequency. Default is FALSE.

frequency

logical indicating whether or not to calculate mutation frequencies. The denominator used is the number of bases that are not one of "N", "-", or "." in either the input or the germline sequences. If set, this overwrites returnRaw. Default is FALSE.

Details

Each mutation is considered independently in the germline context. For illustration, consider the case where the germline is TGG and the observed is TAC. When determining the mutation type at position 2, which sees a change from G to A, we compare the codon TGG (germline) to TAG (mutation at position 2 independent of other mutations in the germline context). Similarly, when determining the mutation type at position 3, which sees a change from G to C, we compare the codon TGG (germline) to TGC (mutation at position 3 independent of other mutations in the germline context).

If specified, only the part of inputSeq defined in regionDefinition is analyzed. For example, when using the default IMGT_V definition, then mutations in positions beyond 312 will be ignored. Additionally, non-triplet overhang at the sequence end is ignored.

Only replacement (R) and silent (S) mutations are included in the results. Excluded are:

In other words, a result that is NA or zero indicates absence of R and S mutations, not necessarily all types of mutations, such as the excluded ones mentioned above.

NA is also returned if inputSeq or germlineSeq is shorter than 3 nucleotides.

Value

For returnRaw=FALSE, an array with the numbers of replacement (R) and silent (S) mutations.

For returnRaw=TRUE, a list containing

For frequency=TRUE, regardless of returnRaw, an array with the frequencies of replacement (R) and silent (S) mutations.

Ambiguous characters

When there are ambiguous characters present, the user could choose how mutations involving ambiguous characters are counted through ambiguousMode. The two available modes are "eitherOr" and "and".

See Also

See observedMutations for counting the number of observed mutations in a data.frame.

Examples

# Use an entry in the example data for input and germline sequence
data(ExampleDb, package="alakazam")
in_seq <- ExampleDb[["sequence_alignment"]][100]
germ_seq <-  ExampleDb[["germline_alignment_d_mask"]][100]

# Identify all mutations in the sequence
ex1_raw <- calcObservedMutations(in_seq, germ_seq, returnRaw=TRUE)
# Count all mutations in the sequence
ex1_count <- calcObservedMutations(in_seq, germ_seq, returnRaw=FALSE)
ex1_freq <- calcObservedMutations(in_seq, germ_seq, returnRaw=FALSE, frequency=TRUE)
# Compare this with ex1_count
table(ex1_raw$pos$region, ex1_raw$pos$r)[, "1"]
table(ex1_raw$pos$region, ex1_raw$pos$s)[, "1"]
# Compare this with ex1_freq
table(ex1_raw$pos$region, ex1_raw$pos$r)[, "1"]/ex1_raw$nonN
table(ex1_raw$pos$region, ex1_raw$pos$s)[, "1"]/ex1_raw$nonN

# Identify only mutations the V segment minus CDR3
ex2_raw <- calcObservedMutations(in_seq, germ_seq, 
                                regionDefinition=IMGT_V, returnRaw=TRUE)
# Count only mutations the V segment minus CDR3
ex2_count <- calcObservedMutations(in_seq, germ_seq, 
                                  regionDefinition=IMGT_V, returnRaw=FALSE)
ex2_freq <- calcObservedMutations(in_seq, germ_seq, 
                                 regionDefinition=IMGT_V, returnRaw=FALSE,
                                 frequency=TRUE)
# Compare this with ex2_count
table(ex2_raw$pos$region, ex2_raw$pos$r)[, "1"]
table(ex2_raw$pos$region, ex2_raw$pos$s)[, "1"]                              
# Compare this with ex2_freq
table(ex2_raw$pos$region, ex2_raw$pos$r)[, "1"]/ex2_raw$nonN     
table(ex2_raw$pos$region, ex2_raw$pos$s)[, "1"]/ex2_raw$nonN                                       

# Identify mutations by change in hydropathy class
ex3_raw <- calcObservedMutations(in_seq, germ_seq, regionDefinition=IMGT_V,
                                mutationDefinition=HYDROPATHY_MUTATIONS, 
                                returnRaw=TRUE)
# Count mutations by change in hydropathy class
ex3_count <- calcObservedMutations(in_seq, germ_seq, regionDefinition=IMGT_V,
                                  mutationDefinition=HYDROPATHY_MUTATIONS, 
                                  returnRaw=FALSE)
ex3_freq <- calcObservedMutations(in_seq, germ_seq, regionDefinition=IMGT_V,
                                 mutationDefinition=HYDROPATHY_MUTATIONS, 
                                 returnRaw=FALSE, frequency=TRUE)
# Compre this with ex3_count
table(ex3_raw$pos$region, ex3_raw$pos$r)[, "1"]
table(ex3_raw$pos$region, ex3_raw$pos$s)[, "1"]
# Compare this with ex3_freq
table(ex3_raw$pos$region, ex3_raw$pos$r)[, "1"]/ex3_raw$nonN                                        
table(ex3_raw$pos$region, ex3_raw$pos$s)[, "1"]/ex3_raw$nonN                                        
                                

[Package shazam version 1.2.0 Index]