calcBaseline {shazam} | R Documentation |
Calculate the BASELINe PDFs (including for regions that include CDR3 and FWR4)
Description
calcBaseline calculates the BASELINe posterior probability density
functions (PDFs) for sequences in the given Change-O data.frame.
Usage
calcBaseline(
db,
sequenceColumn = "clonal_sequence",
germlineColumn = "clonal_germline",
testStatistic = c("local", "focused", "imbalanced"),
regionDefinition = NULL,
targetingModel = HH_S5F,
mutationDefinition = NULL,
calcStats = FALSE,
nproc = 1,
cloneColumn = NULL,
juncLengthColumn = NULL
)
Arguments
db |
|
sequenceColumn |
|
germlineColumn |
|
testStatistic |
|
regionDefinition |
RegionDefinition object defining the regions and boundaries of the Ig sequences. |
targetingModel |
TargetingModel object. Default is HH_S5F. |
mutationDefinition |
MutationDefinition object defining replacement
and silent mutation criteria. If |
calcStats |
|
nproc |
number of cores to distribute the operation over. If
|
cloneColumn |
|
juncLengthColumn |
|
Details
Calculates the BASELINe posterior probability density function (PDF) for
sequences in the provided db.
Note: Individual sequences within clonal groups are not, strictly speaking,
independent events and it is generally appropriate to only analyze selection
pressures on an effective sequence for each clonal group. For this reason,
it is strongly recommended that the input db contains one effective
sequence per clone. Effective clonal sequences can be obtained by calling
the collapseClones function.
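As a quick sanity check on the recommendation above, the following base R sketch counts rows per clone in a toy data.frame. The column name clone_id and the toy values are assumptions for illustration; substitute the clone identifier column used in your own data.

```r
# Toy stand-in for a Change-O data.frame. The column name `clone_id`
# is an assumption; use whatever column holds clone identifiers.
db <- data.frame(
  clone_id = c("1", "1", "2", "3", "3", "3"),
  sequence_id = paste0("seq", 1:6),
  stringsAsFactors = FALSE
)

# Number of sequences recorded for each clone
clone_sizes <- table(db$clone_id)

# TRUE only if every clone is represented by exactly one row
one_per_clone <- all(clone_sizes == 1)

# Clones that still contain more than one sequence
multi_clones <- names(clone_sizes)[clone_sizes > 1]
```

If one_per_clone is FALSE, the clones listed in multi_clones are the ones that would still need to be collapsed before selection analysis.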
If the db does not contain the required columns to calculate the PDFs
(namely mu_count & mu_expected), then the function will:
- Calculate the numbers of observed mutations.
- Calculate the expected frequencies of mutations and modify the provided
db. The modified db will be included as part of the returned Baseline
object.
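To see ahead of time whether the mutation columns are already present, a check along the following lines may help. This is a minimal sketch, not the function's internal logic: the helper has_mutation_columns is hypothetical, and matching on the prefixes mu_count and mu_expected is an assumption based on the column names mentioned above.

```r
# Hypothetical helper: does `db` already carry observed and expected
# mutation columns? Prefix matching is an assumption, not shazam's own check.
has_mutation_columns <- function(db) {
  cols <- names(db)
  any(startsWith(cols, "mu_count")) && any(startsWith(cols, "mu_expected"))
}

# Toy data.frame without mutation columns
db_raw <- data.frame(sequence_id = "a", clonal_sequence = "ATG",
                     stringsAsFactors = FALSE)

# Toy data.frame with one observed and one expected column
# (column names and values are made up for illustration)
db_done <- data.frame(sequence_id = "a", mu_count_cdr_r = 1,
                      mu_expected_cdr_r = 0.2, stringsAsFactors = FALSE)

raw_ready <- has_mutation_columns(db_raw)
done_ready <- has_mutation_columns(db_done)
```

Here raw_ready is FALSE and done_ready is TRUE, which indicates that only db_raw would need its mutation columns computed.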
The testStatistic indicates the statistical framework used to test for selection.
E.g.
- local = CDR_R / (CDR_R + CDR_S).
- focused = CDR_R / (CDR_R + CDR_S + FWR_S).
- imbalanced = (CDR_R + CDR_S) / (CDR_R + CDR_S + FWR_S + FWR_R).
For focused the regionDefinition must contain only two regions. If more
than two regions are defined the local test statistic will be used.
For further information on the framework of these tests see Uduman et al. (2011).
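The three ratios can be written out in a few lines of base R. This is an illustrative sketch only: the helper test_statistic and the mutation counts are hypothetical, and the imbalanced ratio is read here as all CDR mutations over all mutations, which is an interpretation of the formula rather than code taken from shazam.

```r
# Hypothetical replacement (R) and silent (S) mutation counts for one sequence
counts <- c(CDR_R = 6, CDR_S = 2, FWR_R = 5, FWR_S = 7)

# Hypothetical helper computing the ratio behind each test statistic
test_statistic <- function(counts, type = c("local", "focused", "imbalanced")) {
  type <- match.arg(type)  # defaults to the first choice, "local"
  cdr_r <- counts[["CDR_R"]]
  cdr_s <- counts[["CDR_S"]]
  fwr_r <- counts[["FWR_R"]]
  fwr_s <- counts[["FWR_S"]]
  switch(type,
    # CDR replacements among CDR mutations only
    local = cdr_r / (cdr_r + cdr_s),
    # CDR replacements among CDR mutations plus FWR silent mutations
    focused = cdr_r / (cdr_r + cdr_s + fwr_s),
    # All CDR mutations among all mutations (assumed reading)
    imbalanced = (cdr_r + cdr_s) / (cdr_r + cdr_s + fwr_s + fwr_r)
  )
}

stat_local <- test_statistic(counts, "local")
stat_focused <- test_statistic(counts, "focused")
stat_imbalanced <- test_statistic(counts, "imbalanced")
```

With these made-up counts, the local statistic uses only the 8 CDR mutations as its denominator, while focused adds the 7 silent FWR mutations and imbalanced uses all 20 mutations.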
Value
A Baseline object containing the modified db and BASELINe
posterior probability density functions (PDF) for each of the sequences.
References
Hershberg U, et al. Improved methods for detecting selection by mutation analysis of Ig V region sequences. Int Immunol. 2008 20(5):683-94.
Uduman M, et al. Detecting selection in immunoglobulin sequences. Nucleic Acids Res. 2011 39(Web Server issue):W499-504.
Yaari G, et al. Models of somatic hypermutation targeting and substitution based on synonymous mutations from high-throughput immunoglobulin sequencing data. Front Immunol. 2013 4(November):358.
See Also
See Baseline for the return object. See groupBaseline and summarizeBaseline for further processing. See plotBaselineSummary and plotBaselineDensity for plotting results.
Examples
# Load the package, then load and subset example data
library(shazam)
data(ExampleDb, package="alakazam")
db <- subset(ExampleDb, c_call == "IGHG" & sample_id == "+7d")
# Collapse clones
db <- collapseClones(db, cloneColumn="clone_id",
sequenceColumn="sequence_alignment",
germlineColumn="germline_alignment_d_mask",
method="thresholdedFreq", minimumFrequency=0.6,
includeAmbiguous=FALSE, breakTiesStochastic=FALSE)
# Calculate BASELINe
baseline <- calcBaseline(db,
sequenceColumn="clonal_sequence",
germlineColumn="clonal_germline",
testStatistic="focused",
regionDefinition=IMGT_V,
targetingModel=HH_S5F,
nproc=1)