R: Calculate BASELINe summary statistics

summarizeBaseline {shazam}

R Documentation

Calculate BASELINe summary statistics

Description

summarizeBaseline calculates BASELINe statistics such as the mean selection strength (mean Sigma), the 95% confidence intervals and p-values for the presence of selection.

Usage

summarizeBaseline(baseline, returnType = c("baseline", "df"), nproc = 1)

Arguments

`baseline`	`Baseline` object returned by calcBaseline containing annotations and BASELINe posterior probability density functions (PDFs) for each sequence.
`returnType`	One of `c("baseline", "df")` defining whether to return a `Baseline` object ("baseline") with an updated `stats` slot or a data.frame ("df") of summary statistics.
`nproc`	number of cores to distribute the operation over. If `nproc` = 0 then the `cluster` has already been set and will not be reset.

Details

The returned p-value can be either positive or negative. Its magnitude (without the sign) should be interpreted as per normal. Its sign indicates the direction of the selection detected. A positive p-value indicates positive selection, whereas a negative p-value indicates negative selection.

Value

Either a modified Baseline object or data.frame containing the mean BASELINe selection strength, its 95% confidence intervals, and a p-value for the presence of selection.

References

Uduman M, et al. Detecting selection in immunoglobulin sequences. Nucleic Acids Res. 2011 39(Web Server issue):W499-504.

Examples


# Subset example data
data(ExampleDb, package="alakazam")
db <- subset(ExampleDb, c_call == "IGHG")
set.seed(112)
db <- dplyr::slice_sample(db, n=100)

# Collapse clones
db <- collapseClones(db, cloneColumn="clone_id",
                     sequenceColumn="sequence_alignment",
                     germlineColumn="germline_alignment_d_mask",
                     method="thresholdedFreq", minimumFrequency=0.6,
                     includeAmbiguous=FALSE, breakTiesStochastic=FALSE)
                     
# Calculate BASELINe
baseline <- calcBaseline(db, 
                         sequenceColumn="clonal_sequence",
                         germlineColumn="clonal_germline", 
                         testStatistic="focused",
                         regionDefinition=IMGT_V,
                         targetingModel=HH_S5F,
                         nproc = 1)

# Grouping the PDFs by the sample annotation
grouped <- groupBaseline(baseline, groupBy="sample_id")

# Get a data.frame of the summary statistics
stats <- summarizeBaseline(grouped, returnType="df")

[Package shazam version 1.2.0 Index]