inferGenotypeBayesian {tigger} | R Documentation
Infer a subject-specific genotype using a Bayesian approach
Description
inferGenotypeBayesian infers a subject's genotype by applying a Bayesian framework
with a Dirichlet prior for the multinomial distribution. Up to four distinct alleles are
allowed in an individual’s genotype. Four likelihood distributions were generated by
empirically fitting three high-coverage genotypes from three individuals
(Laserson and Vigneault et al, 2014). A posterior probability is calculated for the
four most common alleles. The certainty of the highest-probability model is
calculated using a Bayes factor (the likelihood of the most likely model divided by that of the second-most likely model).
The larger the Bayes factor (K), the greater the certainty in the model.
Usage
inferGenotypeBayesian(
data,
germline_db = NA,
novel = NA,
v_call = "v_call",
seq = "sequence_alignment",
find_unmutated = TRUE,
priors = c(0.6, 0.4, 0.4, 0.35, 0.25, 0.25, 0.25, 0.25, 0.25)
)
Arguments
data |
a |
germline_db |
named vector of sequences containing the
germline sequences named in |
novel |
an optional |
v_call |
column in |
seq |
name of the column in |
find_unmutated |
if |
priors |
a numeric vector of priors for the multinomial distribution.
The |
Details
Allele calls representing cases where multiple alleles have been
assigned to a single sample sequence are rare among unmutated
sequences but may result if nucleotides for certain positions are
not available. Calls containing multiple alleles are treated as
belonging to all groups. If novel is provided, all
sequences that are assigned to the same starting allele as any
novel germline allele will have the novel germline allele appended
to their assignment prior to searching for unmutated sequences.
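The treatment of multi-allele calls described above can be illustrated with a minimal base R sketch. It is not the package's internal code; the allele calls below are hypothetical and serve only to show how an ambiguous call contributes to every allele it names.

```r
# Hypothetical V calls for one gene; the second call names two alleles.
v_calls <- c("IGHV1-2*02", "IGHV1-2*02,IGHV1-2*04", "IGHV1-2*04", "IGHV1-2*02")

# Split each call on commas so an ambiguous call counts toward every allele listed.
alleles_per_seq <- strsplit(v_calls, ",", fixed = TRUE)

# Tally how many sequences support each allele, most frequent first.
allele_counts <- sort(table(unlist(alleles_per_seq)), decreasing = TRUE)
print(allele_counts)
```

Under this counting, the four sequences yield five allele assignments, because the ambiguous call is tallied once for each allele it lists.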
Value
A data.frame of alleles denoting the genotype of the subject with the log10
of the likelihood of each model and the log10 of the Bayes factor. The output
contains the following columns:
- gene: The gene name without allele.
- alleles: Comma-separated list of alleles for the given gene.
- counts: Comma-separated list of observed sequences for each corresponding allele in the alleles list.
- total: The total count of observed sequences for the given gene.
- note: Any comments on the inference.
- kh: log10 likelihood that the gene is homozygous.
- kd: log10 likelihood that the gene is heterozygous.
- kt: log10 likelihood that the gene is trizygous.
- kq: log10 likelihood that the gene is quadrozygous.
- k_diff: log10 ratio of the highest to second-highest zygosity likelihoods.
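As a rough illustration of how the likelihood columns relate to k_diff, the sketch below ranks four log10 likelihoods and takes the difference of the top two, which on the log10 scale equals the log10 of their ratio. The numbers are made up and the variable names `lik`, `ranked`, and `best_model` are introduced here for illustration only; this is not the package's implementation.

```r
# Made-up log10 likelihoods for one gene (not real tigger output).
lik <- c(kh = -12.4, kd = -3.1, kt = -5.6, kq = -9.8)

# Rank the four zygosity models from most to least likely.
ranked <- sort(lik, decreasing = TRUE)

# A log10 ratio of two likelihoods is the difference of their log10 values.
k_diff <- unname(ranked[1] - ranked[2])

# The column name of the top-ranked model.
best_model <- names(ranked)[1]
cat(best_model, k_diff, "\n")
```

In this toy case the heterozygous model (kd) ranks first, and a larger k_diff would indicate a clearer separation from the runner-up model.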
Note
This method works best with data derived from blood, where a large portion of sequences are expected to be unmutated. Ideally, there should be hundreds of allele calls per gene in the input.
References
Laserson U and Vigneault F, et al. High-resolution antibody dynamics of vaccine-induced immune responses. PNAS. 2014;111(13):4928-33.
See Also
plotGenotype for a colorful visualization and genotypeFasta to convert the genotype to nucleotide sequences. See inferGenotype to infer a subject-specific genotype using a frequency method.
Examples
# Infer IGHV genotype, using only unmutated sequences, including novel alleles
inferGenotypeBayesian(AIRRDb, germline_db=SampleGermlineIGHV, novel=SampleNovel,
find_unmutated=TRUE, v_call="v_call", seq="sequence_alignment")
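Because the alleles and counts columns of the result are comma-separated strings, it can be handy to expand them into one row per allele. The following base R sketch does this for a single hand-built row; the row values (including the allele labels and counts) are hypothetical and only mimic the documented column layout, and `long` is a name chosen here for illustration.

```r
# Hypothetical genotype row with the documented columns (values are made up).
genotype <- data.frame(gene = "IGHV1-2", alleles = "02,04", counts = "120,95",
                       total = 215, stringsAsFactors = FALSE)

# Split the comma-separated fields into parallel vectors.
allele_ids <- strsplit(genotype$alleles, ",", fixed = TRUE)[[1]]
allele_n   <- as.integer(strsplit(genotype$counts, ",", fixed = TRUE)[[1]])

# Build one row per allele, with its share of the gene's total count.
long <- data.frame(gene = genotype$gene, allele = allele_ids, count = allele_n,
                   fraction = allele_n / genotype$total, stringsAsFactors = FALSE)
print(long)
```

For a result with several genes, the same steps could be applied row by row.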