findNovelAlleles {tigger}    R Documentation

Find novel alleles from repertoire sequencing data

Description

findNovelAlleles analyzes mutation patterns in sequences thought to align to each germline allele in order to determine which positions might be polymorphic.

Usage

findNovelAlleles(
  data,
  germline_db,
  v_call = "v_call",
  j_call = "j_call",
  seq = "sequence_alignment",
  junction = "junction",
  junction_length = "junction_length",
  germline_min = 200,
  min_seqs = 50,
  auto_mutrange = TRUE,
  mut_range = 1:10,
  pos_range = 1:312,
  pos_range_max = NULL,
  y_intercept = 0.125,
  alpha = 0.05,
  j_max = 0.15,
  min_frac = 0.75,
  nproc = 1
)

Arguments

data

data.frame containing repertoire data. See Details.

germline_db

named vector of nucleotide germline sequences whose names match the V calls in data. These should be the gapped reference germlines used to make the V calls.
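As an illustration of the expected shape, germline_db is a plain named character vector. The sketch below uses made-up, truncated sequences and a hypothetical toy data frame (real IMGT V germlines are roughly 300 nt long, with "." marking IMGT gaps). It checks that every allele named in the V calls, including tied calls separated by commas, has a matching name in germline_db. This is a sanity check written for this page, not something findNovelAlleles is documented to do.

```r
# Hypothetical, truncated gapped germlines for illustration only;
# real IMGT V germlines are ~300 nt and use "." for IMGT gaps.
germline_db <- c(
  "IGHV1-8*01" = "CAGGTGCAGCTGGTGCAG...TCTGGGGCTGAGGTG",
  "IGHV1-8*02" = "CAGGTGCAGCTGGTGCAG...TCTGGGGCTGAGGTA"
)

# Toy repertoire: a V call may list several tied alleles separated by commas.
toy_data <- data.frame(
  v_call = c("IGHV1-8*01", "IGHV1-8*02,IGHV1-8*01", "IGHV1-8*01"),
  stringsAsFactors = FALSE
)

# Split every call into its alleles and confirm each one is named in germline_db.
all_calls <- unique(unlist(strsplit(toy_data$v_call, ",")))
missing_alleles <- setdiff(all_calls, names(germline_db))
stopifnot(length(missing_alleles) == 0)
```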

v_call

name of the column in data with V allele calls. Default is v_call.

j_call

name of the column in data with J allele calls. Default is j_call.

seq

name of the column in data with the aligned, IMGT-numbered, V(D)J nucleotide sequence. Default is sequence_alignment.

junction

name of the column in data with the junction region nucleotide sequence, which includes the CDR3 and the two flanking conserved codons. Default is junction.

junction_length

name of the column in data with the number of nucleotides in the junction sequence. Default is junction_length.

germline_min

the minimum number of sequences that must have a particular germline allele call for the allele to be analyzed.

min_seqs

minimum number of total sequences (within the desired mutational range and nucleotide range) required for the samples to be considered.

auto_mutrange

if TRUE, the algorithm will attempt to determine the appropriate mutation range automatically using the mutation count of the most common sequence assigned to each allele analyzed.
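The auto_mutrange description hinges on one quantity: the mutation count of the most common sequence assigned to an allele. A minimal base-R sketch of that quantity follows. The helper name is hypothetical, comparing only A/C/G/T positions is an assumption, and how findNovelAlleles turns this count into a mutation range is not described on this page and is not reproduced here.

```r
# Hypothetical helper, not part of tigger: mutation count of the most
# common sequence among those assigned to one allele.
modal_mutation_count <- function(seqs, germline) {
  counts <- table(seqs)
  modal <- names(counts)[which.max(counts)]      # first sequence with the highest count
  s <- strsplit(modal, "")[[1]]
  g <- strsplit(germline, "")[[1]][seq_along(s)] # NA where the germline is shorter
  # Compare only positions where both carry a definite nucleotide.
  informative <- s %in% c("A", "C", "G", "T") & g %in% c("A", "C", "G", "T")
  sum(s[informative] != g[informative])
}

germ <- "ACGTACGT"
seqs <- c("ACGTACGA", "ACGTACGA", "ACGTACGT")
modal_mutation_count(seqs, germ)  # 1: the modal sequence differs at position 8
```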

mut_range

range of sequence-wide mutation counts that a sequence may carry and still be considered by the algorithm.

pos_range

range of IMGT-numbered positions that should be considered by the algorithm.

pos_range_max

name of the column in data with the ending positions of the V alignment in the germline (usually v_germline_end). The end of the alignment will be used to limit the range of positions considered when counting mutations. With NULL, all positions in the IMGT V region will be considered; in this case, for sequences whose V segment was trimmed at the 3' end, the mutated nucleotides counted could include nucleotides from the CDR3.
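To illustrate the effect of pos_range_max, the sketch below counts mismatches against the germline over pos_range, optionally stopping at a per-sequence alignment end such as a value taken from v_germline_end. It is a simplified base-R illustration, not tigger's code: the helper name is hypothetical and treating only A/C/G/T as comparable nucleotides is an assumption.

```r
# Hypothetical helper, not part of tigger: count mismatches against the
# germline over pos_range, optionally truncated at the V alignment end.
count_mutations <- function(seq, germline, pos_range, pos_max = NULL) {
  if (!is.null(pos_max)) pos_range <- pos_range[pos_range <= pos_max]
  # Drop positions past the end of either string.
  pos_range <- pos_range[pos_range <= min(nchar(seq), nchar(germline))]
  s <- strsplit(seq, "")[[1]][pos_range]
  g <- strsplit(germline, "")[[1]][pos_range]
  # Compare only positions where both carry a definite nucleotide.
  usable <- s %in% c("A", "C", "G", "T") & g %in% c("A", "C", "G", "T")
  sum(s[usable] != g[usable])
}

germ <- "ACGTACGTAC"
read <- "ACGAACGTTC"  # differs from germ at positions 4 and 9
count_mutations(read, germ, pos_range = 1:10)               # 2
count_mutations(read, germ, pos_range = 1:10, pos_max = 8)  # 1
```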

y_intercept

y-intercept threshold above which positions should be considered potentially polymorphic.

alpha

alpha value used to determine whether the fitted y-intercept is greater than the y_intercept threshold.
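The y_intercept and alpha arguments together suggest a one-sided test on the intercept of a line fitted, for each position, to that position's mutation frequency against the sequence-wide mutation count. A minimal sketch under that assumption follows, using an ordinary least-squares fit with lm and a one-sided t test on the intercept. The helper name, the exact test, and the toy frequencies are illustrative and not taken from tigger.

```r
# Hypothetical helper, not tigger's code: one-sided test of whether the
# intercept of pos_freq ~ mut_count exceeds y_intercept.
intercept_exceeds <- function(mut_count, pos_freq, y_intercept = 0.125, alpha = 0.05) {
  fit <- lm(pos_freq ~ mut_count)
  est <- coef(summary(fit))["(Intercept)", ]
  # t statistic for H0: intercept <= y_intercept vs H1: intercept > y_intercept
  t_stat <- (est[["Estimate"]] - y_intercept) / est[["Std. Error"]]
  p_value <- pt(t_stat, df = fit$df.residual, lower.tail = FALSE)
  p_value < alpha
}

mut_count <- 1:10
# A position mutated in nearly every sequence, whatever the sequence-wide count
poly_freq <- c(0.98, 0.97, 0.99, 0.96, 0.98, 0.97, 0.99, 0.98, 0.96, 0.97)
# A position whose mutation frequency rises from ~0 with the sequence-wide count
smh_freq <- c(0.011, 0.019, 0.031, 0.040, 0.052, 0.058, 0.071, 0.079, 0.092, 0.100)
intercept_exceeds(mut_count, poly_freq)  # TRUE
intercept_exceeds(mut_count, smh_freq)   # FALSE
```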

j_max

maximum fraction of sequences perfectly aligning to a potential novel allele that are allowed to use a single combination of junction length and J gene. The closer to 1, the less strict the filter on junction length and J gene diversity will be.
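Read literally, the j_max filter asks whether any single pairing of J gene and junction length accounts for more than j_max of the sequences that match the candidate allele exactly. The base-R sketch below implements that reading. The helper name, the use of <= at the boundary, and reducing J calls to the gene by dropping the allele suffix are assumptions.

```r
# Hypothetical helper, not tigger's code: does any single combination of
# J gene and junction length exceed the j_max fraction?
passes_j_max <- function(j_call, junction_length, j_max = 0.15) {
  j_gene <- sub("\\*.*", "", j_call)  # assumption: strip the allele, e.g. IGHJ4*02 -> IGHJ4
  combo <- paste(j_gene, junction_length, sep = "_")
  max(table(combo)) / length(combo) <= j_max
}

# Ten sequences spread over ten distinct combinations: largest share 0.1
diverse <- passes_j_max(paste0("IGHJ", rep(1:5, 2), "*01"), rep(c(42, 45), each = 5))
# Ten sequences, five sharing IGHJ4 with junction length 48: largest share 0.5
clonal <- passes_j_max(rep("IGHJ4*02", 10), c(rep(48, 5), 42:46))
c(diverse, clonal)  # TRUE FALSE
```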

min_frac

minimum fraction of sequences that must have usable nucleotides at a given position for that position to be considered.
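A minimal sketch of the min_frac check, assuming that "usable" means a definite A, C, G, or T (so IMGT gap dots and N are not usable) and that the sequences are already aligned to equal length. The helper name and toy sequences are hypothetical; this is not tigger's implementation.

```r
# Hypothetical helper, not tigger's code: fraction of sequences with a
# definite nucleotide (A/C/G/T) at each position in pos_range.
usable_fraction <- function(seqs, pos_range) {
  stopifnot(length(unique(nchar(seqs))) == 1)  # aligned, equal-length sequences
  mat <- do.call(rbind, strsplit(seqs, ""))   # rows = sequences, cols = positions
  apply(mat[, pos_range, drop = FALSE], 2,
        function(col) mean(col %in% c("A", "C", "G", "T")))
}

seqs <- c("ACGT", "AC.T", "ACNT", "A..T")
pos_range <- 1:4
frac <- usable_fraction(seqs, pos_range)  # 1.00 0.75 0.25 1.00
pos_kept <- pos_range[frac >= 0.75]       # positions 1, 2 and 4 pass min_frac = 0.75
```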

nproc

number of processors to use.

Details

The TIgGER allele-finding algorithm works, briefly, as follows. Mutations are identified by comparison to the provided germline. The mutation frequency at each *position* is determined as a function of *sequence-wide* mutation counts. Polymorphic positions exhibit a high mutation frequency regardless of the sequence-wide mutation count. False positives arising from clonally related sequences are guarded against by requiring that sequences perfectly matching the potential novel allele use a wide range of combinations of J gene and junction length.
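The second step of this summary, mutation frequency at each position as a function of the sequence-wide mutation count, can be sketched in base R as a grouped mean over a mismatch matrix. The input format (a logical sequences-by-positions matrix), the helper name, and the toy matrix are assumptions for illustration. In the toy data, position 1 is mutated in every sequence at every mutation count, which is the pattern this paragraph associates with a polymorphic position.

```r
# Hypothetical helper, not tigger's code.
# mut_matrix: logical, rows = sequences, cols = positions, TRUE = differs from germline.
position_freq_by_count <- function(mut_matrix, mut_range = 1:10) {
  mut_count <- rowSums(mut_matrix)               # sequence-wide mutation count
  keep <- mut_count %in% mut_range
  m <- mut_matrix[keep, , drop = FALSE] + 0      # logical -> numeric
  n <- mut_count[keep]
  # Mean of each position's mismatch indicator within each mutation-count group:
  # rows = mutation counts, cols = positions.
  rowsum(m, n) / as.vector(table(n))
}

mut_matrix <- matrix(c(TRUE, FALSE, FALSE,
                       TRUE, TRUE,  FALSE,
                       TRUE, FALSE, FALSE,
                       TRUE, FALSE, TRUE), nrow = 4, byrow = TRUE)
freq <- position_freq_by_count(mut_matrix)
# freq["1", ] is c(1, 0, 0) and freq["2", ] is c(1, 0.5, 0.5)
```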

Value

A data.frame with a row for each known allele analyzed. Besides metadata on the parameters used in the search, each row will have either a note indicating where the polymorphism-finding algorithm exited or a nucleotide sequence for the predicted novel allele, along with columns providing additional evidence.

The output contains the following columns:

The following comments can appear in the note column:

See Also

selectNovel to filter the results to show only novel alleles. plotNovel to visualize the data supporting any novel alleles hypothesized to be present in the data. inferGenotype and inferGenotypeBayesian to determine whether the novel alleles are frequent enough to be included in the subject's genotype.

Examples


# Note: In this example, with SampleGermlineIGHV,
# which contains reference germlines retrieved in August 2014,
# TIgGER finds the allele IGHV1-8*02_G234T. This allele
# was added to IMGT as IGHV1-8*03 on March 28, 2018.

# Find novel alleles and return relevant data
novel <- findNovelAlleles(AIRRDb, SampleGermlineIGHV)
selectNovel(novel)



[Package tigger version 1.1.0 Index]