read_input_files {ogrdbstats}    R Documentation

Read input files into memory

Description

Read input files into memory

Usage

read_input_files(
  ref_filename,
  inferred_filename,
  species,
  filename,
  chain,
  hap_gene,
  segment,
  chain_type,
  all_inferred
)

Arguments

ref_filename

Name of file containing IMGT-aligned reference genes in FASTA format

inferred_filename

Name of file containing sequences of inferred novel alleles, or '-' if none

species

Species name used in field 3 of the IMGT germline header with spaces omitted, if the reference file is from IMGT. Otherwise '' (an empty string)

filename

Name of file containing annotated reads in AIRR, CHANGEO or IgDiscover format. The format is detected automatically

chain

One of IGHV, IGKV, IGLV, IGHD, IGHJ, IGKJ, IGLJ, TRAV, TRAJ, TRBV, TRBD, TRBJ, TRGV, TRGJ, TRDV, TRDD, TRDJ

hap_gene

The haplotyping columns will be completed based on the usage of the two most frequent alleles of this gene. If NA, the columns will be left blank

segment

one of V, D, J

chain_type

one of H, L

all_inferred

Treat all alleles as novel
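
The permitted values listed above can be checked before a call is made. The following is a minimal sketch in base R, not part of ogrdbstats: check_args and valid_chains are hypothetical names, and the rule that the last letter of chain should equal segment is an assumption based on the naming of the chains (for example 'IGHV' with 'V'), not something this page states.

```r
# Chain values as listed under the 'chain' argument
valid_chains <- c("IGHV", "IGKV", "IGLV", "IGHD", "IGHJ", "IGKJ", "IGLJ",
                  "TRAV", "TRAJ", "TRBV", "TRBD", "TRBJ",
                  "TRGV", "TRGJ", "TRDV", "TRDD", "TRDJ")

# Hypothetical helper: returns a character vector of problems
# (empty if the three arguments look consistent)
check_args <- function(chain, segment, chain_type) {
  problems <- character(0)
  if (!chain %in% valid_chains) {
    problems <- c(problems, paste0("unknown chain: ", chain))
  }
  if (!segment %in% c("V", "D", "J")) {
    problems <- c(problems, paste0("unknown segment: ", segment))
  }
  if (!chain_type %in% c("H", "L")) {
    problems <- c(problems, paste0("unknown chain_type: ", chain_type))
  }
  # Assumption: the fourth letter of a valid chain names its segment
  if (chain %in% valid_chains && segment %in% c("V", "D", "J") &&
      substr(chain, 4, 4) != segment) {
    problems <- c(problems, "segment does not match the last letter of chain")
  }
  problems
}

check_args("IGHV", "V", "H")   # no problems reported
check_args("IGHV", "J", "H")   # reports the chain/segment mismatch
```

An empty result means the arguments passed these simple checks only; it says nothing about whether the input files themselves are valid.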

Value

A named list containing the following elements:

ref_genes named list of IMGT-gapped reference genes
inferred_seqs named list of IMGT-gapped inferred (novel) sequences.
input_sequences data frame with one row per annotated read, with CHANGEO-style column names. One key point: the column SEG_CALL is the gene call for the segment under analysis. Hence if segment is 'V', 'V_CALL' is renamed 'SEG_CALL', whereas if segment is 'J', 'J_CALL' is renamed 'SEG_CALL'. This simplifies downstream processing. Rows in the input file with ambiguous SEG_CALLs, or no call, are removed.
genotype_db named list of gene sequences referenced in the annotated reads (both reference and novel sequences)
haplo_details data used for haplotype analysis, showing allelic ratios calculated with various potential haplotyping genes
genotype data frame containing information provided in the OGRDB genotype csv file
calculated_NC a boolean that is TRUE if mutation counts were calculated by this library, FALSE if they were read from the annotated read file
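
The SEG_CALL renaming described for input_sequences can be illustrated with a small base R sketch. This is not the package's implementation: to_seg_call is a hypothetical helper, the reads are invented, and treating a comma-separated call as "ambiguous" is an assumption made for the illustration.

```r
# Toy annotated reads with CHANGEO-style column names (invented data)
reads <- data.frame(
  SEQUENCE_ID = c("r1", "r2", "r3", "r4"),
  V_CALL = c("IGHV1-2*02", "IGHV1-2*02,IGHV1-2*04", "", "IGHV3-23*01"),
  J_CALL = c("IGHJ4*02", "IGHJ4*02", "IGHJ6*02", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: rename the call column of the chosen segment to
# SEG_CALL, then drop rows with a missing or ambiguous call.
# Assumption: an ambiguous call lists several alleles separated by commas.
to_seg_call <- function(reads, segment) {
  call_col <- paste0(segment, "_CALL")
  stopifnot(call_col %in% names(reads))
  names(reads)[names(reads) == call_col] <- "SEG_CALL"
  keep <- !is.na(reads$SEG_CALL) &
    nzchar(reads$SEG_CALL) &
    !grepl(",", reads$SEG_CALL, fixed = TRUE)
  reads[keep, , drop = FALSE]
}

v_reads <- to_seg_call(reads, "V")  # keeps r1 and r4
j_reads <- to_seg_call(reads, "J")  # keeps r1, r2 and r3
```

With the segment under analysis always held in SEG_CALL, downstream code can use one column name regardless of whether V, D or J genes are being analysed.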

Examples

# Create the analysis data set from example files provided with the package
# (this dataset is also provided in the package as example_rep)
reference_set = system.file("extdata/ref_gapped.fasta", package = "ogrdbstats")
inferred_set = system.file("extdata/novel_gapped.fasta", package = "ogrdbstats")
repertoire = system.file("extdata/ogrdbstats_example_repertoire.tsv", package = "ogrdbstats")

example_data = read_input_files(reference_set, inferred_set, 'Homosapiens',
       repertoire, 'IGHV', NA, 'V', 'H', FALSE)
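
As a possible follow-up, the returned list can be inspected against the elements documented under Value. This is a sketch that assumes ogrdbstats is installed and that read_input_files is exported; expected_fields is a name introduced here for illustration, and the code is skipped if the package is unavailable.

```r
# Element names as documented in the Value section
expected_fields <- c("ref_genes", "inferred_seqs", "input_sequences",
                     "genotype_db", "haplo_details", "genotype",
                     "calculated_NC")

if (requireNamespace("ogrdbstats", quietly = TRUE)) {
  reference_set <- system.file("extdata/ref_gapped.fasta", package = "ogrdbstats")
  inferred_set <- system.file("extdata/novel_gapped.fasta", package = "ogrdbstats")
  repertoire <- system.file("extdata/ogrdbstats_example_repertoire.tsv",
                            package = "ogrdbstats")

  example_data <- ogrdbstats::read_input_files(reference_set, inferred_set,
                                               "Homosapiens", repertoire,
                                               "IGHV", NA, "V", "H", FALSE)

  # Documented elements that are absent from the result, if any
  print(setdiff(expected_fields, names(example_data)))

  # Gene calls for the segment under analysis ('V' in this call)
  print(head(example_data$input_sequences$SEG_CALL))

  # TRUE if mutation counts were calculated by the library
  print(example_data$calculated_NC)
}
```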

[Package ogrdbstats version 0.5.0 Index]