R: Read input files into memory

read_input_files {ogrdbstats}

R Documentation

Read input files into memory

Description

Read input files into memory

Usage

read_input_files(
  ref_filename,
  inferred_filename,
  species,
  filename,
  chain,
  hap_gene,
  segment,
  chain_type,
  all_inferred
)

Arguments

`ref_filename`	Name of file containing IMGT-aligned reference genes in FASTA format
`inferred_filename`	Name of file containing sequences of inferred novel alleles, or '-' if none
`species`	Species name as used in field 3 of the IMGT germline header, with spaces omitted, if the reference file is from IMGT. Otherwise ''
`filename`	Name of file containing annotated reads in AIRR, CHANGEO or IgDiscover format. The format is detected automatically
`chain`	one of IGHV, IGKV, IGLV, IGHD, IGHJ, IGKJ, IGLJ, TRAV, TRAJ, TRBV, TRBD, TRBJ, TRGV, TRGJ, TRDV, TRDD, TRDJ
`hap_gene`	The haplotyping columns will be completed based on the usage of the two most frequent alleles of this gene. If NA, the column will be blank
`segment`	one of V, D, J
`chain_type`	one of H, L
`all_inferred`	Treat all alleles as novel
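
The `chain`, `segment` and `chain_type` arguments describe the same locus from three angles, so it can be worth checking them for consistency before the call. The following is a minimal sketch in base R. It assumes that the segment is always the last letter of the chain name (as in the example call, where 'IGHV' is paired with 'V'). The helper `check_chain_args` is hypothetical and is not part of ogrdbstats; the package may or may not perform similar checks itself.

```r
# Hypothetical helper, NOT part of ogrdbstats: sanity-check the arguments
# before passing them to read_input_files().
valid_chains <- c("IGHV", "IGKV", "IGLV", "IGHD", "IGHJ", "IGKJ", "IGLJ",
                  "TRAV", "TRAJ", "TRBV", "TRBD", "TRBJ",
                  "TRGV", "TRGJ", "TRDV", "TRDD", "TRDJ")

check_chain_args <- function(chain, segment, chain_type) {
  if (!(chain %in% valid_chains)) {
    stop("unrecognised chain: ", chain)
  }
  if (!(segment %in% c("V", "D", "J"))) {
    stop("segment must be one of V, D, J")
  }
  if (!(chain_type %in% c("H", "L"))) {
    stop("chain_type must be one of H, L")
  }
  # Assumption: the last letter of the chain name identifies the segment.
  expected_segment <- substr(chain, nchar(chain), nchar(chain))
  if (segment != expected_segment) {
    stop("segment '", segment, "' does not match chain '", chain, "'")
  }
  invisible(TRUE)
}

# Matches the argument combination used in the Examples section.
check_chain_args("IGHV", "V", "H")
```

Chain names are matched case-sensitively here, so a value such as 'TRAj' is rejected by this sketch.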

Value

A named list containing the following elements:

ref_genes	named list of IMGT-gapped reference genes
inferred_seqs	named list of IMGT-gapped inferred (novel) sequences
input_sequences	data frame with one row per annotated read, with CHANGEO-style column names. One key point: the column SEG_CALL is the gene call for the segment under analysis. Hence if segment is 'V', 'V_CALL' is renamed 'SEG_CALL', whereas if segment is 'J', 'J_CALL' is renamed 'SEG_CALL'. This simplifies downstream processing. Rows in the input file with ambiguous SEG_CALLs, or no call, are removed.
genotype_db	named list of gene sequences referenced in the annotated reads (both reference and novel sequences)
haplo_details	data used for haplotype analysis, showing allelic ratios calculated with various potential haplotyping genes
genotype	data frame containing information provided in the OGRDB genotype csv file
calculated_NC	a boolean that is TRUE if mutation counts were calculated by this library, FALSE if they were read from the annotated read file
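
The SEG_CALL convention described for `input_sequences` can be illustrated on a toy data frame. This is only a sketch of the idea in base R, not the package's implementation. The read values are made up, and it assumes that an ambiguous call is a comma-separated list of alleles and that a missing call is an empty string or NA.

```r
# Toy annotated reads with CHANGEO-style column names (made-up values).
reads <- data.frame(
  SEQUENCE_ID = c("r1", "r2", "r3", "r4"),
  V_CALL = c("IGHV1-2*02", "IGHV1-2*02,IGHV1-2*04", "", "IGHV3-23*01"),
  J_CALL = c("IGHJ4*02", "IGHJ4*02", "IGHJ6*02", "IGHJ4*02"),
  stringsAsFactors = FALSE
)

segment <- "V"
call_col <- paste0(segment, "_CALL")

# Rename the call column for the segment under analysis to SEG_CALL.
names(reads)[names(reads) == call_col] <- "SEG_CALL"

# Assumption: ambiguous calls contain a comma; missing calls are "" or NA.
keep <- !is.na(reads$SEG_CALL) &
  reads$SEG_CALL != "" &
  !grepl(",", reads$SEG_CALL, fixed = TRUE)
reads <- reads[keep, , drop = FALSE]

# Only reads with a single, unambiguous call remain.
print(reads)
```

With segment set to 'J', the same code would rename 'J_CALL' instead, so downstream steps can refer to SEG_CALL regardless of the segment analysed.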

Examples

# Create the analysis data set from example files provided with the package
# (this dataset is also provided in the package as example_rep)
reference_set = system.file("extdata/ref_gapped.fasta", package = "ogrdbstats")
inferred_set = system.file("extdata/novel_gapped.fasta", package = "ogrdbstats")
repertoire = system.file("extdata/ogrdbstats_example_repertoire.tsv", package = "ogrdbstats")

example_data = read_input_files(reference_set, inferred_set, 'Homosapiens',
       repertoire, 'IGHV', NA, 'V', 'H', FALSE)

[Package ogrdbstats version 0.5.0 Index]