read_input_files {ogrdbstats} | R Documentation |
Read input files into memory
Description
Read input files into memory
Usage
read_input_files(
  ref_filename,
  inferred_filename,
  species,
  filename,
  chain,
  hap_gene,
  segment,
  chain_type,
  all_inferred
)
Arguments
ref_filename |
Name of file containing IMGT-aligned reference genes in FASTA format |
inferred_filename |
Name of file containing sequences of inferred novel alleles, or '-' if none |
species |
Species name used in field 3 of the IMGT germline header with spaces omitted, if the reference file is from IMGT. Otherwise '' (an empty string) |
filename |
Name of file containing annotated reads in AIRR, CHANGEO or IgDiscover format. The format is detected automatically |
chain |
one of IGHV, IGKV, IGLV, IGHD, IGHJ, IGKJ, IGLJ, TRAV, TRAJ, TRBV, TRBD, TRBJ, TRGV, TRGJ, TRDV, TRDD, TRDJ |
hap_gene |
The haplotyping columns will be completed based on the usage of the two most frequent alleles of this gene. If NA, these columns will be left blank |
segment |
one of V, D, J |
chain_type |
one of H, L |
all_inferred |
Treat all alleles as novel |
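The chain and segment arguments have to agree with each other. The following is a minimal sketch of a pre-flight check, assuming (from the example in this page, where chain 'IGHV' is paired with segment 'V') that the segment is the last letter of the chain. The names valid_chains and check_chain_segment are hypothetical and are not part of ogrdbstats; the chain list is the one documented above, with the J chains written in upper case.

```r
# Chains accepted by read_input_files, as listed in the Arguments section
valid_chains <- c("IGHV", "IGKV", "IGLV", "IGHD", "IGHJ", "IGKJ", "IGLJ",
                  "TRAV", "TRAJ", "TRBV", "TRBD", "TRBJ",
                  "TRGV", "TRGJ", "TRDV", "TRDD", "TRDJ")

# Hypothetical helper (not provided by ogrdbstats): stops with a message
# if chain is not recognised or does not match segment
check_chain_segment <- function(chain, segment) {
  if (!(chain %in% valid_chains)) {
    stop("Unrecognised chain: ", chain)
  }
  if (!(segment %in% c("V", "D", "J"))) {
    stop("segment must be one of V, D, J")
  }
  # Assumption: the fourth character of the chain names the segment
  expected <- substr(chain, 4, 4)
  if (segment != expected) {
    stop("chain ", chain, " implies segment ", expected, ", not ", segment)
  }
  invisible(TRUE)
}

ok <- check_chain_segment("IGHV", "V")

# A mismatched pair is reported as an error; here the message is captured
mismatch <- tryCatch(check_chain_segment("IGHV", "J"),
                     error = function(e) conditionMessage(e))
```

Running such a check before read_input_files can catch a mistyped argument early, before the input files are parsed.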
Value
A named list containing the following elements:
ref_genes | named list of IMGT-gapped reference genes |
inferred_seqs | named list of IMGT-gapped inferred (novel) sequences. |
input_sequences | data frame with one row per annotated read, with CHANGEO-style column names. One key point: the column SEG_CALL is the gene call for the segment under analysis. Hence if segment is 'V', 'V_CALL' will be renamed 'SEG_CALL', whereas if segment is 'J', 'J_CALL' is renamed 'SEG_CALL'. This simplifies downstream processing. Rows in the input file with ambiguous SEG_CALLs, or no call, are removed. |
genotype_db | named list of gene sequences referenced in the annotated reads (both reference and novel sequences) |
haplo_details | data used for haplotype analysis, showing allelic ratios calculated with various potential haplotyping genes |
genotype | data frame containing information provided in the OGRDB genotype csv file |
calculated_NC | a boolean that is TRUE if mutation counts were calculated by this library, FALSE if they were read from the annotated read file |
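The SEG_CALL convention described for input_sequences can be illustrated in base R. This is a sketch only, not the package's implementation: rename_seg_call is a hypothetical helper, the toy data frame is invented for illustration, and it assumes that an ambiguous call is one listing several alleles separated by commas.

```r
# Hypothetical helper (not part of ogrdbstats) illustrating the SEG_CALL convention
rename_seg_call <- function(reads, segment) {
  call_col <- paste0(segment, "_CALL")   # e.g. "V_CALL" when segment is "V"
  stopifnot(call_col %in% names(reads))
  names(reads)[names(reads) == call_col] <- "SEG_CALL"

  # Assumption: an ambiguous call contains a comma-separated list of alleles
  ambiguous <- grepl(",", reads$SEG_CALL, fixed = TRUE)
  no_call <- is.na(reads$SEG_CALL) | reads$SEG_CALL == ""

  # Keep only rows with a single, unambiguous call
  reads[!(ambiguous | no_call), , drop = FALSE]
}

# Toy annotated reads with CHANGEO-style column names (invented data)
reads <- data.frame(
  SEQUENCE_ID = c("r1", "r2", "r3", "r4"),
  V_CALL = c("IGHV1-2*02", "IGHV1-2*02,IGHV1-2*04", "", "IGHV3-23*01"),
  J_CALL = c("IGHJ4*02", "IGHJ4*02", "IGHJ6*02", "IGHJ5*02"),
  stringsAsFactors = FALSE
)

v_reads <- rename_seg_call(reads, "V")
```

With segment set to "V", the result keeps reads r1 and r4, and downstream code can refer to SEG_CALL without needing to know which segment is being analysed.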
Examples
# Create the analysis data set from example files provided with the package
# (this dataset is also provided in the package as example_rep)
reference_set = system.file("extdata/ref_gapped.fasta", package = "ogrdbstats")
inferred_set = system.file("extdata/novel_gapped.fasta", package = "ogrdbstats")
repertoire = system.file("extdata/ogrdbstats_example_repertoire.tsv", package = "ogrdbstats")
example_data = read_input_files(reference_set, inferred_set, 'Homosapiens',
repertoire, 'IGHV', NA, 'V', 'H', FALSE)
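As a possible follow-up, the returned list can be checked against the elements documented in the Value section. The sketch below repeats the call from the example above so that it stands alone, and only runs it when ogrdbstats is installed; expected_elements is a hypothetical name introduced here.

```r
# Element names taken from the Value section of this page
expected_elements <- c("ref_genes", "inferred_seqs", "input_sequences",
                       "genotype_db", "haplo_details", "genotype",
                       "calculated_NC")

if (requireNamespace("ogrdbstats", quietly = TRUE)) {
  reference_set <- system.file("extdata/ref_gapped.fasta", package = "ogrdbstats")
  inferred_set <- system.file("extdata/novel_gapped.fasta", package = "ogrdbstats")
  repertoire <- system.file("extdata/ogrdbstats_example_repertoire.tsv",
                            package = "ogrdbstats")

  example_data <- ogrdbstats::read_input_files(reference_set, inferred_set,
                                               "Homosapiens", repertoire,
                                               "IGHV", NA, "V", "H", FALSE)

  # Documented elements that are absent from the result, if any
  print(setdiff(expected_elements, names(example_data)))

  # Gene calls for the segment under analysis (V in this example)
  print(head(example_data$input_sequences$SEG_CALL))

  # TRUE if mutation counts were calculated by the library
  print(example_data$calculated_NC)
}
```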