create_ref {MixviR} | R Documentation |
Create MixviR-formatted reference genome object
Description
Uses a FASTA genome and a BED file defining features of interest (genes/ORFs) to create a data frame that serves as the reference for translating nucleotide data to amino acids and subsequently calling variants/mutations in a sample.
Usage
create_ref(genome, feature.bed, code.num = "1", removed.genes = NULL)
Arguments
genome |
(Required) Path to fasta formatted genome file |
feature.bed |
(Required) Path to bed file defining features of interest (open reading frames to translate). Tab delimited with 6 columns (without column names): "chr", "start", "end", "feature_name", "score" (not used), and "strand". |
code.num |
Number, given as a character string, identifying the genetic code to use for translation (default "1", the standard code). Details can be found at https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi. |
removed.genes |
Character string giving the path/name of a tab-separated file to write, listing the names of any genes in the feature.bed file that were removed because their size is not a multiple of 3. If NULL (default), no file is written. |
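The six-column layout expected for feature.bed can be illustrated with a minimal sketch in base R. The chromosome name, feature names, and coordinates below are hypothetical placeholders, not real annotations, and the score value 0 is an assumption (the column is documented as unused). The sketch only writes and re-reads the file; it does not call create_ref.

```r
# Hypothetical feature table: all names and coordinates are made-up placeholders.
features <- data.frame(
  chr          = c("chrA", "chrA"),
  start        = c(100L, 400L),
  end          = c(250L, 700L),
  feature_name = c("geneA", "geneB"),
  score        = c(0L, 0L),        # assumed filler value; column is not used
  strand       = c("+", "-"),
  stringsAsFactors = FALSE
)

# Write tab-delimited, without column names, as the argument description requires.
bed_path <- tempfile(fileext = ".bed")
write.table(features, file = bed_path, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# Read the file back to confirm it has six tab-separated columns.
check <- read.table(bed_path, sep = "\t", header = FALSE,
                    stringsAsFactors = FALSE)
stopifnot(ncol(check) == 6)
```

The resulting bed_path could then be passed as the feature.bed argument, alongside a FASTA file whose sequence name matches the first column.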
Value
A data frame with columns CHR, POS, REF_BASE, GENE, STRAND, REF_CODON, REF_AA, GENE_AA_POS, REF_IDENT, GENE_BASE_NUM, CODON_POSITION
Examples
site1 <- "https://raw.githubusercontent.com/mikesovic/MixviR/main/raw_files/GCF_ASM985889v3.fa"
site2 <- "https://raw.githubusercontent.com/mikesovic/MixviR/main/raw_files/sars_cov2_genes.bed"
# Build the reference only if both remote files are reachable.
if (httr::http_error(site1) || httr::http_error(site2)) {
  message("No internet connection or data source broken.")
} else {
  create_ref(
    genome = site1,
    feature.bed = site2,
    code.num = "1")
}