format_reference_database {LocaTT}R Documentation

Format Reference Databases

Description

Formats reference databases from MIDORI or UNITE for use with the local_taxa_tool function.

Usage

format_reference_database(
  path_to_input_reference_database,
  path_to_output_BLAST_database,
  input_reference_database_source = "MIDORI",
  path_to_taxonomy_edits = NA,
  path_to_sequence_edits = NA,
  path_to_list_of_local_taxa_to_subset = NA,
  makeblastdb_command = "makeblastdb"
)

Arguments

path_to_input_reference_database

String specifying path to input reference database in FASTA format.

path_to_output_BLAST_database

String specifying path to output BLAST database in FASTA format. File path cannot contain spaces.

input_reference_database_source

String specifying input reference database source ('MIDORI' or 'UNITE'). The default is 'MIDORI'.

path_to_taxonomy_edits

String specifying path to taxonomy edits file in CSV format. The file must contain the following fields: 'Old_Taxonomy', 'New_Taxonomy', 'Notes'. Old taxonomies are replaced with new taxonomies in the order the records appear in the file. The taxonomic levels in the 'Old_Taxonomy' and 'New_Taxonomy' fields should be delimited by a semi-colon. If no taxonomy edits are desired, then set this variable to NA (the default).

path_to_sequence_edits

String specifying path to sequence edits file in CSV format. The file must contain the following fields: 'Action', 'Common_Name', 'Domain', 'Phylum', 'Class', 'Order', 'Family', 'Genus', 'Species', 'Sequence', 'Notes'. The values in the 'Action' field must be either 'Add' or 'Remove', which will add or remove the respective sequence from the reference database. Values in the 'Common_Name' field are optional. Values should be supplied to all taxonomy fields. If using a reference database from MIDORI, then use NCBI superkingdom names (e.g., 'Eukaryota') in the 'Domain' field. If using a reference database from UNITE, then use kingdom names (e.g., 'Fungi') in the 'Domain' field. The 'Species' field should contain species binomials. Sequence edits are performed after taxonomy edits, if applied. If no sequence edits are desired, then set this variable to NA (the default).

path_to_list_of_local_taxa_to_subset

String specifying path to list of species (in CSV format) to subset the reference database to. This option is helpful if the user wants the reference database to include only the sequences of local species. The file should contain the following fields: 'Common_Name', 'Domain', 'Phylum', 'Class', 'Order', 'Family', 'Genus', 'Species'. There should be no 'NA's or blanks in the taxonomy fields. The species field should contain the binomial name without subspecies or other information below the species level. There should be no duplicate species (i.e., multiple records with the same species binomial and taxonomy) in the species list. Subsetting the reference database to the sequences of certain species is performed after taxonomy and sequence edits are applied to the reference database, and species must match at all taxonomic levels in order to be retained in the reference database. If subsetting the reference database to the sequences of certain species is not desired, set this variable to NA (the default).

makeblastdb_command

String specifying path to the makeblastdb program, which is a part of BLAST. The default ('makeblastdb') should work for standard BLAST installations. The user can provide a path to the makeblastdb program for non-standard BLAST installations.

Value

No return value. Writes formatted BLAST database files.

Examples


# Get path to example reference sequences FASTA file.
path_to_input_file<-system.file("extdata",
                                "example_reference_sequences.fasta",
                                 package="LocaTT",
                                 mustWork=TRUE)

# Create a temporary file path for the output reference database FASTA file.
path_to_output_file<-tempfile(fileext=".fasta")

# Format reference database.
format_reference_database(path_to_input_reference_database=path_to_input_file,
                          path_to_output_BLAST_database=path_to_output_file)


[Package LocaTT version 1.1.1 Index]