seq_BLAST {DBTC}    R Documentation

BLAST Query File Against Local Database

Description

This function takes fasta files as input, along with a user-selected NCBI-formatted database to BLAST the sequences against. The function produces two files: a BLAST run file and a single file containing all of the BLAST results in tab-delimited format. Note: the results file has no headers, but the columns are, in order: query sequence ID, search sequence ID, search taxonomic ID, query to sequence coverage, percent identity, search scientific name, search common name, query start, query end, search start, search end, and e-value.
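Because the results file has no header row, it can help to attach column names when loading it. The following is a minimal sketch, not part of DBTC: the function name read_blast_results() and the column names are hypothetical labels chosen to match the column order described above.

```r
# Hypothetical column labels, in the order given in the Description.
blast_cols <- c(
  "query_id", "search_id", "search_taxid", "query_coverage",
  "percent_identity", "search_sci_name", "search_common_name",
  "query_start", "query_end", "search_start", "search_end", "evalue"
)

# Read a headerless, tab-delimited BLAST results file into a data frame.
# quote = "" keeps apostrophes in taxon names from being read as quotes.
read_blast_results <- function(path) {
  read.delim(
    path,
    header = FALSE,
    sep = "\t",
    col.names = blast_cols,
    quote = "",
    stringsAsFactors = FALSE
  )
}
```

Calling read_blast_results() on a results file returns a data frame with one row per reported hit and twelve named columns.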

Usage

seq_BLAST(
  databasePath = NULL,
  querySeqPath = NULL,
  blastnPath = "blastn",
  minLen = 100,
  BLASTResults = 200,
  numCores = 1,
  verbose = TRUE
)

Arguments

databasePath

The path to a file in the directory where the desired BLAST database is located (Default NULL).

querySeqPath

The path to a file in the directory containing all of the fasta files to be BLASTed (Default NULL).

blastnPath

The location of the NCBI blast+ blastn program (Default 'blastn').

minLen

The minimum length of the sequences that will be BLASTed (Default 100).

BLASTResults

The number of returned results, or the depth of the reported results, saved from the BLAST (Default 200).

numCores

The number of cores used to run the function (Default 1; Windows systems can only use a single core).

verbose

If set to TRUE, progress output is written to the R console; if FALSE, this reporting is suppressed (Default TRUE).

Details

Provide the location of the BLAST database you would like to use by selecting a file in the target directory. Then provide the location of the query sequence file(s) by indicating a file in the directory that contains the fasta file(s) of interest, followed by the path to the blast+ blastn program. Finally, provide the minimum query sequence length to BLAST (Default 100), the depth of the returned BLAST results (Default 200), and the number of cores used to process the function (Default 1; the Windows implementation only accepts a value of 1).
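The inputs described above can be prepared before calling the function. The following is a minimal sketch under stated assumptions: pick_num_cores() is a hypothetical helper, the two file paths are placeholders rather than real files, and the call is only attempted when the DBTC package, both files, and a blastn executable are actually found.

```r
# Hypothetical helper: choose a core count that respects the documented
# single-core limit on Windows.
pick_num_cores <- function(requested = 1) {
  if (.Platform$OS.type == "windows") return(1L)
  available <- parallel::detectCores()
  if (is.na(available)) return(1L)
  as.integer(max(1, min(requested, available)))
}

n_cores <- pick_num_cores(2)

# Look for blastn on the system PATH; Sys.which() returns "" if not found.
blastn_path <- unname(Sys.which("blastn"))
if (!nzchar(blastn_path)) {
  message("blastn was not found on the PATH; supply its full path instead.")
}

# Placeholder paths (assumptions for illustration only, no whitespace).
db_file <- "/data/blastdb/example-db.nsq"
query_file <- "/data/run-01/sample-A.fas"

# Only attempt the call when everything it needs is present.
if (requireNamespace("DBTC", quietly = TRUE) &&
    file.exists(db_file) && file.exists(query_file) &&
    nzchar(blastn_path)) {
  DBTC::seq_BLAST(
    databasePath = db_file,
    querySeqPath = query_file,
    blastnPath = blastn_path,
    minLen = 100,
    BLASTResults = 200,
    numCores = n_cores,
    verbose = TRUE
  )
}
```

Capping the requested value at the detected core count is a design choice of this sketch, not documented DBTC behaviour.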

The examples are included to show the syntax of the function. They are not run because the function requires input files; in some cases multiple files are necessary, and some of these are quite large. For worked examples, please see https://github.com/rgyoung6/DBTCShinyTutorial/blob/main/README.md

Value

Two files are produced from this function, a BLAST run file and a BLAST results file for each of the fasta files in the target directory.

Note

WARNING - NO WHITESPACE!

When running DBTC functions, the paths for the selected files cannot contain whitespace! File and folder locations should be as short as possible (close to the root), as some functions do not process long naming conventions.

Also, special characters should be avoided (including question mark, number sign, exclamation mark). It is recommended that dashes be used for separations in naming conventions while retaining underscores for use as information delimiters (this is how DBTC functions use underscore).

There are several key character strings used in the DBTC pipeline; the presence of these strings in file or folder names will cause errors when running DBTC functions.

The following strings are those used in DBTC and should not be used in file or folder naming:

- _BLAST
- _combinedDada
- _taxaAssign
- _taxaAssignCombined
- _taxaReduced
- _CombineTaxaReduced
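The naming rules in this note can be checked before running the pipeline. The following is a minimal sketch, not a DBTC function: check_dbtc_path() is a hypothetical helper that tests a path for whitespace, the special characters named above, and the reserved strings.

```r
# Strings reserved by the DBTC pipeline, as listed in this note.
reserved_strings <- c(
  "_BLAST", "_combinedDada", "_taxaAssign",
  "_taxaAssignCombined", "_taxaReduced", "_CombineTaxaReduced"
)

# Hypothetical helper: return a character vector describing each problem
# found in `path` (an empty vector means no problem was detected).
check_dbtc_path <- function(path) {
  problems <- character(0)
  if (grepl("[[:space:]]", path)) {
    problems <- c(problems, "contains whitespace")
  }
  if (grepl("[?#!]", path)) {
    problems <- c(problems, "contains a special character (?, # or !)")
  }
  found <- vapply(
    reserved_strings,
    function(s) grepl(s, path, fixed = TRUE),
    logical(1),
    USE.NAMES = FALSE
  )
  if (any(found)) {
    problems <- c(
      problems,
      paste("contains reserved string:",
            paste(reserved_strings[found], collapse = ", "))
    )
  }
  problems
}
```

For example, check_dbtc_path("/data/my run/sample.fas") reports the whitespace problem, while a path such as "/data/run-01/sample-A.fas" returns an empty vector. The check only covers the three special characters named in this note; other special characters would need to be added to the pattern.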

Author(s)

Robert G. Young

References

<https://github.com/rgyoung6/DBTC>

Young, R. G., Hanner, R. H. (Submitted October 2023). Dada-BLAST-Taxon Assign-Condense Shiny Application (DBTCShiny). Biodiversity Data Journal.

See Also

dada_implement(), combine_dada_output(), make_BLAST_DB(), taxon_assign(), combine_assign_output(), reduce_taxa(), combine_reduced_output()

Examples

## Not run: 
seq_BLAST()
seq_BLAST(databasePath = NULL, querySeqPath = NULL, blastnPath = "blastn",
          minLen = 100, BLASTResults = 200, numCores = 1)

## End(Not run)


[Package DBTC version 0.1.0 Index]