searchGB {insect} | R Documentation |
Query the NCBI GenBank database.
Description
searchGB queries GenBank using the Entrez search utilities, and downloads the matching sequences and/or their accession numbers. A vector of accession numbers can be passed in lieu of a query, in which case the function downloads the matching sequences from GenBank. Internet connectivity is required.
Usage
searchGB(
query = NULL,
accession = NULL,
sequences = TRUE,
bin = TRUE,
db = "nucleotide",
taxIDs = TRUE,
prompt = TRUE,
contact = NULL,
quiet = FALSE
)
Arguments
query |
an Entrez search query. For help compiling Entrez queries see https://www.ncbi.nlm.nih.gov/books/NBK3837/#EntrezHelp.Entrez_Searching_Options and https://www.ncbi.nlm.nih.gov/books/NBK49540/. |
accession |
an optional vector of GenBank accession numbers to be input in place of a search query. If both query and accession arguments are provided the function returns an error. Currently, a maximum of 200 accession numbers can be processed at a time. |
sequences |
logical. Should the sequences be returned or only the GenBank accession numbers? Note that taxon IDs are not returned if sequences = FALSE. |
bin |
logical indicating whether the returned sequences should be in raw-byte format ("DNAbin" or "AAbin" object type) or as a vector of named character strings. Defaults to TRUE. |
db |
the NCBI database from which to download the sequences and/or accession names. Accepted options are "nucleotide" (default) and "protein". |
taxIDs |
logical indicating whether the NCBI taxon ID numbers should be appended to the names of the output object (delimited by a "|" character). Defaults to TRUE. |
prompt |
logical indicating whether to check with the user before downloading sequences. |
contact |
an optional character string with the user's email address. This is added to the E-utilities URL and may be used by NCBI to contact the user if the application causes unintended issues. |
quiet |
logical indicating whether progress messages printed to the console should be suppressed. |
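Because the accession argument currently accepts at most 200 accession numbers per call, longer vectors need to be split before they are submitted. The following is a minimal sketch of one way to do that; chunk_accessions is a hypothetical helper, not part of insect, and the accession strings are made-up placeholders.

```r
# Hypothetical helper (not part of insect): split a character vector of
# accession numbers into batches no larger than `size`.
chunk_accessions <- function(accession, size = 200L) {
  stopifnot(is.character(accession), size >= 1L)
  if (length(accession) == 0L) return(list())
  # Batch index 1, 1, ..., 2, 2, ... assigns each accession to a batch.
  batch <- ceiling(seq_along(accession) / size)
  unname(split(accession, batch))
}

# Placeholder accessions for illustration only; these are not real records.
acc <- sprintf("XX%06d", 1:450)
batches <- chunk_accessions(acc)
lengths(batches)

# Each batch could then be downloaded in turn (requires internet access):
# seqs <- lapply(batches, function(b) searchGB(accession = b, prompt = FALSE))
```

Here the 450 placeholder accessions are split into three batches, the last one shorter than the others. How the per-batch results are best recombined depends on the bin setting and is left out of the sketch.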
Details
This function uses the Entrez e-utilities API to search and download sequences from GenBank. Occasionally users may encounter an unknown, non-reproducible error that appears to be related to database records being updated in GenBank. This can generally be remedied by re-running the function. If problems persist, please feel free to raise an issue on the package bug-reports page at <https://github.com/shaunpwilkinson/insect/issues/>.
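Since the suggested remedy for the intermittent error is simply to re-run the function, the re-run can be automated. The sketch below is one possible approach, not something the package provides: with_retry is a hypothetical helper, and the number of attempts and the wait time are arbitrary assumptions. A deliberately flaky function stands in for the network call so the sketch can be run offline.

```r
# Hypothetical helper (not part of insect): call `fun` up to `attempts` times,
# pausing `wait` seconds after each failed try.
with_retry <- function(fun, attempts = 3L, wait = 1) {
  stopifnot(is.function(fun), attempts >= 1L)
  last_error <- NULL
  for (i in seq_len(attempts)) {
    # Capture the error object rather than aborting on failure.
    result <- tryCatch(fun(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    last_error <- result
    if (i < attempts) Sys.sleep(wait)
  }
  stop("all ", attempts, " attempts failed; last error: ",
       conditionMessage(last_error), call. = FALSE)
}

# Offline stand-in for a call that fails twice and then succeeds.
calls <- 0L
flaky <- function() {
  calls <<- calls + 1L
  if (calls < 3L) stop("transient failure")
  "ok"
}
out <- with_retry(flaky, attempts = 3L, wait = 0)

# With internet access, the same wrapper could be applied to searchGB:
# x <- with_retry(function() searchGB(query, prompt = FALSE))
```

Retrying only makes sense for the transient failures described above; an error caused by a malformed query would fail on every attempt and is reported after the last one.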
Value
a list of sequences as either a DNAbin or AAbin object (depending on "db"), or a named vector of character strings (if bin = FALSE).
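When taxIDs = TRUE, the names of the returned object carry the taxon ID after a "|" delimiter, so the two parts can be separated again after download. The following sketch assumes each name has the form accession|taxID with a single delimiter; split_names is a hypothetical helper, and the input vector is made-up data shaped like bin = FALSE output.

```r
# Hypothetical helper (not part of insect): split names of the assumed form
# "accession|taxID" into a two-column data frame.
split_names <- function(x) {
  nms <- names(x)
  stopifnot(!is.null(nms))
  # fixed = TRUE treats "|" as a literal character, not a regex alternation.
  parts <- strsplit(nms, "|", fixed = TRUE)
  data.frame(
    accession = vapply(parts, function(p) p[1], character(1)),
    taxID = vapply(
      parts,
      function(p) if (length(p) >= 2L) p[2] else NA_character_,
      character(1)
    ),
    stringsAsFactors = FALSE
  )
}

# Made-up sequences and names for illustration only.
x <- c("XX000001|9606" = "ACGT", "XX000002|10090" = "GGCA")
res <- split_names(x)
res
```

Names without a delimiter (for example when taxIDs = FALSE) yield NA in the taxID column rather than an error.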
Author(s)
Shaun Wilkinson
References
NCBI Resource Coordinators (2012) Database resources of the National Center for Biotechnology Information. Nucleic Acids Research, 41 (Database issue): D8–D20.
See Also
read.GenBank (ape) for an alternative means of downloading DNA sequences from GenBank using accession numbers.
Examples
## Query the GenBank database for Eukaryote mitochondrial 16S DNA sequences
## between 100 and 300 base pairs in length that were modified between
## the years 1999 and 2000.
query <- "Eukaryota[ORGN]+AND+16S[TITL]+AND+100:300[SLEN]+AND+1999:2000[MDAT]"
x <- searchGB(query, prompt = FALSE)
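The query string above joins its field-tagged terms with "+AND+". As a small convenience, the terms can be kept in a vector and joined programmatically; build_query below is a hypothetical helper, not part of insect, and only reproduces the joining convention shown in this example.

```r
# Hypothetical helper (not part of insect): join Entrez search terms with
# "+AND+", as in the example query above.
build_query <- function(...) {
  terms <- c(...)
  stopifnot(is.character(terms), length(terms) >= 1L)
  paste(terms, collapse = "+AND+")
}

query2 <- build_query(
  "Eukaryota[ORGN]",   # organism
  "16S[TITL]",         # title contains 16S
  "100:300[SLEN]",     # sequence length range
  "1999:2000[MDAT]"    # modification date range
)
query2

# With internet access, only the accession numbers could be retrieved with:
# acc <- searchGB(query2, sequences = FALSE, prompt = FALSE)
```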