ncbi_searcher {traits} | R Documentation |
Search for gene sequences available for taxa from NCBI.
Description
Search for gene sequences available for taxa from NCBI.
Usage
ncbi_searcher(
taxa = NULL,
id = NULL,
seqrange = "1:3000",
getrelated = FALSE,
fuzzy = FALSE,
limit = 500,
entrez_query = NULL,
hypothetical = FALSE,
verbose = TRUE,
sleep = 0L
)
Arguments
taxa |
(character) Scientific name to search for. |
id |
( |
seqrange |
(character) Sequence range, as e.g., |
getrelated |
(logical) If |
fuzzy |
(logical) Whether to do fuzzy taxonomic ID search or exact
search. If |
limit |
( |
entrez_query |
( |
hypothetical |
( |
verbose |
(logical) If |
sleep |
(integer) Number of seconds to sleep before each HTTP request. Use this if you are running into 429 Too Many Requests errors from NCBI. Default: 0 (no sleep). |
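The default seqrange = "1:3000" suggests the argument is a single string with two integers joined by a colon. A minimal sketch of building such a string from numeric bounds follows; make_seqrange is a hypothetical helper, not part of the traits package, and the "lower:upper" format is an assumption based only on the default value shown in Usage.

```r
# Hypothetical helper (not part of traits): build a "lower:upper" string
# in the same shape as the default seqrange = "1:3000".
make_seqrange <- function(lower, upper) {
  stopifnot(
    is.numeric(lower), is.numeric(upper),
    length(lower) == 1, length(upper) == 1,
    lower >= 1, lower <= upper
  )
  # as.integer() avoids scientific notation such as "1e+05" for large bounds
  paste0(as.integer(lower), ":", as.integer(upper))
}

rng <- make_seqrange(1, 3000)
print(rng)
```

The resulting string could then be passed as ncbi_searcher(taxa = "Umbra limi", seqrange = rng).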
Value
A data.frame of results if a single input is given. A list of data.frames if multiple inputs are given.
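Because the return type depends on the number of inputs, downstream code may want to treat both cases uniformly. The following is a minimal base R sketch that runs on mock data instead of a real NCBI query. as_result_list is a hypothetical helper, and the mock data frames contain only the gene_desc column used in the Examples; real results are assumed to have additional columns.

```r
# Hypothetical helper: wrap a single data.frame in a list so that single and
# multiple inputs can be handled by the same code path.
as_result_list <- function(x) {
  if (is.data.frame(x)) list(x) else x
}

# Mock stand-ins for ncbi_searcher() output (invented values, no network call)
single <- data.frame(
  gene_desc = c("RAG1 gene", "12S ribosomal RNA"),
  stringsAsFactors = FALSE
)
multi <- list(
  data.frame(gene_desc = "12S ribosomal RNA", stringsAsFactors = FALSE),
  data.frame(gene_desc = c("RAG1 gene", "cytochrome b"), stringsAsFactors = FALSE)
)

# Stack everything into one data.frame regardless of the input shape
combined_single <- do.call(rbind, unname(as_result_list(single)))
combined_multi <- do.call(rbind, unname(as_result_list(multi)))

print(nrow(combined_single))
print(nrow(combined_multi))
```

This plays the same role as the plyr::ldply call in the Examples, without the extra dependency.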
Authentication
NCBI rate limits requests. If you set an API key you have a higher rate limit.
Set your API key like Sys.setenv(ENTREZ_KEY="yourkey") or you can use ?rentrez::set_entrez_key.
Set verbose curl output (crul::set_verbose()) to make sure your API key is being sent in the requests.
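Before making requests, it can help to confirm that the key is actually visible to the R session. A minimal sketch using only base R follows; "yourkey" is a placeholder, and this only checks the environment variable, not whether NCBI accepts the key.

```r
# Placeholder value; substitute your own NCBI API key
Sys.setenv(ENTREZ_KEY = "yourkey")

# Read the variable back; unset = "" yields an empty string if it is missing
key <- Sys.getenv("ENTREZ_KEY", unset = "")
key_is_set <- nzchar(key)

if (key_is_set) {
  message("ENTREZ_KEY is set (", nchar(key), " characters)")
} else {
  message("ENTREZ_KEY is not set; requests will use the lower rate limit")
}
```

Sys.setenv only affects the current session. To keep the key across sessions, one option is to add a line such as ENTREZ_KEY=yourkey to your .Renviron file, which R reads at startup.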
Author(s)
Scott Chamberlain, Zachary Foster zacharyfoster1989@gmail.com
See Also
Examples
## Not run:
# A single species
out <- ncbi_searcher(taxa="Umbra limi", seqrange = "1:2000")
# Get the same species information using a taxonomy id
out <- ncbi_searcher(id = "75935", seqrange = "1:2000")
# If the taxon name is unique, using the taxon name and id are equivalent
all(ncbi_searcher(id = "75935") == ncbi_searcher(taxa="Umbra limi"))
# If the taxon name is not unique, use taxon id
# "266948" is the uid for the butterfly genus, but there is also a genus
# of orchids with the same name
nrow(ncbi_searcher(id = "266948")) == nrow(ncbi_searcher(taxa="Satyrium"))
# get list of genes available, removing non-unique
unique(out$gene_desc)
# does the string 'RAG1' exist in any of the gene names
out[grep("RAG1", out$gene_desc, ignore.case=TRUE),]
# A single species without records in NCBI
out <- ncbi_searcher(taxa="Sequoia wellingtonia", seqrange="1:2000",
  getrelated=TRUE)
# Many species, can run in parallel or not using plyr
species <- c("Salvelinus alpinus","Ictalurus nebulosus","Carassius auratus")
out2 <- ncbi_searcher(taxa=species, seqrange = "1:2000")
lapply(out2, head)
library("plyr")
out2df <- ldply(out2) # make data.frame of all
unique(out2df$gene_desc) # get list of genes available, removing non-unique
out2df[grep("12S", out2df$gene_desc, ignore.case=TRUE), ]
# Using the getrelated and entrez_query options
ncbi_searcher(taxa = "Olpidiopsidales", limit = 5, getrelated = TRUE,
  entrez_query = "18S[title] AND 28S[title]")
# get refseqs
one <- ncbi_searcher(taxa = "Salmonella enterica",
  entrez_query="srcdb_refseq[PROP]")
two <- ncbi_searcher(taxa = "Salmonella enterica")
## End(Not run)