reutils-package {reutils}    R Documentation
Talk to the NCBI EUtils
Description
An interface to NCBI databases such as PubMed, GenBank, or GEO, powered by the Entrez Programming Utilities (EUtils). The nine EUtils provide programmatic access to the NCBI Entrez query and database system for searching and retrieving biological data.
Details
With nine Entrez Programming Utilities, NCBI provides a programmatic interface to the Entrez query and database system for searching and retrieving requested data. Each of these tools corresponds to an R function in the reutils package, described below.
The output returned by the EUtils is typically in XML format. To gain access to this output you have several options:
- Use the content(as = "xml") method to extract the output as an XMLInternalDocument object and process it further using the facilities provided by the XML package.
- Use the content(as = "parsed") method to extract the output into data.frames. Note that this is currently only implemented for docsums returned by esummary, uilists returned by esearch, and the output returned by einfo.
- Access specific nodes in the XML tree using XPath expressions with the reference class methods #xmlValue, #xmlAttr, or #xmlName built into eutil objects.
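The three access routes can be sketched as small helper functions. This is a sketch under stated assumptions: the helper names are hypothetical, the reutils and XML packages must be installed, and network access is needed only when the helpers are called on a real eutil object. The XPath "//Id" assumes an ESearch uilist, where NCBI places each UID in an <Id> element.

```r
# Hypothetical helpers illustrating the three ways to reach EUtils output.
# Calls are namespaced, so defining them does not load reutils or XML.

# 1. Extract an XMLInternalDocument and query it with the XML package.
ids_via_xml <- function(x) {
  doc <- reutils::content(x, as = "xml")
  XML::xpathSApply(doc, "//Id", XML::xmlValue)
}

# 2. Let reutils parse the output into R objects
#    (docsums from esummary, uilists from esearch, einfo output).
parsed_output <- function(x) {
  reutils::content(x, as = "parsed")
}

# 3. Run an XPath query through the method built into the eutil object.
ids_via_method <- function(x) {
  x$xmlValue("//Id")
}
```

With network access, an object such as hits <- esearch("Chlamydia[mesh]", "pubmed", retmax = 5) could be passed to ids_via_xml(hits) or ids_via_method(hits), and parsed_output(esummary(hits)) would return the docsums as a data.frame.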
The Entrez Programming Utilities can also generate output in other formats,
such as plain-text FASTA or GenBank files for sequence databases,
or the MEDLINE format for the literature database. The type of output is
generally controlled by setting the retmode argument (the data format,
e.g. "xml" or "text") and the rettype argument (the record view,
e.g. "fasta" or "medline") when calling an EUtil.
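As an illustration of retmode and rettype, the hypothetical helper below requests plain-text records. The rettype values "fasta", "gp" (GenPept flat file for protein records) and "medline" are standard EFetch values; the helper name is illustrative, and calling it requires network access.

```r
# Hypothetical helper: download records as plain text in a chosen format.
# retmode selects the data format; rettype selects the record view.
fetch_text <- function(uid, db, rettype) {
  rec <- reutils::efetch(uid, db = db, retmode = "text", rettype = rettype)
  reutils::content(rec)
}

# Example calls (need network access, so they are left commented out):
# fetch_text("194680922", db = "protein", rettype = "fasta")  # FASTA
# fetch_text("194680922", db = "protein", rettype = "gp")     # GenPept flat file
# fetch_text(pmids, db = "pubmed", rettype = "medline")       # MEDLINE, pmids from esearch
```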
Please check the relevant
usage guidelines
when using these services. Note that Entrez server requests are subject to
frequency limits: NCBI asks that a site not using an API key send no more
than three requests per second.
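One way to respect the frequency limits when running many queries in a loop is to space the calls out on the client side. The wrapper below is a generic sketch, not part of reutils; it assumes a budget of three requests per second (the limit for requests without an API key) and sleeps before a call whenever the previous call through the wrapper was too recent.

```r
# Hypothetical wrapper (not part of reutils): returns a version of `f`
# that waits until at least `min_interval` seconds have passed since
# the previous call made through the wrapper.
throttle <- function(f, min_interval = 1 / 3) {
  last_call <- -Inf
  function(...) {
    wait <- min_interval - (as.numeric(Sys.time()) - last_call)
    if (wait > 0) Sys.sleep(wait)
    last_call <<- as.numeric(Sys.time())
    f(...)
  }
}

# Usage sketch (needs network access):
# slow_esearch <- throttle(reutils::esearch)
# hits <- lapply(c("Chlamydia[mesh]", "genome[mesh]"),
#                function(term) slow_esearch(term, "pubmed"))
```

The closure keeps the timestamp of the last call, so each wrapped function gets its own independent budget.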
Main functions
- esearch: Search and retrieve primary UIDs for use with esummary, elink, or efetch. esearch additionally returns term translations and optionally stores results for future use in the user's Web Environment.
- esummary: Retrieve document summaries from a list of primary UIDs (provided as a character vector or as an esearch object).
- egquery: Provide Entrez database counts in XML for a single search term using a Global Query.
- einfo: Retrieve field names, term counts, the date of the last update, and the available links to other Entrez databases for each database.
- efetch: Retrieve data records in a specified format corresponding to a list of primary UIDs or from the user's Web Environment on the Entrez History server.
- elink: Return a list of UIDs (and relevancy scores) from a target database that are related to a list of UIDs in the same database or in another Entrez database.
- epost: Upload primary UIDs to the user's Web Environment on the Entrez History server for subsequent use with esummary, elink, or efetch.
- espell: Provide spelling suggestions.
- ecitmatch: Retrieve PubMed IDs (PMIDs) that correspond to a set of input citation strings.
- content: Extract the content of a request from the eutil object returned by any of the above functions.
Package options
reutils uses four options to configure behaviour:
- reutils.email: NCBI requires that users of its API provide an email address with their calls to Entrez. If you are going to perform a lot of queries, consider setting reutils.email to your email address in your .Rprofile file.
- reutils.show.headlines: By default, efetch objects containing text data show only the first 12 lines. This is handy if you have downloaded a fairly large genome in GenBank format. To change this, set the global option reutils.show.headlines to another numeric value or to NULL.
- reutils.verbose.queries: If you perform many queries interactively, you might want messages announcing the queries you run. To get them, set the option reutils.verbose.queries to TRUE.
- reutils.test.remote: Unit tests that require online access to NCBI services are disabled by default, as these services cannot be guaranteed to be available and working under all circumstances. Set the option reutils.test.remote to TRUE to run the full suite of tests.
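A minimal configuration sketch; the email address is a placeholder and the numeric value is arbitrary. Options set this way apply to the current R session; placing the same call in .Rprofile makes them persist across sessions.

```r
# Settings for ~/.Rprofile (or an interactive session).
# The address below is a placeholder; use your own.
options(
  reutils.email = "your.name@example.org",
  reutils.show.headlines = 30,     # show the first 30 lines of text efetch output
  reutils.verbose.queries = TRUE,  # announce each query as it is run
  reutils.test.remote = FALSE      # keep online unit tests switched off
)
getOption("reutils.show.headlines")
```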
Author(s)
Gerhard Schöfl gerhard.schofl@gmail.com
Examples
#
# combine esearch and efetch
#
# Download PubMed records that are indexed in MeSH for both 'Chlamydia' and
# 'genome' and were published in 2013.
query <- "Chlamydia[mesh] and genome[mesh] and 2013[pdat]"
# Upload the PMIDs for this search to the History server
pmids <- esearch(query, "pubmed", usehistory = TRUE)
pmids
## Not run:
# Fetch the records
articles <- efetch(pmids)
# Use XPath expressions with the #xmlValue() or #xmlAttr() methods to directly
# extract specific data from the XML records stored in the 'efetch' object.
titles <- articles$xmlValue("//ArticleTitle")
abstracts <- articles$xmlValue("//AbstractText")
#
# combine epost with esummary/efetch
#
# Download protein records corresponding to a list of GI numbers.
uid <- c("194680922", "50978626", "28558982", "9507199", "6678417")
# post the GI numbers to the Entrez history server
p <- epost(uid, "protein")
# retrieve docsums with esummary
docsum <- content(esummary(p, version = "1.0"), "parsed")
docsum
# download FASTAs as 'text' with efetch
prot <- efetch(p, retmode = "text", rettype = "fasta")
prot
# retrieve the content from the efetch object
fasta <- content(prot)
## End(Not run)