efetch {reutils} R Documentation
efetch - downloading full records
Description
efetch performs calls to the NCBI EFetch utility to retrieve data records
in the requested format for an NCBI Accession Number, one or more primary UIDs,
or a set of UIDs stored in the user's Web Environment.
Usage
efetch(uid, db = NULL, rettype = NULL, retmode = NULL, outfile = NULL,
retstart = NULL, retmax = NULL, querykey = NULL, webenv = NULL,
strand = NULL, seqstart = NULL, seqstop = NULL, complexity = NULL)
Arguments
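uid
(Required) A list of UIDs provided either as a character vector, as an
esearch or elink object, or by reference to a Web Environment and a query
key obtained from a previous call to esearch, epost or elink.
db
(Required if uid is a character vector) Database from which to retrieve
the records.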
rettype
A character string specifying the retrieval type, such as 'abstract' or 'medline' for PubMed, 'gp' or 'fasta' for Protein, or 'gb' or 'fasta' for Nuccore. See the NCBI EFetch documentation for the values available for each database.
retmode
A character string specifying the data mode of the records returned, such as 'text' or 'xml'. See the NCBI EFetch documentation for the values available for each database.
outfile
A character string naming a file to write the data to.
Required if more than 500 UIDs are retrieved at once. In this case, the UIDs
have to be provided by reference to a Web Environment and a query key
obtained directly from previous calls to esearch (if usehistory = TRUE),
epost or elink.
retstart
Numeric index of the first record to be retrieved.
retmax
Total number of records from the input set to be retrieved.
querykey
An integer specifying which of the UID lists attached
to a user's Web Environment will be used as input to efetch.
webenv
A character string specifying the Web Environment that
contains the UID list. (Usually obtained directly from objects returned
by a previous call to esearch, epost or elink.)
strand
Strand of DNA to retrieve. (1: plus strand, 2: minus strand)
seqstart
First sequence base to retrieve.
seqstop
Last sequence base to retrieve.
complexity
Data content to return. (0: entire data structure, 1: bioseq, 2: minimal bioseq-set, 3: minimal nuc-prot, 4: minimal pub-set)
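To make the link between these sequence arguments and the underlying EFetch request concrete, the sketch below assembles an EFetch URL by hand. It is a hypothetical helper (efetch_url() is not part of reutils), and it assumes that strand, seqstart, seqstop and complexity correspond to NCBI's strand, seq_start, seq_stop and complexity request parameters. reutils may build its requests differently; getUrl() on a returned efetch object is the way to inspect a real request.

```r
# Hypothetical helper, NOT part of reutils: assemble an EFetch request URL
# from arguments named like those of efetch(). Assumes NCBI's parameter
# names (seq_start, seq_stop, ...) from the E-utilities documentation.
efetch_url <- function(uid, db, rettype = NULL, retmode = NULL,
                       strand = NULL, seqstart = NULL, seqstop = NULL,
                       complexity = NULL) {
  params <- list(
    db = db,
    id = paste(uid, collapse = ","),  # multiple UIDs are comma-separated
    rettype = rettype,
    retmode = retmode,
    strand = strand,                  # 1: plus strand, 2: minus strand
    seq_start = seqstart,             # first base to retrieve
    seq_stop = seqstop,               # last base to retrieve
    complexity = complexity
  )
  # Drop the arguments that were left at NULL
  params <- Filter(Negate(is.null), params)
  # Render numbers without scientific notation (e.g. 100000, not 1e+05);
  # no URL-encoding is applied, which is adequate for plain UIDs
  values <- vapply(params, format, character(1), scientific = FALSE, trim = TRUE)
  paste0("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?",
         paste(names(params), values, sep = "=", collapse = "&"))
}

efetch_url("195055", "protein", rettype = "fasta", retmode = "text",
           seqstart = 1, seqstop = 100)
```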
Details
See the official online documentation for NCBI's E-utilities for additional information.
See the NCBI EFetch documentation for the default values of rettype
and retmode, as well as a list of the databases available
to the EFetch utility.
Value
An efetch object.
Note
If you are going to retrieve more than 500 UIDs at once, you will have to provide
the UIDs by reference to a Web Environment and a query key obtained from previous
calls to esearch (if usehistory = TRUE), epost or elink,
and you will have to specify an outfile to write the data to,
rather than collecting the data into an R object.
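Because of the 500-UID limit, a large result set is fetched in consecutive slices. As a rough illustration of the arithmetic involved, the hypothetical helper below (batch_ranges() is not part of reutils) computes the retstart/retmax pairs needed to page through n records, assuming, as in NCBI's E-utilities, that retstart is a zero-based index. When given a Web Environment reference and an outfile, efetch() already writes the records in batches of 500, so this is only a sketch of the idea.

```r
# Hypothetical helper, NOT part of reutils: split n records into batches
# of at most `batch` records, returning zero-based retstart offsets and the
# matching retmax (number of records in each batch).
batch_ranges <- function(n, batch = 500) {
  n_batches <- ceiling(n / batch)
  retstart <- (seq_len(n_batches) - 1) * batch
  data.frame(retstart = retstart, retmax = pmin(batch, n - retstart))
}

batch_ranges(1234)
#   retstart retmax
# 1        0    500
# 2      500    500
# 3     1000    234
```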
See Also
content, getUrl, getError, database, retmode, rettype.
Examples
## Not run:
## From Protein, retrieve a raw GenPept record and write it to a file.
p <- efetch("195055", "protein", "gp")
p
write(content(p, "text"), file = "~/AAD15290.gp")
## Get accessions for a list of GenBank IDs (GIs)
acc <- efetch(c("1621261", "89318838", "68536103", "20807972", "730439"),
"protein", rettype = "acc")
acc
acc <- strsplit(content(acc), "\n")[[1]]
acc
## Get GIs from a list of accession numbers
gi <- efetch(c("CAB02640.1", "EAS10332.1", "YP_250808.1", "NP_623143.1", "P41007.1"),
"protein", "uilist")
gi
## We can conveniently extract the UIDs using the eutil method xmlValue(xpath)
gi$xmlValue("/IdList/Id")
## or we can extract the contents of the efetch query using the function content()
## and use the XML package to retrieve the UIDs
doc <- content(gi)
XML::xpathSApply(doc, "/IdList/Id", XML::xmlValue)
## Get the scientific name for an organism starting with the NCBI taxon id.
tx <- efetch("527031", "taxonomy")
tx
## Convenience accessor for XML nodes of interest using XPath
## Extract the TaxIds of the Lineage
tx["//LineageEx/Taxon/TaxId"]
## Use an XPath expression to extract the scientific name.
tx$xmlValue("/TaxaSet/Taxon/ScientificName")
## Iteratively retrieve a large number of records
# First store approx. 8400 UIDs on the History server.
uid <- esearch(term = "hexokinase", db = 'protein', usehistory = TRUE)
# Fetch the records and write to file in batches of 500.
efetch(uid, rettype = "fasta", retmode = "text", outfile = "~/tmp/hexokinases.fna")
## End(Not run)