R: efetch - downloading full records

efetch {reutils}

R Documentation

efetch - downloading full records

Description

efetch performs calls to the NCBI EFetch utility to retrieve data records in the requested format for an NCBI Accession Number, one or more primary UIDs, or for a set of UIDs stored in the user's web environment.

Usage

efetch(uid, db = NULL, rettype = NULL, retmode = NULL, outfile = NULL,
  retstart = NULL, retmax = NULL, querykey = NULL, webenv = NULL,
  strand = NULL, seqstart = NULL, seqstop = NULL, complexity = NULL)

Arguments

`uid`	(Required) A list of UIDs provided either as a character vector, as an `esearch` object, or by reference to a Web Environment and a query key obtained directly from previous calls to `esearch` (if `usehistory = TRUE`), `epost` or `elink`. If UIDs are provided as a plain character vector, `db` must be specified explicitly, and all of the UIDs must be from the database specified by `db`.
`db`	(Required if `uid` is a character vector of UIDs) Database from which to retrieve records. See the NCBI EFetch documentation for the supported databases.
`rettype`	A character string specifying the retrieval type, such as 'abstract' or 'medline' for PubMed, 'gp' or 'fasta' for Protein, or 'gb' or 'fasta' for Nuccore. See the NCBI EFetch documentation for the available values for each database.
`retmode`	A character string specifying the data mode of the records returned, such as 'text' or 'xml'. See the NCBI EFetch documentation for the available values for each database.
`outfile`	A character string naming a file for writing the data to. Required if more than 500 UIDs are retrieved at once. In this case UIDs have to be provided by reference to a Web Environment and a query key obtained directly from previous calls to `esearch` (if `usehistory = TRUE`), `epost` or `elink`.
`retstart`	Numeric index of the first record to be retrieved.
`retmax`	Total number of records from the input set to be retrieved.
`querykey`	An integer specifying which of the UID lists attached to a user's Web Environment will be used as input to `efetch`. (Usually obtained directly from objects returned by a previous call to `esearch`, `epost` or `elink`.)
`webenv`	A character string specifying the Web Environment that contains the UID list. (Usually obtained directly from objects returned by a previous call to `esearch`, `epost` or `elink`.)
`strand`	Strand of DNA to retrieve. (1: plus strand, 2: minus strand)
`seqstart`	First sequence base to retrieve.
`seqstop`	Last sequence base to retrieve.
`complexity`	Data content to return. (0: entire data structure, 1: bioseq, 2: minimal bioseq-set, 3: minimal nuc-prot, 4: minimal pub-set)
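
The `strand`, `seqstart` and `seqstop` arguments can be combined to request a sub-range of a nucleotide record. The following is a minimal sketch, assuming the accession NC_000913.3 (the E. coli K-12 MG1655 genome) as an example and assuming the 1-based, inclusive coordinates used by the NCBI EFetch utility. The `args` list is only an illustrative helper, and the request is sent only in an interactive session with reutils installed.

```r
# Arguments for a sub-sequence request; names follow the Usage signature.
args <- list(
  uid      = "NC_000913.3",  # assumed example accession (E. coli K-12 MG1655)
  db       = "nuccore",
  rettype  = "fasta",
  retmode  = "text",
  strand   = 1,              # 1: plus strand, 2: minus strand
  seqstart = 1,              # first base to retrieve
  seqstop  = 300             # last base to retrieve
)

# Basic sanity checks before contacting NCBI.
stopifnot(args$strand %in% c(1, 2), args$seqstart <= args$seqstop)

# Only send the request interactively and when reutils is available.
if (interactive() && requireNamespace("reutils", quietly = TRUE)) {
  rec <- do.call(reutils::efetch, args)
  cat(reutils::content(rec, "text"))
}
```

Building the argument list first keeps the coordinate checks separate from the network call, so a bad range fails before any request is made.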

Details

See the official online documentation for NCBI's EUtilities for additional information.

See the NCBI EFetch documentation for the default values for rettype and retmode, as well as a list of the available databases for the EFetch utility.

Value

An efetch object.

Note

To retrieve more than 500 UIDs at once, provide the UIDs by reference to a Web Environment and a query key obtained from previous calls to esearch (if usehistory = TRUE), epost or elink. You must also specify an outfile to write the data to, rather than collecting the data into an R object.
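
To illustrate what retrieving in batches of 500 means in terms of retstart and retmax, the sketch below computes the offsets that a batched download would step through. batch_offsets is a hypothetical helper, not part of reutils, and it assumes the zero-based retstart convention of the NCBI EFetch utility. It does not describe how efetch schedules its requests internally.

```r
# Hypothetical helper (not part of reutils): compute retstart/retmax pairs
# for fetching `total` records in batches of at most `size`.
batch_offsets <- function(total, size = 500) {
  stopifnot(total >= 0, size > 0)
  if (total == 0) {
    return(data.frame(retstart = integer(0), retmax = integer(0)))
  }
  retstart <- seq(0, total - 1, by = size)  # assumed zero-based offsets
  retmax   <- pmin(size, total - retstart)  # last batch may be smaller
  data.frame(retstart = retstart, retmax = retmax)
}

# Offsets for 1200 records in batches of at most 500.
batches <- batch_offsets(1200)
print(batches)
```

Each row corresponds to one request against the same Web Environment and query key, with only retstart and retmax changing between requests.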

Examples

## Not run: 
## From Protein, retrieve a raw GenPept record and write it to a file.
p <- efetch("195055", "protein", "gp")
p

write(content(p, "text"), file = "~/AAD15290.gp")

## Get accessions for a list of GenBank IDs (GIs)
acc <- efetch(c("1621261", "89318838", "68536103", "20807972", "730439"),
              "protein", rettype = "acc")
acc
acc <- strsplit(content(acc), "\n")[[1]]
acc

## Get GIs from a list of accession numbers
gi <- efetch(c("CAB02640.1", "EAS10332.1", "YP_250808.1", "NP_623143.1", "P41007.1"),
             "protein", "uilist")
gi

## we can conveniently extract the UIDs using the eutil method #xmlValue(xpath)
gi$xmlValue("/IdList/Id")

## or we can extract the contents of the efetch query using the function content()
## and use the XML package to retrieve the UIDs
doc <- content(gi)
XML::xpathSApply(doc, "/IdList/Id", XML::xmlValue)

## Get the scientific name for an organism starting with the NCBI taxon id.
tx <- efetch("527031", "taxonomy")
tx
 
## Convenience accessor for XML nodes of interest using XPath
## Extract the TaxIds of the Lineage
tx["//LineageEx/Taxon/TaxId"]

## Use an XPath expression to extract the scientific name.
tx$xmlValue("/TaxaSet/Taxon/ScientificName")

## Iteratively retrieve a large number of records
# First store approx. 8400 UIDs on the History server.
uid <- esearch(term = "hexokinase", db = 'protein', usehistory = TRUE)
# Fetch the records and write to file in batches of 500.
efetch(uid, rettype = "fasta", retmode = "text", outfile = "~/tmp/hexokinases.fna")


## End(Not run)

[Package reutils version 0.2.3 Index]