entrez_summary {rentrez} | R Documentation |
Get summaries of objects in NCBI datasets from a unique ID
Description
The NCBI offer two distinct formats for summary documents.
Version 1.0 is a relatively limited summary of a database record based on a
shared Document Type Definition. Version 1.0 summaries are only available as
XML and are not available for some newer databases.
Version 2.0 summaries generally contain more information about a given
record, but each database has its own distinct format. 2.0 summaries are
available for records in all databases and as JSON and XML files.
As of version 0.4, rentrez fetches version 2.0 summaries by default and
uses JSON as the exchange format (as JSON objects can be more easily converted
into native R types). Existing scripts which relied on the structure and
naming of the "Version 1.0" summary files can be updated by setting the new
version argument to "1.0".
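For a script that depends on the older layout, one way to apply this is a thin wrapper that pins the version argument. The following is a minimal sketch, not part of rentrez: legacy_summary is a hypothetical helper, and its fetch parameter is an assumption added so that the request function can be swapped out (for example when working offline).

```r
# Hypothetical helper (not part of rentrez): always request Version 1.0
# summaries so code written against the old structure keeps working.
# `fetch` defaults to rentrez::entrez_summary; the default is only looked
# up when the helper is called without an explicit `fetch` argument.
legacy_summary <- function(db, id, ..., fetch = rentrez::entrez_summary) {
  fetch(db = db, id = id, version = "1.0", ...)
}

# Usage (contacts the NCBI, so left commented out):
# old_style <- legacy_summary(db = "popset", id = "307082412")
```

Pinning the version in one place keeps the rest of an older script unchanged, at the cost of staying on the more limited summary format.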
Usage
entrez_summary(
db,
id = NULL,
web_history = NULL,
version = c("2.0", "1.0"),
always_return_list = FALSE,
retmode = NULL,
config = NULL,
...
)
Arguments
db |
character Name of the database to search for |
id |
vector with unique ID(s) for records in database |
web_history |
A web_history object |
version |
either "1.0" or "2.0"; see the Description above |
always_return_list |
logical, return a list of esummary objects even when only one ID is provided (see Details for a note about this option) |
retmode |
either "xml" or "json". By default, xml will be used for version 1.0 records, json for version 2.0. |
config |
vector configuration options passed to |
... |
character, additional terms to add to the request; see the NCBI documentation linked in References for a complete list |
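The web_history argument allows summaries to be requested for records already stored on the NCBI history server, instead of passing a vector of IDs. A minimal sketch, assuming entrez_search is called with use_history = TRUE and that the returned object carries the history in its web_history element; summaries_for_term is a hypothetical helper, and its search and fetch parameters are assumptions added so that the two request functions can be replaced.

```r
# Hypothetical helper (not part of rentrez): search a database, keep the
# results on the NCBI history server, then summarise them via web_history.
summaries_for_term <- function(db, term,
                               search = rentrez::entrez_search,
                               fetch = rentrez::entrez_summary) {
  res <- search(db = db, term = term, use_history = TRUE)
  # Pass the stored history instead of a vector of IDs
  fetch(db = db, web_history = res$web_history)
}

# Usage (contacts the NCBI, so left commented out):
# cv <- summaries_for_term(db = "clinvar", term = "BRCA1")
```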
Details
By default, entrez_summary returns a single record when only one ID is
passed and a list of such records when multiple IDs are passed. This can lead
to unexpected behaviour when the results of a variable number of IDs (perhaps the
result of entrez_search) are processed with an apply family function
or in a for-loop. If you use this function as part of a function or script that
generates a variably-sized vector of IDs, setting always_return_list to TRUE
will avoid these problems. The function extract_from_esummary is provided for
the specific case of extracting named elements from a list of esummary objects,
and is designed to work on single objects as well as lists.
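The situation described above can be sketched as follows. This is a minimal illustration rather than a recommended implementation: summary_fields is a hypothetical helper, the early return for an empty ID vector is an assumption about what a caller would want, each record is assumed to behave as a named list, and the fetch parameter is an assumption added so that the request function can be replaced.

```r
# Hypothetical helper (not part of rentrez): collect one field from the
# summaries of however many IDs a previous step produced.
summary_fields <- function(db, ids, field,
                           fetch = rentrez::entrez_summary) {
  # Assumption: with no IDs, return an empty list without making a request
  if (length(ids) == 0) {
    return(list())
  }
  # always_return_list = TRUE gives a list even when length(ids) == 1,
  # so lapply() visits records rather than the fields of a single record
  summaries <- fetch(db = db, id = ids, always_return_list = TRUE)
  lapply(summaries, function(rec) rec[[field]])
}

# Usage (contacts the NCBI, so left commented out):
# res <- rentrez::entrez_search(db = "clinvar", term = "BRCA1", retmax = 10)
# summary_fields("clinvar", res$ids, "title")
```

Without always_return_list = TRUE, a search that happened to return one ID would hand lapply() a single record, and the loop would iterate over that record's fields instead of over records.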
Value
A list of esummary records (if multiple IDs are passed or always_return_list is TRUE), or a single esummary record (if one ID is passed and always_return_list is FALSE).
file XMLInternalDocument xml file containing the entire record returned by the NCBI.
References
https://www.ncbi.nlm.nih.gov/books/NBK25499/#_chapter4_ESummary_
See Also
config for available configs
extract_from_esummary, which can be used to extract elements from a list of esummary records
Examples
## Not run:
# popset example: summaries for several IDs are returned as a list
pop_ids <- c("307082412", "307075396", "307075338", "307075274")
pop_summ <- entrez_summary(db = "popset", id = pop_ids)
extract_from_esummary(pop_summ, "title")
# clinvar example: summarise up to 10 hits from a search
res <- entrez_search(db = "clinvar", term = "BRCA1", retmax = 10)
cv <- entrez_summary(db = "clinvar", id = res$ids)
cv
extract_from_esummary(cv, "title", simplify = FALSE)
extract_from_esummary(cv, "trait_set")[1:2]
extract_from_esummary(cv, "gene_sort")
## End(Not run)