bold_seq {bold}    R Documentation

Search BOLD for sequences.

Description

Get sequences for a taxonomic name, ID, BIN, container, institution, researcher, geographic place, or gene.

Usage

bold_seq(
  taxon = NULL,
  ids = NULL,
  bin = NULL,
  container = NULL,
  institutions = NULL,
  researchers = NULL,
  geo = NULL,
  marker = NULL,
  response = FALSE,
  ...
)

Arguments

taxon

(character) Returns all records containing matching taxa. Taxa include the ranks of phylum, class, order, family, subfamily, genus, and species.

ids

(character) Returns all records containing matching IDs. IDs include Sample IDs, Process IDs, Museum IDs and Field IDs.

bin

(character) Returns all records contained in matching BINs. A BIN is defined by a Barcode Index Number URI.

container

(character) Returns all records contained in matching projects or datasets. Containers include project codes and dataset codes.

institutions

(character) Returns all records stored in matching institutions. Institutions are the Specimen Storing Site.

researchers

(character) Returns all records containing matching researcher names. Researchers include collectors and specimen identifiers.

geo

(character) Returns all records collected in matching geographic sites. Geographic sites include countries and provinces/states.

marker

(character) Returns all records containing matching marker codes.

response

(logical) If TRUE, return the response object from the curl call instead of the parsed sequences. This is useful for debugging and for getting detailed information on the API call. Default: FALSE.

...

Further arguments passed on to crul::verb-GET, mainly for curl debugging.

Value

A list in which each element has length 4, with slots for id, name, gene, and sequence.
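As a minimal sketch of working with that structure, the records can be bound into a data frame with one row per sequence. The `res` object below is mock data shaped like the documented return value, not real BOLD output, and the column `length` is added only for illustration.

```r
# Mock data shaped like the documented return value (not real BOLD records).
res <- list(
  list(id = "ID-001", name = "Genus speciesA", gene = "COI-5P", sequence = "ACGTACGT"),
  list(id = "ID-002", name = "Genus speciesB", gene = "ITS", sequence = "TTGACA")
)

# Turn the four slots of every record into one data frame row.
seq_df <- do.call(rbind, lapply(res, function(x) {
  data.frame(id = x$id, name = x$name, gene = x$gene, sequence = x$sequence,
             stringsAsFactors = FALSE)
}))

# Sequence length in characters, computed from the sequence slot.
seq_df$length <- nchar(seq_df$sequence)
```

With a real result, replace the mock `res` with the output of a `bold_seq()` call.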

Large requests

Some requests can lead to errors. These often involve requesting data for a rank that is quite high in the tree, such as an order (for example, Coleoptera). If your request is taking a long time, it's likely that something will go wrong on the BOLD server side, or that we will not be able to parse the result here in R, because R can only process strings of a certain length. Users of bold have reported errors in which the response from BOLD was so large that it could not be parsed.

A good strategy when you want data for a high rank is to make many separate requests for lower ranks within your target rank. You can do this manually, or use the function taxize::downstream to get all the names of a lower rank within a target rank. There's an example in the README (https://docs.ropensci.org/bold/#large-data).
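The split-request strategy might be sketched as follows. `fetch_in_pieces` is a hypothetical helper, not part of bold. It assumes each lower-rank name can be queried on its own with `bold_seq(taxon = ...)`, and it records failed names rather than stopping at the first error.

```r
# Hypothetical helper (not part of bold): query each lower-rank name
# separately and combine whatever comes back.
# `fetch` defaults to bold_seq() but can be swapped out, e.g. for testing.
fetch_in_pieces <- function(taxa, fetch = function(nm) bold::bold_seq(taxon = nm)) {
  pieces <- lapply(taxa, function(nm) {
    tryCatch(fetch(nm), error = function(e) {
      message("Request failed for ", nm, ": ", conditionMessage(e))
      NULL  # mark this name as failed and keep going
    })
  })
  ok <- !vapply(pieces, is.null, logical(1))
  list(
    # Flatten one level so the result is a single list of records.
    records = unlist(pieces[ok], recursive = FALSE, use.names = FALSE),
    failed = taxa[!ok]
  )
}
```

The vector passed as `taxa` could be typed by hand or built from the names returned by taxize::downstream. Names listed in `failed` can be retried later or split into still lower ranks.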

If a request times out

This is likely because your request was for a large number of sequences and the BOLD service timed out. You should still get some output: the sequences that were retrieved before the timeout occurred. As above, see the README (https://docs.ropensci.org/bold/#large-data) for an example of dealing with large data problems with this function.

Marker

Notes from BOLD on the marker param: "All markers for a specimen matching the search string will be returned. ie. A record with COI-5P and ITS will return sequence data for both markers even if only COI-5P was specified."

You will likely end up with data for markers that you did not request, so be sure to filter those out as needed.
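One way to do that filtering is sketched below. It uses mock data shaped like the documented return value and assumes that the gene slot holds the marker code exactly as it was requested. Check real output before relying on exact matching.

```r
# Mock data shaped like the documented return value (not real BOLD records).
res <- list(
  list(id = "ID-001", name = "Genus speciesA", gene = "COI-5P", sequence = "ACGT"),
  list(id = "ID-001", name = "Genus speciesA", gene = "ITS", sequence = "GGCC"),
  list(id = "ID-002", name = "Genus speciesB", gene = "COI-5P", sequence = "TTAA")
)

# Assumption: the gene slot matches the requested marker code exactly.
wanted <- "COI-5P"
keep <- vapply(res, function(x) identical(x$gene, wanted), logical(1))
coi_only <- res[keep]
```

To keep several markers at once, a membership test such as `x$gene %in% wanted` can be used with a character vector in `wanted`.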

References

http://v4.boldsystems.org/index.php/resources/api?type=webservices

Examples

## Not run: 
res <- bold_seq(taxon='Coelioxys')
bold_seq(taxon='Aglae')
bold_seq(taxon=c('Coelioxys','Osmia'))
bold_seq(ids='ACRJP618-11')
bold_seq(ids=c('ACRJP618-11','ACRJP619-11'))
bold_seq(bin='BOLD:AAA5125')
bold_seq(container='ACRJP')
bold_seq(researchers='Thibaud Decaens')
bold_seq(geo='Ireland')
bold_seq(geo=c('Ireland','Denmark'))

# Return the http response object for detailed Curl call response details
res <- bold_seq(taxon='Coelioxys', response=TRUE)
res$url
res$status_code
res$response_headers

## curl debugging
### You can do many things, including get verbose output on the curl 
### call, and set a timeout
bold_seq(taxon='Coelioxys', verbose = TRUE)[1:2]
# bold_seq(taxon='Coelioxys', timeout_ms = 10)

## End(Not run)

[Package bold version 1.2.0 Index]