get_GIs {disprose} | R Documentation
Get GenInfo Identifier numbers
Description
Retrieves NCBI sequence identifiers (GIs) for a given organism name or taxon identifier.
Usage
get_GIs(
  org.name,
  db,
  n.start = 1,
  n.stop = NULL,
  step = 99999,
  return.vector = TRUE,
  check.result = FALSE,
  term = NULL,
  temp.dir = NULL,
  delete.temp = FALSE,
  verbose = TRUE
)

get_GIs_fix(
  gis.list,
  org.name,
  db,
  n.start = 1,
  n.stop = NULL,
  step = 99999,
  term = NULL,
  temp.dir = NULL,
  delete.temp = FALSE,
  verbose = TRUE
)
Arguments
org.name | character; scientific name or taxon identifier (written as "txid0000") of the organism/taxon.
db | character; NCBI database to search. See entrez_dbs() for possible values.
n.start | integer; starting position of the download. Default is 1.
n.stop | integer; final position of the download. Default is NULL, which retrieves all available GIs.
step | integer; download increment, which defines the block size.
return.vector | logical; whether to return GI numbers as a single character vector (otherwise a list of vectors is returned).
check.result | logical; whether to check that the download was done correctly.
term | character; search query.
temp.dir | character; name and path of the directory for downloaded temporary files (Windows only).
delete.temp | logical; whether to delete the downloaded files (Windows only; the directory itself is not deleted).
verbose | logical; whether to show messages.
gis.list | list of previously downloaded GI vectors.
Details
This function sends a query to an NCBI database and returns the sequence identifiers that match it. By default the
query is the organism, so the function returns GI numbers for all sequences associated with the requested organism.
For example, if org.name = "Homo sapiens", the function downloads GI numbers for all sequences that match the query
"Homo sapiens[Organism]". For any other query, use the term parameter.
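As a rough illustration of the default query described above, the sketch below builds the search string from org.name. The helper default_query is hypothetical and not part of disprose. How the package assembles the string internally, and whether a taxon identifier is wrapped the same way, are assumptions. Only the "Homo sapiens[Organism]" form comes from the text.

```r
# Hypothetical helper (not a disprose function): builds the default
# organism query string described in the Details section.
default_query <- function(org.name) {
  paste0(org.name, "[Organism]")
}

query.name <- default_query("Homo sapiens")
# Assumption: a taxon identifier is handled the same way as a name.
query.txid <- default_query("txid9606")

print(query.name)
print(query.txid)
```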
The function downloads GI numbers piecemeal, with several pieces grouped into one block. The block size is defined by the
step parameter. This is useful if the download is interrupted for any reason: later it is possible to reload only
the missing blocks without reloading the entire data set. By default, all available GI numbers are downloaded,
but you may also choose the start and end positions by specifying the n.start and n.stop parameters.
Numbering starts at 1, not 0.
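The arithmetic behind the blocks can be sketched in base R. This is only an illustration under the assumption that each block covers step consecutive positions between n.start and n.stop. The helper block_ranges is hypothetical and the exact layout used inside disprose may differ.

```r
# Hypothetical helper (not a disprose function): lists the positions
# covered by each block, assuming consecutive blocks of size `step`.
block_ranges <- function(n.start, n.stop, step) {
  starts <- seq(from = n.start, to = n.stop, by = step)
  # The last block is cut off at n.stop.
  stops <- pmin(starts + step - 1, n.stop)
  data.frame(block = seq_along(starts), start = starts, stop = stops)
}

# 250000 records with the default step give three blocks.
blocks <- block_ranges(n.start = 1, n.stop = 250000, step = 99999)
print(blocks)
```

If the third block were lost, only positions 199999 to 250000 would have to be requested again under this assumption.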
At the end, the resulting list of blocks (a list of character vectors) is unlisted into one character vector. You may prevent this
by setting return.vector = FALSE. Also, regardless of the return.vector setting, the list of blocks is returned
if the download was compromised in any way.
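The difference between the two return shapes can be shown with made-up data. The identifiers below are invented for illustration, and the use of unlist() mirrors the wording of the text rather than a confirmed implementation detail.

```r
# Made-up GI numbers arranged as three downloaded blocks.
gi.blocks <- list(c("1001", "1002"), c("1003"), c("1004", "1005"))

# Shape returned with return.vector = FALSE: a list of character vectors.
print(length(gi.blocks))

# Shape returned with return.vector = TRUE: one flat character vector.
gi.vector <- unlist(gi.blocks, use.names = FALSE)
print(gi.vector)
```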
If the download was corrupted, you may use the get_GIs_fix() function to reload the missing blocks. The corrupted list of blocks
should be passed in the gis.list parameter. You may also check and reload data while get_GIs() is running
by specifying check.result = TRUE.
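A possible repair workflow is sketched below, using only the arguments listed in the Usage section. Keeping the shared settings in one list (the name query.args is an illustrative choice) is meant to ensure that the repair call describes the same download as the original one. That this is required is an assumption. The network calls are guarded so the sketch does nothing in a non-interactive session or when disprose is not installed.

```r
# Settings shared by the original download and the repair call
# (values taken from the Examples section).
query.args <- list(org.name = "txid9606", db = "nucleotide",
                   n.start = 1, n.stop = 3, step = 1,
                   temp.dir = tempdir(), delete.temp = TRUE)

# Both calls need network access to NCBI, so run them only on request.
if (interactive() && requireNamespace("disprose", quietly = TRUE)) {
  # Keep the list of blocks so it can be handed to get_GIs_fix().
  gi.list <- do.call(disprose::get_GIs,
                     c(query.args, list(return.vector = FALSE)))
  # Ask get_GIs_fix() to reload whatever is missing from gi.list.
  gi.list <- do.call(disprose::get_GIs_fix,
                     c(list(gis.list = gi.list), query.args))
}
```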
The function checks the user's OS type. On Windows, temporary files are created while downloading,
so the temp.dir and delete.temp parameters should be set. This helps to solve the
"routines:SSL23_GET_SERVER_HELLO:tlsv1 alert protocol version" problem by using curl instead of RCurl.
However, it slows the function down. If the temp.dir directory does not exist, it will be
created and will not be removed (only temporary files will be deleted if delete.temp = TRUE).
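The directory handling described above can be imitated in base R. This is a minimal sketch, not the package's code: the directory name gi_downloads and the file contents are made up, and the OS check via .Platform is an assumption about how such a test might be done.

```r
# One way to detect Windows in base R (assumed, not taken from disprose).
is.windows <- .Platform$OS.type == "windows"

# Hypothetical directory name; any writable path would do.
temp.dir <- file.path(tempdir(), "gi_downloads")
if (!dir.exists(temp.dir)) {
  dir.create(temp.dir, recursive = TRUE)
}

# Stand-in for a downloaded temporary file with made-up identifiers.
temp.file <- tempfile(tmpdir = temp.dir, fileext = ".txt")
writeLines(c("1001", "1002"), temp.file)
gis <- readLines(temp.file)

# With delete.temp = TRUE only the file is removed, the directory stays.
delete.temp <- TRUE
if (delete.temp) {
  file.remove(temp.file)
}
file.gone <- !file.exists(temp.file)
dir.kept <- dir.exists(temp.dir)
```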
While running, the functions turn scientific notation off and then back on.
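A likely reason for this, stated here as an assumption, is that large positions such as 100000 would otherwise be formatted as "1e+05" when turned into text. The sketch below shows the effect with the scipen option. Whether disprose uses this exact option is not confirmed by the text.

```r
# Remember the current setting and start from R's default penalty.
old.opts <- options(scipen = 0)
sci <- format(100000)     # scientific notation is shorter, so it wins

# A large penalty effectively switches scientific notation off.
options(scipen = 999)
fixed <- format(100000)

# Restore the user's original setting.
options(old.opts)

print(c(sci, fixed))
```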
Value
get_GIs() returns a character vector of GI numbers. If return.vector = FALSE or there are missing data,
a list of character vectors is returned.
get_GIs_fix() returns a list of character vectors.
Functions
- get_GIs: Retrieves NCBI sequence identifiers (GIs) for a given organism name or taxon identifier.
- get_GIs_fix: Checks the downloads and tries to retrieve the compromised data.
Author(s)
Elena N. Filatova
Examples
gi.list <- get_GIs(org.name = "txid9606", db = "nucleotide",
                   n.start = 1, n.stop = 3, step = 1,
                   return.vector = FALSE, check.result = TRUE,
                   temp.dir = tempdir(), delete.temp = TRUE)