get_seq_info {disprose} | R Documentation |
Get NCBI sequence record
Description
Retrieves information about sequences from NCBI records for a given organism name or taxon identifier.
Usage
get_seq_info(
org.name,
db,
n.start = 1,
n.stop = NULL,
step = 500,
return.dataframe = FALSE,
check.result = FALSE,
term = NULL,
verbose = TRUE
)
get_seq_info_fix(
info.list,
web.history = NULL,
org.name = NULL,
db,
n.start = 1,
n.stop = NULL,
step = 500,
term = NULL,
verbose = TRUE
)
info_listtodata(info.list, unlist = TRUE, verbose = TRUE)
Arguments
org.name |
character; scientific name or taxon identifier (written as "txid0000") of the organism/taxon. |
db |
character; NCBI database for search. See entrez_dbs() for possible values. |
n.start |
integer; download starting value. Default is 1. |
n.stop |
integer; download finishing value. Default is NULL, which provides retrieval of all available GIs. |
step |
integer; download increment value. Maximum is 500. |
return.dataframe |
logical; whether to return the information as a structured data frame (the alternative is a list of lists). |
check.result |
logical; check if download was done correctly. |
term |
character; search query. |
verbose |
logical; show messages. |
info.list |
list of previously downloaded records. |
web.history |
previously saved web_history object for use in calls to the NCBI. New web.history is created if none is provided. |
unlist |
logical; unlist result before transforming (only recommended if info.list was obtained from get_seq_info()). |
Details
This function sends a query to an NCBI database and returns the sequence records that match it. By default the
query is the organism, so the function returns data for all sequences associated with the requested organism.
For example, if org.name = "Homo sapiens", the function downloads data for all records that match the query
"Homo sapiens[Organism]". For any other query, use the term parameter.
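The default query described above can be sketched in a few lines of R. This is only an illustration: build_query() is a hypothetical helper, not a function of the package, and the assumption that a supplied term simply replaces the organism query is ours. The third call uses an arbitrary example query string.

```r
# Hypothetical helper (not part of disprose): illustrates how the default
# query could be formed from org.name when term is not supplied.
build_query <- function(org.name, term = NULL) {
  if (!is.null(term)) {
    return(term)  # assumption: a user-supplied term is used as-is
  }
  paste0(org.name, "[Organism]")
}

q1 <- build_query("Homo sapiens")   # organism given by scientific name
q2 <- build_query("txid9606")       # organism given by taxon identifier
q3 <- build_query("Homo sapiens",
                  term = "Homo sapiens[Organism] AND 100:500[SLEN]")
```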
The function downloads records piecemeal, several records per block. The block size is set by the step parameter.
This is useful if the download is interrupted for any reason: only the missing blocks need to be reloaded later,
not the entire data set. By default all available records are downloaded, but you may also choose the start and
finish points with the n.start and n.stop parameters. Numbering starts at 1, not 0.
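To make the block layout concrete, the sketch below lists which records fall into which block for given n.start, n.stop and step. block_ranges() is a hypothetical helper, not part of the package, and it assumes that blocks are contiguous and that the last block is simply shortened to end at n.stop; the package's internal bookkeeping may differ.

```r
# Hypothetical helper (not part of disprose): lists the first and last
# record index of every block for given n.start, n.stop and step.
block_ranges <- function(n.start = 1, n.stop, step = 500) {
  starts <- seq(from = n.start, to = n.stop, by = step)
  data.frame(block = seq_along(starts),
             first = starts,
             last  = pmin(starts + step - 1, n.stop))  # last block may be shorter
}

ranges <- block_ranges(n.start = 1, n.stop = 12, step = 5)
```

With these values the sketch yields three blocks covering records 1 to 5, 6 to 10 and 11 to 12, so an interrupted third block would require reloading only two records.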
At the end, the resulting list of blocks (a list of lists if step > 1) is unlisted into one data frame that contains
the record GI, UID, caption, source database, organism, strain, etc. You may prevent this by setting return.dataframe = FALSE.
Also, regardless of the return.dataframe setting, the list of blocks is returned if the download was compromised in any way.
Optionally, you can turn the resulting list into a data frame later with the info_listtodata() function.
Note that in this case, if info.list came from get_seq_info(), the result must be unlisted first (use unlist = TRUE).
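Why the unlisting step matters can be seen on toy data. The sketch below uses invented records with two fields each; real NCBI records carry many more (and nested) fields, and this is not the package's implementation of info_listtodata(), only an illustration of removing the block level before binding records into rows.

```r
# Toy data (invented for illustration): two blocks, each a list of records.
blocks <- list(
  list(list(uid = "1", caption = "A1"), list(uid = "2", caption = "A2")),
  list(list(uid = "3", caption = "A3"))
)

# Step 1: drop the block level so that every element is one record.
records <- unlist(blocks, recursive = FALSE)

# Step 2: bind the records row by row into a data frame.
info.df <- do.call(rbind,
                   lapply(records, as.data.frame, stringsAsFactors = FALSE))
```

Without step 1 each list element would be a whole block rather than a single record, so the rows of the resulting table would not correspond to records.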
If the download was corrupted, you may use the get_seq_info_fix() function to reload the missing blocks. Pass the
corrupted list of blocks in the info.list parameter. You may also check and reload data while get_seq_info()
is running by specifying check.result = TRUE.
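The idea of locating damaged blocks before reloading them can be sketched as follows. How the package actually marks a failed block is not stated here, so the sketch assumes, purely for illustration, that a failed block is stored either as NULL or as a try-error object; the toy list is invented.

```r
# Toy list of blocks (invented): block 2 failed and was left as NULL,
# block 3 holds a captured error.
blocks <- list(
  list(list(uid = "1")),
  NULL,
  try(stop("connection lost"), silent = TRUE)
)

# Assumption: a block counts as missing if it is NULL or a try-error.
is_missing <- vapply(blocks,
                     function(b) is.null(b) || inherits(b, "try-error"),
                     logical(1))

missing_blocks <- which(is_missing)  # indices of blocks to download again
```

In practice there is no need to do this by hand: according to the Usage above, get_seq_info_fix() takes the list in info.list together with db and handles the check and the reload.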
While running, the functions temporarily turn scientific notation off and then back on.
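A common way to switch scientific notation off temporarily in R is to raise the scipen option and restore the previous value on exit. The sketch below shows that pattern; with_fixed_notation() is a hypothetical helper, and both the use of scipen and the motivation (keeping large record positions such as 100000 from being written as 1e+05) are our assumptions rather than statements about the package's code.

```r
# Hypothetical helper (not part of disprose): evaluates an expression with
# scientific notation suppressed, then restores the user's previous setting.
with_fixed_notation <- function(expr) {
  old <- options(scipen = 999)       # strongly prefer fixed notation
  on.exit(options(old), add = TRUE)  # restore the old value on exit
  force(expr)                        # expr is evaluated only now (lazy evaluation)
}

scipen.before <- getOption("scipen")
fixed.text <- with_fixed_notation(format(1e5))
scipen.after <- getOption("scipen")
```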
Value
get_seq_info() returns a data frame that contains most of the sequence information from NCBI records.
If return.dataframe = FALSE or there are missing data, a list of lists is returned; the list contains the full
information from NCBI records.
get_seq_info_fix() returns a list of lists.
info_listtodata() returns a data frame.
Functions
- get_seq_info: Retrieves NCBI sequence records for a given organism name or taxon identifier.
- get_seq_info_fix: Checks the downloads and tries to retrieve the compromised data.
- info_listtodata: Transforms a downloaded list into a data frame.
Author(s)
Elena N. Filatova
Examples
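# Download the first 10 nucleotide records for taxon txid9606 (Homo sapiens)
# in blocks of 5, checking each block and returning a data frame.
info.dataframe <- get_seq_info(org.name = "txid9606", db = "nucleotide", n.start = 1,
                               n.stop = 10, step = 5, return.dataframe = TRUE,
                               check.result = TRUE)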