listGenomes {biomartr} | R Documentation |
List All Available Genomes either by kingdom, group, or subgroup
Description
This function retrieves the names of all genomes available on the NCBI ftp:// server and stores the results in a file named 'overview.txt' inside the directory '_ncbi_downloads', which is created inside the workspace.
Usage
listGenomes(
db = "refseq",
type = "all",
subset = NULL,
details = FALSE,
update = FALSE,
skip_bacteria = FALSE
)
Arguments
db |
a character string specifying the database for which genome availability shall be checked. Available options are:
|
type |
a character string specifying a potential filter of available genomes. Available options are:
|
subset |
a character string or character vector specifying a subset of
|
details |
a boolean value specifying whether only the scientific names of stored genomes shall be returned (details = FALSE) or all information such as
|
update |
logical, default FALSE. If TRUE, update cached list,
if FALSE use existing cache (if it exists). For cache location see
|
skip_bacteria |
Due to its enormous dataset size (> 700MB as of July 2023),
the bacterial summary file is no longer loaded by default. Users who
wish to gain insights into the bacterial kingdom need to actively specify |
Details
Internally, this function loads the overview.txt file from NCBI
and creates a directory '_ncbi_downloads' in the tempdir()
folder to store the overview.txt file for future processing. If the
overview.txt file already exists in the '_ncbi_downloads' folder and is
accessible within the workspace, it is not downloaded again.
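The caching behaviour described above can be illustrated with a minimal base R sketch. It is not the package's actual implementation: the helper cached_overview() and the stand-in fake_fetch() are hypothetical names introduced here, and the file content is a placeholder so the sketch runs offline. Only the folder name '_ncbi_downloads', the file name 'overview.txt', and the meaning of the update argument are taken from this page.

```r
# Hypothetical helper (not part of biomartr): return the path of the cached
# overview file, calling `fetch` only if the file is missing or update = TRUE.
cached_overview <- function(fetch, update = FALSE, cache_root = tempdir()) {
  cache_dir <- file.path(cache_root, "_ncbi_downloads")
  if (!dir.exists(cache_dir)) {
    dir.create(cache_dir, recursive = TRUE)
  }
  overview <- file.path(cache_dir, "overview.txt")
  if (update || !file.exists(overview)) {
    fetch(overview)  # assumed to write the overview file to this path
  }
  overview
}

# Stand-in for the real download; it only counts calls and writes a placeholder.
n_fetches <- 0
fake_fetch <- function(dest) {
  n_fetches <<- n_fetches + 1
  writeLines("placeholder overview content", dest)
}

# Use a fresh, unique cache root so the demonstration starts without a cache.
root <- tempfile("cache_demo_")

path <- cached_overview(fake_fetch, cache_root = root)                 # no cache yet: fetches
path <- cached_overview(fake_fetch, cache_root = root)                 # cache present: no fetch
path <- cached_overview(fake_fetch, update = TRUE, cache_root = root)  # forced refresh: fetches
```

After these three calls the stand-in has been invoked twice: once to populate the cache and once because update = TRUE requested a refresh.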
Note
Please note that the ftp:// connection relies on the NCBI or ENSEMBL server and may not be reliably accessible via a proxy.
Author(s)
Hajk-Georg Drost
Examples
## Not run:
# list the scientific names of all genomes available in refseq
listGenomes(db = "refseq")
# list available refseq genomes by kingdom
listGenomes(db = "refseq", type = "kingdom")
# list available refseq genomes by group
listGenomes(db = "refseq", type = "group")
# list available refseq genomes by subgroup
listGenomes(db = "refseq", type = "subgroup")
# Ensembl
listGenomes(db = "ensembl", type = "kingdom", subset = "EnsemblVertebrates")
## End(Not run)