db_create {restez} | R Documentation
Create new NCBI database
Description
Create a new local SQL database from downloaded files. Currently only the GenBank (nucleotide/nuccore) database is supported.
Usage
db_create(
  db_type = "nucleotide",
  min_length = 0,
  max_length = NULL,
  acc_filter = NULL,
  invert = FALSE,
  alt_restez_path = NULL,
  scan = FALSE
)
Arguments
db_type | Character; database type.
min_length | Minimum sequence length, default 0.
max_length | Maximum sequence length, default NULL.
acc_filter | Character vector; accessions to include or exclude from the database as specified by invert.
invert | Logical vector of length 1; if TRUE, accessions in acc_filter are excluded from the database; if FALSE, only accessions in acc_filter are included. Default FALSE.
alt_restez_path | Alternative restez path if you would like to use the downloads from a different restez path.
scan | Logical vector of length 1; should the sequence file be scanned for accessions in acc_filter? Default FALSE.
Details
All .seq.gz files are added to the database by default. A user can specify
minimum/maximum sequence lengths or accession numbers to limit the sequences
to be added to the database; smaller databases are faster to search. The
final selection of sequences is the result of applying all filters
(acc_filter, min_length, max_length) in combination.
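As a rough illustration of how the filters combine, the base R sketch below applies the same kind of selection to a small, made-up table of records. The `records` data and the `select_records()` helper are hypothetical and are not part of restez; treating the length bounds as inclusive is also an assumption.

```r
# Hypothetical records standing in for parsed GenBank entries
records <- data.frame(
  accession = c("AF000122", "AF000123", "AF000124", "AF000125"),
  length = c(50L, 250L, 1200L, 80L),
  stringsAsFactors = FALSE
)

# Illustrative helper (not a restez function): a record is kept only if it
# passes every filter that has been supplied
select_records <- function(records, min_length = 0, max_length = NULL,
                           acc_filter = NULL, invert = FALSE) {
  keep <- records$length >= min_length  # assumption: bounds are inclusive
  if (!is.null(max_length)) {
    keep <- keep & records$length <= max_length
  }
  if (!is.null(acc_filter)) {
    in_filter <- records$accession %in% acc_filter
    keep <- keep & (if (invert) !in_filter else in_filter)
  }
  records[keep, , drop = FALSE]
}

# Mirrors a call such as:
# db_create(min_length = 100, acc_filter = c("AF000122", "AF000123", "AF000124"))
selected <- select_records(
  records,
  min_length = 100,
  acc_filter = c("AF000122", "AF000123", "AF000124")
)
print(selected$accession)
```

Here AF000122 is dropped by the length filter and AF000125 by the accession filter, so only records passing both remain.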
The scan option can decrease the time needed to build a database if only a
small number of sequences should be written to the database compared to the
number of sequences downloaded from GenBank; i.e., if many of the files
downloaded from GenBank do not contain any sequences that should be written
to the database. When set to TRUE, if a file does not contain any of the
accessions in acc_filter, further processing of that file will be skipped
and none of the sequences it contains will be added to the database.
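The skipping behaviour described above can be pictured with a small base R sketch. The file names and their accession contents are invented for illustration, and the code only mimics the decision of which files to process; it does not reflect how restez scans files internally.

```r
# Hypothetical downloaded files and the accessions each one contains
files <- list(
  "file_a.seq.gz" = c("AF000122", "AF000123"),
  "file_b.seq.gz" = c("ZZ000001", "ZZ000002"),
  "file_c.seq.gz" = c("AF000124", "ZZ000003")
)
acc_filter <- c("AF000122", "AF000123", "AF000124")

# A file is processed only if it holds at least one requested accession
has_match <- vapply(
  files,
  function(accs) any(accs %in% acc_filter),
  logical(1)
)
files_to_process <- names(files)[has_match]
print(files_to_process)

# With restez itself, the corresponding call would be:
# db_create(acc_filter = acc_filter, scan = TRUE)
```

In this toy case file_b.seq.gz contains none of the requested accessions, so it would be skipped entirely.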
Alternatively, a user can use alt_restez_path to build the database from
files downloaded to a different restez path. For example, you may wish to have a
database of all environmental sequences but then an additional smaller one of
just the sequences with lengths below 100 bp. Instead of having to download
all environmental sequences twice, you can generate multiple restez databases
using the same downloaded files from a single restez path.
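A minimal sketch of that second step might look like the following. The wrapper `create_small_db()` and its argument names are hypothetical; it assumes the files have already been downloaded under `alt_path`, and the exact form of path that alt_restez_path expects should be checked against the package documentation.

```r
# Hypothetical wrapper (not part of restez): build a smaller database at
# `new_path` from files that were already downloaded under `alt_path`
create_small_db <- function(new_path, alt_path, max_len = 100) {
  if (!dir.exists(new_path)) {
    dir.create(new_path, recursive = TRUE)
  }
  # Point restez at the location for the new, smaller database
  restez::restez_path_set(filepath = new_path)
  # Reuse the existing downloads instead of calling db_download() again
  restez::db_create(max_length = max_len, alt_restez_path = alt_path)
}

# Usage (paths are placeholders):
# create_small_db("path/for/small/database", "path/with/existing/downloads")
```

Whether max_length = 100 matches "below 100 bp" exactly depends on whether the bound is inclusive, which is not stated here.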
This function will not overwrite a pre-existing database. Old databases must
be deleted before a new one can be created. Use db_delete() with
everything=FALSE to delete an SQL database.
Connections/disconnections to the database are made automatically.
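Because an existing database is never overwritten, rebuilding with different filters takes two calls. The helper below is a hypothetical convenience wrapper, not a restez function; it assumes the restez path has already been set and simply chains the two documented calls.

```r
# Hypothetical wrapper (not part of restez): delete the existing SQL
# database, then create a new one with whatever arguments are supplied
rebuild_db <- function(...) {
  # everything = FALSE deletes the SQL database, as described above
  restez::db_delete(everything = FALSE)
  restez::db_create(...)
}

# Usage, e.g. to rebuild with a minimum sequence length:
# rebuild_db(min_length = 100)
```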
See Also
Other database: count_db_ids(), db_delete(), db_download(), demo_db_create(), is_in_db(), list_db_ids()
Examples
## Not run:
# Example of general usage
library(restez)
restez_path_set(filepath = 'path/for/downloads/and/database')
db_download()
db_create()
# Example of using `acc_filter`
#
# Download files to temporary directory
temp_dir <- file.path(tempdir(), "restez")
dir.create(temp_dir)
restez_path_set(filepath = temp_dir)
# Choose GenBank domain 20 ('unannotated'), the smallest
db_download(preselection = 20)
# Only include three accessions in database
db_create(
  acc_filter = c("AF000122", "AF000123", "AF000124")
)
list_db_ids()
db_delete()
# recursive = TRUE is needed for unlink() to remove a directory
unlink(temp_dir, recursive = TRUE)
## End(Not run)