gb_df_generate {restez} | R Documentation |
Generate GenBank records data.frame
Description
For a list of records, construct a data.frame for insertion into an SQL database.
Usage
gb_df_generate(
  records,
  min_length = 0,
  max_length = NULL,
  acc_filter = NULL,
  invert = FALSE
)
Arguments
records | Character vector of GenBank records in text format. |
min_length | Minimum sequence length, default 0. |
max_length | Maximum sequence length, default NULL. |
acc_filter | Character vector; accessions to include or exclude from the database as specified by invert. |
invert | Logical vector of length 1; if TRUE, accessions in acc_filter are excluded from the database rather than included. Default FALSE. |
Details
The resulting data.frame has five columns: accession, organism, raw_definition, raw_sequence, raw_record. The prefix 'raw_' indicates that the data has been converted to the raw format (see ?charToRaw) in order to save on RAM. The raw_record column contains the entire GenBank record in text format.
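The raw conversion mentioned above can be illustrated with base R alone. This is a minimal sketch, not code from the package: the sequence and accession values are made up, and holding the raw vector in a list column is an assumption about the layout, not something this page confirms.

```r
# Base R only: convert text to a raw vector and back again.
sequence <- "ATGCATGC"                # made-up example sequence
raw_sequence <- charToRaw(sequence)   # one byte per character
restored <- rawToChar(raw_sequence)   # back to a character string

# Assumption for illustration: one raw vector per row, held in a list column.
df <- data.frame(accession = "demo1", stringsAsFactors = FALSE)
df$raw_sequence <- list(raw_sequence)

# Recover the text for the first row.
rawToChar(df$raw_sequence[[1]])
```

The round trip is lossless for plain ASCII text such as nucleotide sequences, so the original string can always be recovered with rawToChar().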
Use acc_filter and the minimum and maximum sequence lengths to minimize the size of the database. All sequences have to be at least as long as min_length and less than or equal in length to max_length, unless max_length is NULL, in which case there is no maximum length. The final selection of sequences is the result of applying all filters (acc_filter, min_length, max_length) in combination.
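The way the filters combine can be sketched as follows. keep_records() is a hypothetical helper written for this illustration, not a restez function and not the package's internal implementation. It assumes that invert = TRUE means accessions in acc_filter are dropped and invert = FALSE means only those accessions are kept.

```r
# Hypothetical helper, not part of restez: returns TRUE for records to keep.
keep_records <- function(accessions, seq_lengths, min_length = 0,
                         max_length = NULL, acc_filter = NULL,
                         invert = FALSE) {
  # Lower bound is always applied.
  keep <- seq_lengths >= min_length
  # Upper bound is applied only when max_length is not NULL.
  if (!is.null(max_length)) {
    keep <- keep & seq_lengths <= max_length
  }
  # Accession filter (assumed semantics): include by default, exclude if invert.
  if (!is.null(acc_filter)) {
    in_filter <- accessions %in% acc_filter
    keep <- keep & (if (invert) !in_filter else in_filter)
  }
  keep
}

# Made-up accessions and sequence lengths.
accessions <- c("A1", "B2", "C3", "D4")
seq_lengths <- c(50, 500, 5000, 500)

# Length filters only: keeps B2 and D4.
by_length <- keep_records(accessions, seq_lengths,
                          min_length = 100, max_length = 1000)

# Lower bound plus an inverted accession filter: keeps C3 and D4.
by_both <- keep_records(accessions, seq_lengths, min_length = 100,
                        acc_filter = "B2", invert = TRUE)
```

A record is kept only if it passes every active filter, which is why the conditions are joined with a logical AND.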
Value
data.frame, or NULL if no records pass filters
See Also
Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release_notes(), latest_genbank_release(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), status_class(), stat(), testdatadir_get()