auto_seq_download {MACER}    R Documentation
Automatic Sequence Download
Description
Takes a user-supplied list of genera, then searches for and downloads molecular sequence data from BOLD and GenBank.
Usage
auto_seq_download(
BOLD_database = TRUE,
NCBI_database = TRUE,
search_str = NULL,
input_file = NULL,
output_file = NULL,
seq_min = 100,
seq_max = 2500
)
Arguments
BOLD_database
    TRUE to include BOLD in the search, FALSE to exclude it; default TRUE
NCBI_database
    TRUE to include NCBI in the search, FALSE to exclude it; default TRUE
search_str
    If NULL, the default search string is used; any other value is used as the search string for the GenBank search; default NULL. The default string is: (genus[ORGN]) NOT (shotgun[ALL] OR genome[ALL] OR assembled[ALL] OR microsatellite[ALL])
input_file
    If NULL, the user is prompted to select the input file through point-and-click prompts; otherwise the supplied string is used as the input file location; default NULL
output_file
    If NULL, the user is prompted to select the output file location through point-and-click prompts; otherwise the supplied string is used as the output location; default NULL
seq_min
    Minimum sequence length that is not flagged; default 100
seq_max
    Maximum sequence length that is not flagged; default 2500
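The length limits can be illustrated with a small sketch. The helper `flag_by_length()` below is hypothetical and not part of MACER; it assumes that a sequence is flagged when its length falls outside the inclusive range from seq_min to seq_max, which is one reading of the argument descriptions rather than a confirmed detail of the implementation.

```r
# Hypothetical helper (not a MACER function): returns TRUE for sequences
# whose length is below seq_min or above seq_max.
# Assumption: both limits are inclusive, so a length equal to either
# limit is not flagged.
flag_by_length <- function(sequences, seq_min = 100, seq_max = 2500) {
  seq_lengths <- nchar(sequences)
  seq_lengths < seq_min | seq_lengths > seq_max
}

# Three made-up sequences of length 50, 400 and 3000.
seqs <- c(
  too_short = strrep("A", 50),
  in_range  = strrep("ACGT", 100),
  too_long  = strrep("A", 3000)
)

flag_by_length(seqs)
```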
Details
User input: A text file containing a list of genera in a single column, one genus per line, with a new line at the end of the list.
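One way to produce an input file in this layout is sketched below using base R. The genus names and the file name `genera.txt` are placeholders chosen for illustration; `writeLines()` terminates every element, including the last, with a line separator, which gives the trailing new line described above.

```r
# Placeholder genera; replace with the genera of interest.
genera <- c("Salmo", "Oncorhynchus", "Salvelinus")

# Write one genus per line. The path is an arbitrary example location.
input_path <- file.path(tempdir(), "genera.txt")
writeLines(genera, input_path)

# Read the file back to confirm the single-column layout.
readLines(input_path)
```

The resulting path could then be supplied as input_file instead of answering the file prompt.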
Value
Outputs: One main folder containing three other folders.
Main folder - Seq_auto_dl_TTTTTT_MMM_DD
Three subfolders:
1. BOLD - Contains a file for every genus downloaded with the raw data from the BOLD system.
2. NCBI - Contains a file for every genus downloaded with the raw data from GenBank.
3. Total_tables - Contains files for the running of the function, which include:
   A_Summary.txt - This file contains information about the downloads.
   A_Total_Table.tsv - A file with a single table containing the accumulated data for all genera searched.
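After a run, the accumulated table can be located from the folder layout described above. The sketch below uses a hypothetical helper, `find_total_table()`, which is not part of MACER. It assumes the main folder name always starts with `Seq_auto_dl_` and that the most recently modified matching folder is the run of interest.

```r
# Hypothetical helper (not a MACER function): returns the expected path of
# A_Total_Table.tsv inside the most recently modified Seq_auto_dl_* folder
# found directly under output_dir.
find_total_table <- function(output_dir) {
  runs <- list.files(output_dir, pattern = "^Seq_auto_dl_", full.names = TRUE)
  runs <- runs[dir.exists(runs)]
  if (length(runs) == 0) {
    stop("No Seq_auto_dl_* folder found in: ", output_dir)
  }
  latest <- runs[which.max(file.mtime(runs))]
  file.path(latest, "Total_tables", "A_Total_Table.tsv")
}

## Not run:
# "path/to/output" is a placeholder for the chosen output location.
# table_path <- find_total_table("path/to/output")
# total_table <- read.delim(table_path, stringsAsFactors = FALSE)
# str(total_table)
## End(Not run)
```

The column names of the table are not listed here, so inspecting the result with `str()` before further processing is a reasonable first step.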
Note
When using a custom search string for NCBI, only a single genus can be searched at a time.
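A custom search string for a single genus might be assembled as sketched below. The genus name and the added `COI[GENE]` term are illustrative assumptions only; the rest of the string follows the default given under search_str. The call itself is left commented out because it needs the MACER package, network access, and interactive file prompts.

```r
# Placeholder genus; a custom search string covers one genus at a time.
genus <- "Salmo"

# Illustrative custom string: the default exclusions plus an assumed
# gene-name restriction (COI[GENE]) that is not part of the default.
custom_search <- sprintf(
  "(%s[ORGN]) AND (COI[GENE]) NOT (shotgun[ALL] OR genome[ALL] OR assembled[ALL] OR microsatellite[ALL])",
  genus
)

custom_search

## Not run:
# auto_seq_download(BOLD_database = FALSE,
#                   NCBI_database = TRUE,
#                   search_str = custom_search)
## End(Not run)
```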
Author(s)
Robert G. Young
References
<https://github.com/rgyoung6/MACER>
Young, R. G., Gill, R., Gillis, D., Hanner, R. H. (Submitted June 2021). Molecular Acquisition, Cleaning, and Evaluation in R (MACER) - A tool to assemble molecular marker datasets from BOLD and GenBank. Biodiversity Data Journal.
See Also
create_fastas(), align_to_ref(), barcode_clean()
Examples
## Not run:
auto_seq_download()
auto_seq_download(BOLD_database = TRUE, NCBI_database = FALSE)
auto_seq_download(BOLD_database = FALSE, NCBI_database = TRUE)
## End(Not run)