get_GA_files {disprose} | R Documentation |
Read GISAID sequence file
Description
Get metadata and nucleotide sequence from GISAID files
Usage
get_GA_files(
dir.path,
return = "both",
seq.return = "data.frame",
fasta.file = NULL,
verbose = TRUE
)
Arguments
dir.path |
character; directory name and path |
return |
character; type of returned object; possible values are "info" (metadata only), "seq" (sequences only) and "both" |
seq.return |
character; type of the returned sequence object; possible values are "vector", "data.frame" and "fasta" |
fasta.file |
character; FASTA file name and path, only used if |
verbose |
logical; show messages |
Details
This function works with "Input for the Augur pipeline" archives downloaded from GISAID (each containing "metadata.tsv" and "sequences.fasta" files). Archives must be unzipped before use. All files extracted from the GISAID archives must be in one directory.
If return = "seq", serial numbers are used as sequence identification numbers.
Metadata is transformed into a data frame of the same format as the one returned by the get_seq_info function. Sequences are transformed into the same format as the one returned by the get_seq_for_DB function.
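The directory requirement described above can be checked before calling get_GA_files. The following is a minimal base-R sketch, not part of disprose: the helper name check_gisaid_dir is hypothetical, and it assumes that extracted files have names ending in "metadata.tsv" and "sequences.fasta" (possibly with an archive-specific prefix).

```r
# Hypothetical helper (not part of disprose): report which GISAID files
# are present in a directory before passing it to get_GA_files().
check_gisaid_dir <- function(dir.path) {
  files <- list.files(dir.path)
  # Assumption: extracted files end in "metadata.tsv" / "sequences.fasta",
  # possibly preceded by an archive-specific prefix.
  meta <- grep("metadata\\.tsv$", files, value = TRUE)
  seqs <- grep("sequences\\.fasta$", files, value = TRUE)
  list(
    metadata = meta,
    sequences = seqs,
    # TRUE when at least one metadata file exists and the counts match
    ok = length(meta) > 0 && length(meta) == length(seqs)
  )
}

# Demonstration on a temporary directory with empty placeholder files
demo.dir <- file.path(tempdir(), "gisaidfiles_demo")
dir.create(demo.dir, showWarnings = FALSE)
file.create(file.path(demo.dir, c("a.metadata.tsv", "a.sequences.fasta")))
chk <- check_gisaid_dir(demo.dir)
chk$ok
```

If chk$ok is FALSE, inspecting chk$metadata and chk$sequences shows which kind of file is missing. The check only looks at file names; it does not validate file contents.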
Value
List of length two, where the first element is the metadata and the second is the nucleotide sequences.
If return = "info" or return = "seq", only the first or the second element is returned, respectively.
Author(s)
Elena N. Filatova
Examples
## Not run:
# First download some sequence archives from GISAID (https://www.gisaid.org/),
# unzip them and put the extracted files into the "gisaidfiles" directory
res <- get_GA_files(dir.path = "gisaidfiles", return = "info")
res <- get_GA_files(dir.path = "gisaidfiles", return = "seq", seq.return = "data.frame")
res <- get_GA_files(dir.path = "gisaidfiles", return = "both", seq.return = "fasta")
## End(Not run)