get_GA_files {disprose}    R Documentation

Read GISAID sequence file

Description

Gets metadata and nucleotide sequences from GISAID files.

Usage

get_GA_files(
  dir.path,
  return = "both",
  seq.return = "data.frame",
  fasta.file = NULL,
  verbose = TRUE
)

Arguments

dir.path

character; path to the directory that contains the extracted GISAID files

return

character; type of returned object; possible values are: "info" (sequence metadata), "seq" (nucleotide sequences), "both" (both of them).

seq.return

character; type of the returned sequence object; possible values are "vector", "data.frame" and "fasta"

fasta.file

character; FASTA file name and path, only used if seq.return = "fasta"

verbose

logical; show messages

Details

This function works with "Input for the Augur pipeline" archives downloaded from GISAID (each archive contains the "metadata.tsv" and "sequences.fasta" files). Archives must be unzipped before use, and all files extracted from the GISAID archives must be placed in a single directory.

If return = "seq", serial numbers are used as sequence identification numbers.

Metadata is transformed into a data frame of the same format as that returned by the get_seq_info function. Sequences are transformed into an object of the same format as that returned by the get_seq_for_DB function.
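
As an illustration of the input layout described above, the following sketch builds a toy directory containing "metadata.tsv" and "sequences.fasta" and reads both files with base R only. It is not the implementation of get_GA_files and does not reproduce the output format of get_seq_info or get_seq_for_DB. The column names (strain, date), the toy records, and the helper read_fasta_simple are hypothetical and exist only for this example.

```r
# Build a toy directory that mimics an unzipped GISAID Augur archive.
# The file names come from the Details section; the contents are made up.
toy.dir <- file.path(tempdir(), "gisaidfiles_toy")
dir.create(toy.dir, showWarnings = FALSE)

writeLines(
  c("strain\tdate",
    "toy/seq1/2020\t2020-01-01",
    "toy/seq2/2020\t2020-02-01"),
  file.path(toy.dir, "metadata.tsv")
)
writeLines(
  c(">toy/seq1/2020", "ACGTAC", "GTAC",
    ">toy/seq2/2020", "TTGACA"),
  file.path(toy.dir, "sequences.fasta")
)

# Check that both expected files are present in the directory.
expected <- c("metadata.tsv", "sequences.fasta")
stopifnot(all(file.exists(file.path(toy.dir, expected))))

# Read the metadata as a data frame (tab-separated, header in the first row).
meta <- read.delim(file.path(toy.dir, "metadata.tsv"),
                   stringsAsFactors = FALSE)

# Hypothetical helper: a minimal FASTA reader using base R only.
# Assumes every header line is followed by at least one sequence line.
read_fasta_simple <- function(path) {
  lines <- readLines(path)
  lines <- lines[nzchar(lines)]
  is.header <- startsWith(lines, ">")
  # Each header starts a new record; sequence lines inherit its index.
  record <- cumsum(is.header)
  seqs <- tapply(lines[!is.header], record[!is.header],
                 paste, collapse = "")
  data.frame(
    id = seq_along(seqs),                      # serial numbers as identifiers
    header = sub("^>", "", lines[is.header]),  # original FASTA header
    sequence = as.vector(seqs),
    stringsAsFactors = FALSE
  )
}

seqs <- read_fasta_simple(file.path(toy.dir, "sequences.fasta"))
```

The id column here uses serial numbers only to mirror the behaviour described for return = "seq"; the real function may organise its output differently.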

Value

A list of length two, where the first element is the metadata and the second is the nucleotide sequences. If return = "info" or return = "seq", only the metadata or only the sequences are returned, respectively.

Author(s)

Elena N. Filatova

Examples

## Not run: 
# First download some sequence archives from GISAID (https://www.gisaid.org/),
# unzip them and put the extracted files into the "gisaidfiles" directory

res <- get_GA_files(dir.path = "gisaidfiles", return = "info")
res <- get_GA_files(dir.path = "gisaidfiles", return = "seq", seq.return = "data.frame")
res <- get_GA_files(dir.path = "gisaidfiles", return = "both", seq.return = "fasta")

## End(Not run)


[Package disprose version 0.1.6 Index]