R: Get nucleotide sequences from NCBI

get_seq_for_DB {disprose}

R Documentation

Get nucleotide sequences from NCBI

Description

Retrieves nucleotide sequences from NCBI for given identification numbers.

Usage

get_seq_for_DB(
  ids,
  db,
  check.result = FALSE,
  return = "data.frame",
  fasta.file = NULL,
  exclude.from.download = FALSE,
  exclude.var,
  exclude.pattern,
  exclude.fixed = TRUE,
  verbose = TRUE
)

get_seq_for_DB_fix(res.data, db, verbose = TRUE)

Arguments

`ids`	vector of NCBI sequences' identification numbers: GenBank accession numbers, GenInfo identifiers (GI) or Entrez unique identifiers (UID)
`db`	character; NCBI database for search. See entrez_dbs() for possible values
`check.result`	logical; check if download was done correctly
`return`	character; type of the returned object; possible values are "vector", "data.frame" and "fasta"
`fasta.file`	character; FASTA file name and path, only used if `return = "fasta"`
`exclude.from.download`	logical; ignore some sequences while downloading
`exclude.var`	vector used to define which sequences should be ignored; only used if `exclude.from.download = TRUE`
`exclude.pattern`	value that is matched against `exclude.var` to mark unwanted sequences; only used if `exclude.from.download = TRUE`
`exclude.fixed`	logical; match `exclude.pattern` as is; only used if `exclude.from.download = TRUE`
`verbose`	logical; show messages
`res.data`	data.frame; data frame of nucleotide ids and previously downloaded sequences

Details

Master records (for example, in a WGS project) do not contain any nucleotide sequence. They can be excluded from download with the `exclude.from.download`, `exclude.var`, `exclude.pattern` and `exclude.fixed` parameters. However, leaving them in has no effect on the download, so such ids do not have to be excluded when loading.
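The matching idea behind the exclusion parameters can be pictured with a small base R sketch. It does not call the package or NCBI, and it is not the package's actual implementation. The ids and titles are made up for illustration, and `grepl()` with `fixed = TRUE` is assumed here as a stand-in for the "as is" matching that `exclude.fixed = TRUE` describes.

```r
# Hypothetical ids and record titles (illustrative only, not real NCBI data).
ids <- c("ID_1", "ID_2", "ID_3")
titles <- c("Organism A strain 1, complete genome",
            "Organism A strain 2, whole genome shotgun sequencing project",
            "Organism A strain 3, complete genome")

# Literal (fixed) match of the pattern against the variable,
# analogous to exclude.fixed = TRUE.
pattern <- "whole genome shotgun sequencing project"
is_master <- grepl(pattern, titles, fixed = TRUE)

# Keep only the ids whose title does not match the pattern.
ids_to_download <- ids[!is_master]
print(ids_to_download)
```

In an actual call, the same vectors would presumably be passed as `exclude.var = titles` and `exclude.pattern = pattern`, together with `exclude.from.download = TRUE`.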

If the FASTA file already exists, sequences are appended to it.

Value

If `return = "vector"`, the function returns a vector of nucleotide sequences. If `return = "data.frame"`, it returns a data frame with nucleotide ids and nucleotide sequences. If `return = "fasta"`, it writes a FASTA file and returns no data.
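The three output shapes, and the append behaviour noted in Details, can be mimicked with mock data in base R. This is only a sketch: the ids, sequences, data frame column names, FASTA header format and the helper `write_fasta()` are all hypothetical and are not taken from the package.

```r
# Hypothetical ids and sequences standing in for a download result.
ids <- c("ID_1", "ID_2")
seqs <- c("ATGCGT", "TTGACA")

# Shape comparable to return = "vector": sequences only.
as_vector <- seqs

# Shape comparable to return = "data.frame": ids alongside sequences.
# The column names are assumptions.
as_df <- data.frame(id = ids, sequence = seqs, stringsAsFactors = FALSE)

# Shape comparable to return = "fasta": records written to a file.
# append = TRUE keeps whatever the file already contains.
write_fasta <- function(ids, seqs, file) {
  cat(paste0(">", ids, "\n", seqs, "\n"), file = file, sep = "", append = TRUE)
}

fasta.file <- tempfile(fileext = ".fasta")
write_fasta(ids[1], seqs[1], fasta.file)
write_fasta(ids[2], seqs[2], fasta.file)  # second call appends to the first
fasta_lines <- readLines(fasta.file)
print(fasta_lines)
file.remove(fasta.file)
```

Because the file is opened in append mode, writing to a file name that is reused across runs accumulates records; a fresh `tempfile()` or an explicit `file.remove()` avoids duplicates.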

Functions

get_seq_for_DB: Retrieves NCBI nucleotide sequences for given identification numbers.
get_seq_for_DB_fix: Checks the downloads and tries to retrieve the compromised data.
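
The two functions might be combined into a download-then-repair workflow, sketched here under assumptions. This page does not state what `get_seq_for_DB_fix` returns, so the sketch assumes it returns a repaired data frame. The calls contact NCBI and need the disprose package, so they are guarded and run only in an interactive session.

```r
# Ids taken from the Examples section of this page.
ids <- c(2134240466, 2134240465, 2134240464)

# Guard: the calls below need internet access, so run them only
# interactively and only if disprose is installed.
if (interactive() && requireNamespace("disprose", quietly = TRUE)) {
  # Step 1: download into a data frame of ids and sequences.
  res <- disprose::get_seq_for_DB(ids = ids, db = "nucleotide",
                                  return = "data.frame")
  # Step 2: check the download and retry compromised entries.
  # Assumption: the function returns the repaired data frame.
  res.fixed <- disprose::get_seq_for_DB_fix(res.data = res, db = "nucleotide")
}
```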

Author(s)

Elena N. Filatova

Examples

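# NCBI sequence identification numbers to download (requires internet access)
ids <- c(2134240466, 2134240465, 2134240464)
# Write the result to a temporary FASTA file
fasta.file <- tempfile()
get_seq_for_DB(ids = ids, db = "nucleotide", check.result = TRUE,
               return = "fasta", fasta.file = fasta.file,
               exclude.from.download = FALSE)
# Remove the temporary file
file.remove(fasta.file)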

[Package disprose version 0.1.6 Index]