cut_probes {disprose}

R Documentation

Cut probes

Description

Generate probes from nucleotide reference sequences

Usage

cut_probes(
  ref.seq.from.file = FALSE,
  ref.seq.id,
  ref.seq.db,
  fasta.file = NULL,
  delete.fasta = FALSE,
  start = 1,
  stop = NULL,
  start.correction = FALSE,
  size = 24:32,
  delete.incomplete = FALSE,
  delete.identical = FALSE,
  give.probes.id = FALSE,
  mc.cores = 1,
  verbose = TRUE
)

Arguments

`ref.seq.from.file`	logical; read reference sequences from a file (`TRUE`) or download them from an NCBI database (`FALSE`).
`ref.seq.id`	identification number of reference nucleotide sequences. Only used when `ref.seq.from.file = FALSE`. GenBank accession numbers, GenInfo identifiers (GI) or Entrez unique identifiers (UID) may be used.
`ref.seq.db`	character; NCBI database for search. See entrez_dbs for possible values. Only used when `ref.seq.from.file = FALSE`.
`fasta.file`	character; FASTA file name and path, only used when `ref.seq.from.file = TRUE`.
`delete.fasta`	logical; delete FASTA file.
`start`, `stop`	integer; positions of the first and last nucleotides of the reference sequence segment that should be cut into probes. The whole sequence is used by default.
`start.correction`	logical; count probes' start and stop nucleotides relative to the specified segment (`FALSE`) or to the whole sequence (`TRUE`). Only used if `start > 1`.
`size`	integer; vector of probe sizes
`delete.incomplete`	logical; remove probes that contain undeciphered nucleotides
`delete.identical`	logical; remove identical (duplicated) probes
`give.probes.id`	logical; add probes' identification numbers
`mc.cores`	integer; number of processors for parallel computation (not supported on Windows)
`verbose`	logical; show messages

Details

This function takes nucleotide sequences and cuts them into segments (probes) of the given sizes. Sequences may be read from a given FASTA file or downloaded from NCBI databases. In the latter case, a FASTA file is created. If desired, the FASTA file can be deleted afterwards.
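The cutting step described above can be sketched in base R. This is only an illustration, not the package's implementation: `cut_windows` is a hypothetical helper, and the sliding step of one nucleotide is an assumption.

```r
# Hypothetical helper (not part of disprose): cut one sequence into all
# overlapping windows of each requested size. The step of 1 is an assumption.
cut_windows <- function(sequence, size) {
  n <- nchar(sequence)
  # sizes longer than the sequence cannot produce any window
  pieces <- lapply(size[size <= n], function(s) {
    starts <- seq_len(n - s + 1)
    data.frame(size = s,
               start = starts,
               stop = starts + s - 1,
               sequence = substring(sequence, starts, starts + s - 1),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, pieces)
}

toy.probes <- cut_windows("acgtacgt", size = 6:7)
toy.probes
```

For an 8-nucleotide toy sequence, sizes 6 and 7 give three and two windows respectively.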

The whole sequence does not have to be cut into probes; you may define the segment of interest with the `start` and `stop` parameters. Note that in this case probes' start and stop nucleotides are counted either relative to the specified segment (`start.correction = FALSE`) or relative to the whole sequence (`start.correction = TRUE`).
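The difference between the two coordinate systems is a constant offset. The following is a minimal sketch of that arithmetic, assuming 1-based positions; the variable names are illustrative and are not taken from the package.

```r
# Illustrative values: the segment starts at nucleotide 1000 of the whole sequence
segment.start <- 1000

# A probe covering nucleotides 1..400, counted relative to the segment
probe.start <- 1
probe.stop <- 400

# Shift by (segment.start - 1) to express the same probe
# relative to the whole sequence
offset <- segment.start - 1
corrected <- c(start = probe.start + offset, stop = probe.stop + offset)
corrected
```

The probe length is unchanged by the shift; only the reported coordinates differ.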

Undeciphered nucleotides are those indicated by any of the symbols "rywsmkhbvdn".
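A probe containing any of these symbols can be detected with a regular expression. The sketch below uses base R on a toy vector; matching upper-case letters as well is an assumption, and this is not necessarily how the package performs the check.

```r
# Toy probes: the second and third contain an undeciphered nucleotide
toy <- c("acgtacgt", "acgnacgt", "ACGRACGT")

# TRUE where a probe contains at least one of the symbols "rywsmkhbvdn";
# ignore.case = TRUE (an assumption) also catches upper-case sequences
incomplete <- grepl("[rywsmkhbvdn]", toy, ignore.case = TRUE)

# Keep only fully deciphered probes
complete.probes <- toy[!incomplete]
complete.probes
```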

Probes' identification numbers are created by adding numeric indexes to the reference sequence's identification number.
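As a rough illustration of this naming scheme, indexes can be appended to a sequence identifier with `paste0`. The identifier `"seq1"` and the underscore separator are assumptions made for the example; the exact format produced by the package may differ.

```r
# Hypothetical reference sequence identifier and number of probes
ref.id <- "seq1"
n.probes <- 3

# Append a running index to the identifier; the "_" separator is an assumption
probe.ids <- paste0(ref.id, "_", seq_len(n.probes))
probe.ids
```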

See cut_string, delete_duplicates_DF and make_ids for details.

Value

A data frame with probe ID (optional), sequence ID, probe size, start and stop nucleotides, and probe sequence.

Author(s)

Elena N. Filatova

Examples

# use a dedicated subdirectory so that the session temporary directory
# itself is not removed at the end of the example
path <- file.path(tempdir(), "disprose_example")
dir.create(path)
# download and save as FASTA "Chlamydia pneumoniae B21 contig00001,
# whole genome shotgun sequence" (GI = 737435910)
if (!requireNamespace("rentrez", quietly = TRUE)) {
  stop("Package \"rentrez\" needed for this function to work. Please install it.",
       call. = FALSE)
}
reference.string <- rentrez::entrez_fetch(db = "nucleotide", id = 737435910,
                                          rettype = "fasta")
fasta.path <- file.path(path, "fasta")
write(x = reference.string, file = fasta.path)
# cut nucleotides 1000..1500 into probes of 400 and 500 nucleotides
probes <- cut_probes(ref.seq.from.file = TRUE, fasta.file = fasta.path,
                     delete.fasta = TRUE, start = 1000, stop = 1500,
                     start.correction = FALSE, size = c(400, 500),
                     delete.incomplete = FALSE,
                     delete.identical = FALSE, give.probes.id = TRUE, mc.cores = 1)
unlink(path, recursive = TRUE)

[Package disprose version 0.1.6 Index]