cut_probes {disprose} | R Documentation |
Cut probes
Description
Generate probes from nucleotide reference sequences
Usage
cut_probes(
ref.seq.from.file = FALSE,
ref.seq.id,
ref.seq.db,
fasta.file = NULL,
delete.fasta = FALSE,
start = 1,
stop = NULL,
start.correction = FALSE,
size = 24:32,
delete.incomplete = FALSE,
delete.identical = FALSE,
give.probes.id = FALSE,
mc.cores = 1,
verbose = TRUE
)
Arguments
ref.seq.from.file |
logical; read reference sequences from file ( |
ref.seq.id |
identification number of reference nucleotide sequences. Only used when |
ref.seq.db |
character; NCBI database for search. See entrez_dbs for possible values.
Only used when |
fasta.file |
character; FASTA file name and path, only used when |
delete.fasta |
logical; delete FASTA file. |
start , stop |
integer; positions of the first and last nucleotides of the reference sequence segment that should be cut into probes. The whole sequence is used by default. |
start.correction |
logical; count probes' start and stop nucleotides relatively to the specified segment ( |
size |
integer; vector of probe sizes |
delete.incomplete |
logical; remove probes that contain undeciphered nucleotides |
delete.identical |
logical; remove identical (duplicated) probes |
give.probes.id |
logical; add probes' identification numbers |
mc.cores |
integer; number of processors for parallel computation (not supported on Windows) |
verbose |
logical; show messages |
Details
This function takes nucleotide sequences and cuts them into segments (probes) of the given sizes. Sequences can be read from a given FASTA file or downloaded from NCBI databases. In the latter case, a FASTA file is created. If desired, the FASTA file can be deleted afterwards.
The whole sequence does not have to be cut into probes; you may define the required segment with the start and stop parameters.
Note that in this case probes' start and stop nucleotides are counted either relative to the specified segment (start.correction = FALSE) or relative to the whole sequence (start.correction = TRUE).
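The segment selection and position counting described above can be sketched in base R. This is only an illustration and not the package's implementation: the toy sequence and the variable names are made up, and it assumes that every possible window of the given size inside the segment becomes a probe.

```r
# Toy reference sequence (hypothetical, for illustration only)
ref <- "acgtacgtacgtacgtacgt"   # 20 nucleotides
seg.start <- 5                   # plays the role of `start`
seg.stop <- 14                   # plays the role of `stop`
probe.size <- 4                  # one value of `size`

# Keep only the requested segment (nucleotides 5..14)
segment <- substr(ref, seg.start, seg.stop)

# Assumption: every window of length probe.size inside the segment is a probe
first <- seq_len(nchar(segment) - probe.size + 1)
probes <- data.frame(
  start = first,
  stop = first + probe.size - 1,
  sequence = substring(segment, first, first + probe.size - 1),
  stringsAsFactors = FALSE
)

# Positions above are relative to the segment (as with start.correction = FALSE).
# Shifting by the segment start gives positions relative to the whole sequence
# (as with start.correction = TRUE).
shifted <- transform(probes,
                     start = start + seg.start - 1,
                     stop = stop + seg.start - 1)
```

With shifted positions, substring(ref, shifted$start, shifted$stop) returns the same probe sequences as the segment-relative table.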
Undeciphered nucleotides are those indicated by the "rywsmkhbvdn" symbols.
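A minimal base R sketch of this kind of filtering is given below. It is not the package's code: the probe strings are invented, and matching case-insensitively is an assumption.

```r
# Hypothetical probe sequences; two of them contain ambiguity symbols
probes <- c("acgtacgt", "acgtnnnn", "ttrgacca", "ggccttaa")

# TRUE for probes that contain any of the symbols r, y, w, s, m, k, h, b, v, d, n
incomplete <- grepl("[rywsmkhbvdn]", probes, ignore.case = TRUE)

# Keep only fully deciphered probes (what delete.incomplete = TRUE is meant to achieve)
complete.probes <- probes[!incomplete]
```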
Probes' identification numbers are created by adding numeric indexes to the reference sequence's identification number.
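The idea can be illustrated in base R as follows. The "_" separator and the probe strings are assumptions made for this sketch; see make_ids for the format actually produced.

```r
# Reference sequence identifier (the GI used in the Examples section)
ref.id <- "737435910"

# Hypothetical probes cut from that reference
probe.seqs <- c("acgt", "cgta", "gtac")

# Append a running index to the reference id; the "_" separator is an assumption
probe.ids <- paste(ref.id, seq_along(probe.seqs), sep = "_")
```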
See cut_string, delete_duplicates_DF and make_ids for details.
Value
A data frame with probe id (optional), sequence id, probe size, start and stop nucleotides, and sequence.
Author(s)
Elena N. Filatova
Examples
# use a dedicated subdirectory so that cleanup does not remove the session temp directory
path <- file.path(tempdir(), "cut_probes_example")
dir.create(path)
# download and save as FASTA "Chlamydia pneumoniae B21 contig00001,
# whole genome shotgun sequence" (GI = 737435910)
if (!requireNamespace("rentrez", quietly = TRUE)) {
  stop("Package \"rentrez\" needed for this function to work. Please install it.", call. = FALSE)
}
reference.string <- rentrez::entrez_fetch(db = "nucleotide", id = 737435910,
                                          rettype = "fasta")
fasta.path <- file.path(path, "fasta")
write(x = reference.string, file = fasta.path)
# cut the segment 1000..1500 into probes of 400 and 500 nucleotides
probes <- cut_probes(ref.seq.from.file = TRUE, fasta.file = fasta.path,
                     delete.fasta = TRUE, start = 1000, stop = 1500,
                     start.correction = FALSE, size = c(400, 500),
                     delete.incomplete = FALSE,
                     delete.identical = FALSE, give.probes.id = TRUE, mc.cores = 1)
unlink(path, recursive = TRUE)