findGenes {microseq} | R Documentation |
Finding coding genes
Description
Finding coding genes in genomic DNA using the Prodigal software.
Usage
findGenes(
  genome,
  prodigal.exe = "prodigal",
  faa.file = "",
  ffn.file = "",
  proc = "single",
  trans.tab = 11,
  mask.N = FALSE,
  bypass.SD = FALSE
)
Arguments
genome |
A table with columns Header and Sequence, containing the genome sequence(s). |
prodigal.exe |
Command to run the external software prodigal on the system (text). |
faa.file |
If provided, prodigal will output all proteins to this fasta-file (text). |
ffn.file |
If provided, prodigal will output all DNA sequences to this fasta-file (text). |
proc |
Either "single" or "meta" (see below). |
trans.tab |
Either 11 or 4 (see below). |
mask.N |
Turn on masking of N's (logical) |
bypass.SD |
Bypass Shine-Dalgarno filter (logical) |
Details
The external software Prodigal is used to scan through a prokaryotic genome to detect the protein-coding
genes. The text in prodigal.exe must contain the exact command to invoke Prodigal on the system.
In addition to the standard output from this function, FASTA files with protein and/or DNA sequences may
be produced directly by providing filenames in faa.file and ffn.file.
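As a minimal sketch of the file outputs described above, the following writes both FASTA files to a temporary directory. The file names are illustrative assumptions, and the call is guarded so that it only runs when the microseq package and the prodigal command are both available.

```r
# Illustrative output paths (assumed names; any writable path should work)
faa.path <- file.path(tempdir(), "proteins.faa")
ffn.path <- file.path(tempdir(), "genes.ffn")

# Only run when the package and the external software are present
if (requireNamespace("microseq", quietly = TRUE) && nzchar(Sys.which("prodigal"))) {
  library(microseq)
  genome.file <- system.file("extdata", "small.fna", package = "microseq")
  genome <- readFasta(genome.file)
  gff.tbl <- findGenes(genome, faa.file = faa.path,
                       ffn.file = ffn.path, trans.tab = 4)
  # Check which of the requested FASTA files were written
  print(file.exists(c(faa.path, ffn.path)))
}
```

Leaving faa.file or ffn.file at the default "" skips the corresponding file.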
The input proc allows you to specify whether the input data should be treated as a single genome
(default) or as a metagenome. In the latter case the genome table contains (un-binned) contigs.
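A hedged sketch of the metagenome setting follows. It assumes that "meta" is the value of proc that selects metagenome mode, and it uses the small.fna file shipped with the package purely as a stand-in for a file of contigs.

```r
# Assumed value: "meta" is taken to be the metagenome setting of proc
proc.mode <- "meta"

# Only run when the package and the external software are present
if (requireNamespace("microseq", quietly = TRUE) && nzchar(Sys.which("prodigal"))) {
  library(microseq)
  # small.fna stands in for a contig file here, purely for illustration
  contigs <- readFasta(system.file("extdata", "small.fna", package = "microseq"))
  gff.tbl <- findGenes(contigs, proc = proc.mode)
  # Number of coding genes detected across all contigs
  print(nrow(gff.tbl))
}
```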
The translation table is 11 by default (the bacterial, archaeal and plant plastid code), but table 4 should be used for Mycoplasma etc.
Setting mask.N = TRUE prevents genes from containing runs of N. Setting bypass.SD = TRUE turns off the
search for a Shine-Dalgarno motif.
Value
A GFF-table (see readGFF for details) with one row for each detected coding gene.
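To illustrate working with such a table without running Prodigal, the sketch below builds a small mock GFF-like table in base R and keeps the high-scoring genes. The column names are assumed to follow the layout described under readGFF, and every value in the mock table is invented for illustration.

```r
# Mock GFF-like table; column names are assumed to follow the readGFF layout,
# and all values are made up for this illustration
gff.tbl <- data.frame(
  Seqid      = c("contig_1", "contig_1", "contig_2"),
  Source     = "mock",
  Type       = "CDS",
  Start      = c(10, 500, 42),
  End        = c(310, 1400, 642),
  Score      = c(12.5, 87.3, 55.0),
  Strand     = c("+", "-", "+"),
  Phase      = 0,
  Attributes = c("ID=1_1", "ID=1_2", "ID=2_1"),
  stringsAsFactors = FALSE
)

# Keep genes with a score of at least 50 and add their length in bases
strong <- gff.tbl[gff.tbl$Score >= 50, ]
strong$Length <- strong$End - strong$Start + 1
print(strong[, c("Seqid", "Start", "End", "Strand", "Score", "Length")])
```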
Note
The prodigal software must be installed on the system for this function to work, i.e. the command ‘system("prodigal -h")’ must be recognized as a valid command if you run it in the Console window.
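One way to test this from R before calling findGenes is sketched below. The helper prodigalAvailable is hypothetical and not part of microseq; it only checks whether the command can be found on the search path.

```r
# Hypothetical helper (not part of microseq): TRUE if the command is on the PATH
prodigalAvailable <- function(cmd = "prodigal") {
  unname(nzchar(Sys.which(cmd)))
}

if (prodigalAvailable()) {
  message("prodigal found at: ", Sys.which("prodigal"))
} else {
  message("prodigal not found; install it or supply a working command in prodigal.exe")
}
```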
Author(s)
Lars Snipen and Kristian Hovde Liland.
See Also
Examples
## Not run:
# This example requires the external prodigal software
# Using a genome file in this package.
genome.file <- file.path(path.package("microseq"),"extdata","small.fna")
# Searching for coding sequences, this is Mycoplasma (trans.tab = 4)
genome <- readFasta(genome.file)
gff.tbl <- findGenes(genome, trans.tab = 4)
# Retrieving the sequences
cds.tbl <- gff2fasta(gff.tbl, genome)
# You may use the pipe operator
library(ggplot2)
readFasta(genome.file) %>%
  findGenes(trans.tab = 4) %>%
  filter(Score >= 50) %>%
  ggplot() +
  geom_histogram(aes(x = Score), bins = 25)
## End(Not run)