blastClassify16S {microclass} | R Documentation |
Classifying using BLAST
Description
Classification of 16S sequences based on BLAST.
Usage
blastClassify16S(sequence, bdb)
Arguments
sequence |
Character vector of 16S sequences to classify. |
bdb |
Name of the BLAST database, see Details. |
Details
A vector of 16S sequences (DNA) is classified by first running BLAST (blastn) against a database of 16S DNA sequences, and then classifying each sequence according to the nearest-neighbour principle. The nearest neighbour of a query sequence is the hit with the largest bitscore.
The BLAST+ software (https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastDocs&DOC_TYPE=Download) must be installed on the system. Type system("blastn -help") in the Console window, and a sensible help text should appear.
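Before classifying, it may be convenient to check from within R that the blastn executable can be found. The following is a minimal sketch using base R only; the helper name blast_available is hypothetical and is not part of microclass.

```r
# Hypothetical helper (not part of microclass): returns TRUE if an
# executable named `exe` is found on the system search path.
blast_available <- function(exe = "blastn") {
  # Sys.which() returns "" for executables it cannot locate
  unname(nzchar(Sys.which(exe)))
}

if (!blast_available()) {
  message("blastn was not found on the PATH; install BLAST+ before classifying.")
}
```

This only confirms that an executable with that name exists; running system("blastn -help") as described above is still the more direct test that it works.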
The database must contain 16S sequences where each header starts with a token specifying the taxon. More specifically, the tokens must look like:
<taxon>_1
<taxon>_2
...etc
where <taxon> is some proper taxon name. Use blastDbase16S to make such databases.
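To illustrate the token format, the sketch below builds tokens of the form <taxon>_1, <taxon>_2, ... from a vector of taxon labels and then recovers the taxon from each token. The taxon values are made up for the example, and this is not necessarily how blastDbase16S constructs the headers internally.

```r
# Illustrative taxon labels, one per database sequence (made-up values)
taxon <- c("Bacillus", "Escherichia", "Bacillus")

# Number the sequences within each taxon: <taxon>_1, <taxon>_2, ...
counter <- ave(seq_along(taxon), taxon, FUN = seq_along)
tokens <- paste0(taxon, "_", counter)

# Recover the taxon from a token by stripping the trailing "_<number>"
recovered <- sub("_[0-9]+$", "", tokens)
```

Stripping only the final underscore and digits keeps taxon names that themselves contain underscores intact.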
The identity of each alignment is also computed. This should be close to 1.0 for a classification to be trusted. Identity values below 0.95 could indicate uncertain classifications, but this will vary between taxa.
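As a rough illustration of the 0.95 guideline, the sketch below flags classifications whose identity falls below a chosen threshold. The data frame is made up and only mimics the two documented columns; the column Trusted and the variable min.identity are hypothetical names introduced here.

```r
# Made-up result in the documented format (columns Taxon and Identity)
res <- data.frame(
  Taxon = c("Bacillus", "Escherichia", "Bacillus"),
  Identity = c(0.99, 0.91, 0.97),
  stringsAsFactors = FALSE
)

# Threshold taken from the guideline above; adjust per taxon as needed
min.identity <- 0.95

# Mark classifications with identity at or above the threshold;
# missing identities are treated as not trusted
res$Trusted <- !is.na(res$Identity) & res$Identity >= min.identity
```

Since the appropriate cutoff varies between taxa, a single global threshold like this is only a starting point.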
Value
A data.frame with two columns: Taxon is the predicted taxon for each sequence, and Identity is the corresponding identity value. If no BLAST hit is seen, the sequence is "unclassified".
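The nearest-neighbour rule and the "unclassified" fallback can be sketched as follows. The hit table, its column names (Query, Hit, Bitscore) and all values are assumptions made for illustration; they do not reflect the actual internals of blastClassify16S.

```r
# Made-up BLAST hit table; column names are assumptions for this sketch
hits <- data.frame(
  Query = c(1, 1, 2),
  Hit = c("Bacillus_1", "Escherichia_1", "Escherichia_2"),
  Bitscore = c(810, 420, 655),
  stringsAsFactors = FALSE
)
n.queries <- 3  # query 3 has no hit at all

# Every query starts out as "unclassified"
taxon <- rep("unclassified", n.queries)
for (q in unique(hits$Query)) {
  h <- hits[hits$Query == q, ]
  # Nearest neighbour: the hit with the largest bitscore
  best <- h$Hit[which.max(h$Bitscore)]
  # The taxon is the header token without its trailing "_<number>"
  taxon[q] <- sub("_[0-9]+$", "", best)
}
```

Here query 1 takes the taxon of its highest-scoring hit, and query 3 keeps the "unclassified" label because it has no hits.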
Author(s)
Lars Snipen.
See Also
Examples
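library(microclass)
library(stringr)  # word(), str_sub()
library(dplyr)    # %>%, bind_cols()
data("small.16S")
## Not run:
# Build a BLAST database; the second word of each header is used as the taxon
dbase <- blastDbase16S("test", small.16S$Sequence, word(small.16S$Header, 2, 2))
# Use positions 100 to 550 of each sequence as the reads to classify
reads <- str_sub(small.16S$Sequence, 100, 550)
# Classify the reads and bind the result to the original table
blastClassify16S(reads, dbase) %>%
  bind_cols(small.16S) -> tbl
## End(Not run)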