rdpClassify {microclass} | R Documentation |
Classifying with the RDP classifier
Description
Classifying sequences by a trained presence/absence K-mer model.
Usage
rdpClassify(sequence, trained.model, post.prob = FALSE, prior = FALSE)
Arguments
sequence |
Character vector of sequences to classify. |
trained.model |
A list with a trained model, see |
post.prob |
Logical indicating if posterior log-probabilities should be returned. |
prior |
Logical indicating if classification should use flat priors (default) or empirical priors (prior=TRUE). |
Details
The classification step of the presence/absence method known as the RDP classifier
(Wang et al., 2007) looks for K-mers in each sequence and computes the posterior
probability of each taxon using a trained model and a naive Bayes assumption. For each
sequence, the predicted taxon is the one with the maximum posterior probability.
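The idea above can be illustrated with a small, self-contained base R sketch. It is a toy illustration only and not the microclass implementation: the helpers kmers_present, toy_train and toy_classify are hypothetical names, the pseudo-count smoothing is an assumption chosen for simplicity, and K = 3 is used only to keep the example small.

```r
# Toy illustration only: hypothetical helpers, not the microclass code.

# Distinct overlapping K-mers present in one sequence.
kmers_present <- function(s, K) {
  starts <- seq_len(max(0, nchar(s) - K + 1))
  unique(substring(s, starts, starts + K - 1))
}

# Train: for each taxon, the log-probability that a K-mer is present
# in a sequence from that taxon (pseudo-count smoothing is an assumption).
toy_train <- function(seqs, taxa, K = 3, pseudo = 0.5) {
  present <- lapply(seqs, kmers_present, K = K)
  vocab <- sort(unique(unlist(present)))
  # Presence/absence matrix: rows = sequences, columns = K-mers.
  X <- t(vapply(present, function(p) as.numeric(vocab %in% p),
                numeric(length(vocab))))
  colnames(X) <- vocab
  counts <- rowsum(X, group = taxa)             # taxa x K-mers
  n <- as.vector(table(taxa)[rownames(counts)]) # sequences per taxon
  list(K = K,
       log.present = log((counts + pseudo) / (n + 2 * pseudo)),
       log.prior = log(n / sum(n)))
}

# Classify: sum the log-probabilities of the K-mers present in the query
# (naive Bayes assumption) and pick the taxon with the maximum score.
toy_classify <- function(seqs, model, prior = FALSE) {
  vapply(seqs, function(s) {
    w <- intersect(kmers_present(s, model$K), colnames(model$log.present))
    score <- rowSums(model$log.present[, w, drop = FALSE])
    if (prior) score <- score + model$log.prior  # empirical priors
    names(which.max(score))
  }, character(1), USE.NAMES = FALSE)
}

train.seq <- c("AAAAAAAA", "AAAAAAAT", "CCCCCCCC", "CCCCCCCG")
train.tax <- c("A_genus", "A_genus", "C_genus", "C_genus")
toy.model <- toy_train(train.seq, train.tax)
toy.pred <- toy_classify(c("AAAAAA", "CCCCCC"), toy.model)
print(toy.pred)
```

With prior = FALSE every taxon is treated as equally likely beforehand, which corresponds to the flat priors described under Arguments. Setting prior = TRUE in the sketch adds the log of each taxon's share of the training sequences.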
The classification is parallelized through RcppParallel,
employing Intel TBB and TinyThread. By default all available
processing cores are used. This can be changed using the
function setParallel.
Value
A character vector with the predicted taxa, one for each sequence.
Author(s)
Kristian Hovde Liland and Lars Snipen.
References
Wang, Q, Garrity, GM, Tiedje, JM, Cole, JR (2007). Naive Bayesian Classifier for Rapid Assignment of rRNA Sequences into the New Bacterial Taxonomy. Applied and Environmental Microbiology, 73: 5261-5267.
See Also
Examples
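data("small.16S")
seq <- small.16S$Sequence
# Taxon label = second space-separated token of each header
tax <- sapply(strsplit(small.16S$Header, split = " "), function(x){x[2]})
## Not run:
# Train the model on the full sequences
trn <- rdpTrain(seq, tax)
# Extract amplicons using the forward and reverse primers
primer.515f <- "GTGYCAGCMGCCGCGGTAA"
primer.806rB <- "GGACTACNVGGGTWTCTAAT"
reads <- amplicon(seq, primer.515f, primer.806rB)
# Classify the non-empty reads
predicted <- rdpClassify(unlist(reads[nchar(reads) > 0]), trn)
print(predicted)
## End(Not run)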