multinomClassify {microclass}    R Documentation
Classifying with a Multinomial model
Description
Classifying sequences by a trained Multinomial model.
Usage
multinomClassify(sequence, trained.model, post.prob = FALSE, prior = FALSE)
Arguments
sequence: Character vector of 16S sequences to classify.
trained.model: A list with a trained model, such as the output of multinomTrain (see Examples).
post.prob: Logical indicating whether posterior log-probabilities should be returned.
prior: Logical indicating whether classification should use flat priors (default, prior=FALSE) or empirical priors (prior=TRUE).
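The difference between the two prior settings can be illustrated with a small base-R sketch. It assumes that an empirical prior is each taxon's relative frequency among the training sequences; the helper empirical_log_prior and all numbers below are hypothetical and not taken from microclass. A flat prior adds the same constant to every taxon's log-score and therefore never changes which taxon wins, while an empirical prior can tip a close call towards a more frequent taxon.

```r
# Hypothetical helper (not part of microclass): log of each taxon's relative
# frequency among the training labels, assumed here to be the "empirical prior".
empirical_log_prior <- function(taxa) {
  freq <- table(taxa)
  stats::setNames(log(as.numeric(freq) / length(taxa)), names(freq))
}

# Toy training labels (made up for illustration)
train_taxa <- c("Escherichia", "Escherichia", "Escherichia", "Bacillus")
log_prior <- empirical_log_prior(train_taxa)

# Made-up log-likelihoods of one query sequence under each taxon model
log_lik <- c(Bacillus = -100.0, Escherichia = -100.5)

# Flat prior: the same constant for every taxon, so only the likelihood decides
flat <- log_lik + log(1 / length(log_lik))
# Empirical prior: the more frequent taxon gets a boost
empirical <- log_lik + log_prior[names(log_lik)]

names(which.max(flat))       # "Bacillus"
names(which.max(empirical))  # "Escherichia"
```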
Details
The classification step of the Multinomial method (Vinje et al., 2015) consists of counting K-mers in every sequence and computing the posterior probability of each taxon in the trained model. The predicted taxon for each input sequence is the one with the highest posterior probability for that sequence.
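As an illustration of these two steps, here is a minimal base-R sketch of a multinomial K-mer classifier. It is not the microclass implementation (which is compiled and parallelized); the function names, the pseudocount of 1 used to smooth the per-taxon K-mer probabilities, and the tiny K are assumptions chosen for readability. With flat priors, the log-posterior of a taxon is, up to a constant shared by all taxa, the sum over K-mers of the K-mer count times the log-probability of that K-mer under the taxon.

```r
# Hypothetical sketch of a multinomial K-mer classifier (not microclass code).

# All 4^K DNA words of length K.
all_kmers <- function(K) {
  grid <- expand.grid(rep(list(c("A", "C", "G", "T")), K), stringsAsFactors = FALSE)
  unname(apply(grid, 1, paste, collapse = ""))
}

# Counts of overlapping K-mers in one sequence, in the order of `vocab`.
# Words with letters outside `vocab` (e.g. containing N) are dropped.
count_kmers <- function(s, K, vocab) {
  n <- nchar(s)
  if (n < K) return(numeric(length(vocab)))
  words <- substring(s, 1:(n - K + 1), K:n)
  as.numeric(table(factor(words, levels = vocab)))
}

# Matrix of K-mer counts: rows = sequences, columns = K-mers.
count_matrix <- function(seqs, K, vocab) {
  t(vapply(seqs, count_kmers, numeric(length(vocab)), K = K, vocab = vocab))
}

# Training: per-taxon K-mer log-probabilities with an assumed pseudocount.
train_sketch <- function(seqs, taxa, K, pseudo = 1) {
  vocab <- all_kmers(K)
  tax_counts <- rowsum(count_matrix(seqs, K, vocab), taxa)  # taxa x K-mers
  smoothed <- tax_counts + pseudo
  list(K = K, vocab = vocab, log_p = log(smoothed / rowSums(smoothed)))
}

# Classification with flat priors: score = sum_k n_k * log p(k | taxon);
# the predicted taxon is the one with the largest score.
classify_sketch <- function(seqs, model) {
  log_post <- count_matrix(seqs, model$K, model$vocab) %*% t(model$log_p)
  colnames(log_post)[max.col(log_post, ties.method = "first")]
}

model <- train_sketch(c("AAAAAAAA", "CCCCCCCC"), c("TaxA", "TaxC"), K = 2)
classify_sketch(c("AAAAAC", "CCCCG"), model)
```

The first query is dominated by AA words and is assigned to TaxA, the second to TaxC. For real 16S sequences a larger K would normally be used.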
By setting post.prob=TRUE you get the posterior log-probabilities of the best and second-best taxon for each sequence. These can be used to evaluate the certainty of the classifications, see taxMachine.
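A minimal sketch of how such a best/second-best summary can be derived from a matrix of log-posteriors (rows are sequences, columns are taxa). The helper top_two and the toy matrix are hypothetical; the output mimics the three-column layout described under Value, but this is not microclass code.

```r
# Hypothetical helper (not part of microclass): best taxon plus the best and
# second-best log-posterior per row. Requires at least two taxa (columns).
top_two <- function(log_post) {
  best <- max.col(log_post, ties.method = "first")
  # Sort each row in decreasing order; t() turns the result back into rows
  sorted <- t(apply(log_post, 1, sort, decreasing = TRUE))
  data.frame(Taxon = colnames(log_post)[best],
             Post.prob.1 = unname(sorted[, 1]),
             Post.prob.2 = unname(sorted[, 2]),
             stringsAsFactors = FALSE)
}

# Toy log-posteriors for two sequences and three taxa (made up)
log_post <- matrix(c(-10, -12, -30,
                     -25, -24, -26),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(NULL, c("TaxA", "TaxB", "TaxC")))
res <- top_two(log_post)
res
```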
The classification is parallelized through RcppParallel, employing Intel TBB and TinyThread. By default all available processing cores are used; this can be changed with the function setParallel.
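By way of background, a sketch of inspecting and capping the thread count from R. parallel::detectCores() and RcppParallel::setThreadOptions() are existing functions of those packages; that microclass's setParallel adjusts this same RcppParallel setting is an assumption, so setParallel as documented remains the route to use with multinomClassify.

```r
# Number of cores R can see; detectCores() may return NA on some platforms.
n_cores <- parallel::detectCores()
if (is.na(n_cores)) n_cores <- 1L

# RcppParallel's own switch for the number of worker threads, here leaving
# one core free. Assumption: setParallel() in microclass changes this same
# setting; check its help page for the arguments it actually takes.
if (requireNamespace("RcppParallel", quietly = TRUE)) {
  RcppParallel::setThreadOptions(numThreads = max(1L, n_cores - 1L))
}
```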
Value
If post.prob=FALSE, a character vector of predicted taxa is returned, one for each sequence in sequence.
If post.prob=TRUE, a data.frame with three columns is returned. Taxon is the vector of predicted taxa, one for each sequence in sequence. Post.prob.1 and Post.prob.2 are vectors with the largest and second-largest posterior log-probabilities for each sequence.
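One way to use the Post.prob columns is to withhold predictions whose top two taxa are nearly tied. Because both columns are log-probabilities, their difference is the log of the ratio between the best and second-best posterior probability, and any normalizing constant shared by the two cancels. The helper flag_uncertain, its min_margin threshold of 2 and the example values are hypothetical; see also taxMachine.

```r
# Hypothetical post-processing (not part of microclass): keep a prediction
# only if the best log-posterior beats the second best by at least min_margin.
flag_uncertain <- function(pred, min_margin = 2) {
  stopifnot(is.data.frame(pred),
            all(c("Taxon", "Post.prob.1", "Post.prob.2") %in% names(pred)))
  # Difference of log-posteriors = log(P_best / P_second)
  pred$Margin <- pred$Post.prob.1 - pred$Post.prob.2
  pred$Taxon.filtered <- ifelse(pred$Margin >= min_margin,
                                as.character(pred$Taxon), NA_character_)
  pred
}

# Made-up values in the documented three-column layout
pred <- data.frame(Taxon = c("TaxA", "TaxB"),
                   Post.prob.1 = c(-10, -24),
                   Post.prob.2 = c(-15, -25),
                   stringsAsFactors = FALSE)
out <- flag_uncertain(pred)
out
```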
Author(s)
Kristian Hovde Liland and Lars Snipen.
References
Vinje, H, Liland, KH, Almøy, T, Snipen, L. (2015). Comparing K-mer based methods for improved classification of 16S sequences. BMC Bioinformatics, 16:205.
Examples
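library(microclass)
data("small.16S")
seq <- small.16S$Sequence
# The taxon label is the second space-separated field of each header
tax <- sapply(strsplit(small.16S$Header, split = " "), function(x) {x[2]})
## Not run:
# Train the model on the full-length sequences
trn <- multinomTrain(seq, tax)
# Extract the V4 region with the 515F/806RB primer pair
primer.515f <- "GTGYCAGCMGCCGCGGTAA"
primer.806rB <- "GGACTACNVGGGTWTCTAAT"
reads <- amplicon(seq, primer.515f, primer.806rB)
# Classify only the non-empty amplicons
predicted <- multinomClassify(unlist(reads[nchar(reads) > 0]), trn)
print(predicted)
## End(Not run)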