R: Classifying with a Multinomial model

multinomClassify {microclass}

R Documentation

Classifying with a Multinomial model

Description

Classifying sequences by a trained Multinomial model.

Usage

multinomClassify(sequence, trained.model, post.prob = FALSE, prior = FALSE)

Arguments

`sequence`	Character vector of 16S sequences to classify.
`trained.model`	A list with a trained model, see `multinomTrain`.
`post.prob`	Logical indicating if posterior log-probabilities should be returned.
`prior`	Logical indicating whether classification uses flat priors (prior=FALSE, the default) or empirical priors (prior=TRUE).

Details

The classification step of the Multinomial method (Vinje et al., 2015) consists of counting the K-mers in each sequence and computing the posterior probability of each taxon in the trained model. The predicted taxon for each input sequence is the one with the maximum posterior probability for that sequence.
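The counting and scoring steps described above can be illustrated with a minimal base-R sketch. It is not the package's implementation: the helper names count_kmers and classify, the toy training sequences, the word length K=2 and the pseudo-count of 1 are all assumptions made for illustration, and the scores are unnormalized log-posteriors.

```r
# Illustrative sketch only; all names and toy data here are hypothetical.

# Count overlapping K-mers of one sequence over a fixed K-mer vocabulary.
count_kmers <- function(s, K, vocab) {
  n <- nchar(s) - K + 1
  kmers <- if (n > 0) substring(s, 1:n, K:nchar(s)) else character(0)
  as.vector(table(factor(kmers, levels = vocab)))
}

K <- 2
bases <- c("A", "C", "G", "T")
vocab <- apply(expand.grid(rep(list(bases), K)), 1, paste, collapse = "")

# Toy "training": one sequence per taxon, K-mer counts with a pseudo-count of 1
# (an assumption), converted to log-probabilities per taxon (rows).
train.seqs <- c(TaxonA = "AAAAAAAAAC", TaxonB = "GTGTGTGTGT")
counts <- t(sapply(train.seqs, count_kmers, K = K, vocab = vocab))
log.p <- log((counts + 1) / rowSums(counts + 1))

# Score each taxon by sum(count * log-probability) plus a log-prior
# (zero for every taxon means flat priors), then pick the maximum.
classify <- function(s, log.p, K, vocab, log.prior = rep(0, nrow(log.p))) {
  x <- count_kmers(s, K, vocab)
  score <- as.vector(log.p %*% x) + log.prior
  names(score) <- rownames(log.p)
  ord <- order(score, decreasing = TRUE)
  list(taxon = names(score)[ord[1]],
       best = score[[ord[1]]],
       second = score[[ord[2]]])
}

res.a <- classify("AAAAAC", log.p, K, vocab)
res.b <- classify("GTGTGT", log.p, K, vocab)
print(c(res.a$taxon, res.b$taxon))
```

Passing a non-zero log.prior, for example the log of the taxon frequencies in the training data, would correspond to the idea of empirical priors in this sketch.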

Setting post.prob=TRUE returns the log-probability of the best and second-best taxon for each sequence. These values can be used to evaluate the certainty of the classifications; see taxMachine.
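One possible way to use the two log-probabilities is to look at the difference between them. The sketch below assumes a result shaped like the data.frame returned with post.prob=TRUE (columns Taxon, Post.prob.1 and Post.prob.2); the values are made up, and the threshold min.margin is a hypothetical choice, not something the package prescribes.

```r
# Made-up values in the shape of a post.prob = TRUE result (illustration only).
res <- data.frame(
  Taxon = c("Bacillus", "Escherichia", "Vibrio"),
  Post.prob.1 = c(-1200.5, -980.2, -1105.0),
  Post.prob.2 = c(-1350.1, -981.0, -1190.7),
  stringsAsFactors = FALSE
)

# Margin between the best and second-best taxon; a small margin means the
# two top candidates were almost equally probable.
margin <- res$Post.prob.1 - res$Post.prob.2

# Hypothetical threshold for flagging uncertain classifications.
min.margin <- 10
res$Uncertain <- margin < min.margin
print(res)
```

A suitable threshold depends on sequence length and the data at hand, so it would need to be calibrated rather than copied from this sketch.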

The classification is parallelized through RcppParallel employing Intel TBB and TinyThread. By default all available processing cores are used. This can be changed using the function setParallel.

Value

If post.prob=FALSE, a character vector of predicted taxa is returned.

If post.prob=TRUE, a data.frame with three columns is returned. Taxon holds the predicted taxon for each element of sequence. Post.prob.1 and Post.prob.2 hold the largest and second-largest posterior log-probabilities for each sequence.

Author(s)

Kristian Hovde Liland and Lars Snipen.

References

Vinje, H, Liland, KH, Almøy, T, Snipen, L. (2015). Comparing K-mer based methods for improved classification of 16S sequences. BMC Bioinformatics, 16:205.

Examples

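# Load the example data; use the second token of each header as the taxon label
data("small.16S")
seq <- small.16S$Sequence
tax <- sapply(strsplit(small.16S$Header, split = " "), function(x) x[2])
## Not run: 
# Train the Multinomial model on the full-length sequences
trn <- multinomTrain(seq, tax)
# Extract amplicons with the 515f/806rB primer pair
primer.515f <- "GTGYCAGCMGCCGCGGTAA"
primer.806rB <- "GGACTACNVGGGTWTCTAAT"
reads <- amplicon(seq, primer.515f, primer.806rB)
# Classify the non-empty amplicons
predicted <- multinomClassify(unlist(reads[nchar(reads) > 0]), trn)
print(predicted)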

## End(Not run)

[Package microclass version 1.2 Index]