R: Classifying with the RDP classifier

rdpClassify {microclass}

R Documentation

Classifying with the RDP classifier

Description

Classifying sequences by a trained presence/absence K-mer model.

Usage

rdpClassify(sequence, trained.model, post.prob = FALSE, prior = FALSE)

Arguments

`sequence`	Character vector of sequences to classify.
`trained.model`	A list with a trained model, see `rdpTrain`.
`post.prob`	Logical indicating if posterior log-probabilities should be returned.
`prior`	Logical indicating if classification should use flat priors (`prior = FALSE`, the default) or empirical priors (`prior = TRUE`).

Details

The classification step of the presence/absence method known as the RDP classifier (Wang et al., 2007) consists of detecting which K-mers are present in each sequence, and then computing the posterior probability of each taxon using a trained model and the naive Bayes assumption. For each sequence, the predicted taxon is the one with the maximum posterior probability.
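
The idea can be illustrated with a small base R sketch. This is a toy illustration under stated assumptions, not the package's implementation: the helper names `kmers` and `classify`, the toy sequences, K = 3, and the simple 0.5 pseudo-count are all hypothetical choices made here for readability, and every sequence is assumed to be at least K characters long.

```r
# Toy illustration only; this is NOT the microclass implementation.
kmers <- function(s, K = 3) {
  # Distinct overlapping K-mers present in one sequence (presence/absence)
  starts <- seq_len(nchar(s) - K + 1)
  unique(substring(s, starts, starts + K - 1))
}

# Hypothetical training data: two taxa, two sequences each
train.seq <- c("AAAAAAAAAC", "AAAAACAAAA", "GGGGGGGGGT", "GGGGTGGGGG")
train.tax <- c("TaxonA", "TaxonA", "TaxonB", "TaxonB")

# Presence/absence matrix: rows are training sequences, columns are K-mers
train.kmers <- lapply(train.seq, kmers)
vocab <- sort(unique(unlist(train.kmers)))
P <- t(sapply(train.kmers, function(k) as.integer(vocab %in% k)))
colnames(P) <- vocab

# Per-taxon log-probability that a K-mer is present, smoothed by a pseudo-count
taxa <- sort(unique(train.tax))
log.p <- t(sapply(taxa, function(tx) {
  rows <- P[train.tax == tx, , drop = FALSE]
  log((colSums(rows) + 0.5) / (nrow(rows) + 1))
}))

classify <- function(s, log.prior = rep(0, length(taxa))) {
  # Naive Bayes: add the log-probabilities of the K-mers found in the query,
  # add the log-prior (all zeros = flat priors), and pick the best taxon
  present <- intersect(kmers(s), vocab)
  score <- rowSums(log.p[, present, drop = FALSE]) + log.prior
  taxa[which.max(score)]
}

predicted <- sapply(c("AAAACAAA", "GGGTGGGG"), classify, USE.NAMES = FALSE)
print(predicted)
```

In this sketch, flat priors correspond to a `log.prior` of all zeros. Passing the log of each taxon's relative frequency in the training data instead would be one way to mimic empirical priors.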

The classification is parallelized through RcppParallel, employing Intel TBB and TinyThread. By default, all available processing cores are used. This can be changed using the function `setParallel`.

Value

A character vector with the predicted taxa, one for each sequence.

Author(s)

Kristian Hovde Liland and Lars Snipen.

References

Wang, Q, Garrity, GM, Tiedje, JM, Cole, JR (2007). Naive Bayesian Classifier for Rapid Assignment of rRNA Sequences into the New Bacterial Taxonomy. Applied and Environmental Microbiology, 73: 5261-5267.

Examples

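# Load the example data and extract the sequences
data("small.16S")
seq <- small.16S$Sequence
# The taxon label is the second space-separated token of each header
tax <- sapply(strsplit(small.16S$Header, split = " "), function(x){x[2]})
## Not run: 
# Train the model on the full sequences
trn <- rdpTrain(seq, tax)
# Extract amplicons using the two primers
primer.515f <- "GTGYCAGCMGCCGCGGTAA"
primer.806rB <- "GGACTACNVGGGTWTCTAAT"
reads <- amplicon(seq, primer.515f, primer.806rB)
# Classify the non-empty amplicons
predicted <- rdpClassify(unlist(reads[nchar(reads) > 0]), trn)
print(predicted)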

## End(Not run)

[Package microclass version 1.2 Index]