R: Training multinomial model

multinomTrain {microclass}

R Documentation

Training multinomial model

Description

Training the multinomial K-mer method on sequence data.

Usage

multinomTrain(sequence, taxon, K = 8, col.names = FALSE, n.pseudo = 100)

Arguments

`sequence`	Character vector of 16S sequences.
`taxon`	Character vector of taxon labels for each sequence.
`K`	Word length (integer).
`col.names`	Logical indicating if column names should be added to the trained model matrix.
`n.pseudo`	Number of pseudo-counts to use (a positive number, need not be an integer). The special case -1 returns only word counts, not log-probabilities.

Details

The training step of the multinomial method (Vinje et al., 2015) counts the K-mers in all sequences and computes the multinomial probability of each K-mer for each unique taxon. Before the probabilities are estimated, n.pseudo pseudo-counts are added, divided equally over all K-mers. The optimal choice of n.pseudo depends on K and on the training data set. The default value n.pseudo=100 has proven good for K=8 and the contax.trim data set (see the microcontax R package).
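The counting and smoothing described above can be sketched in base R. This is only an illustration, not the package's implementation: the toy sequences, the helper count_kmers, the small K=2 and the choice to add the n.pseudo pseudo-counts once per taxon are all assumptions made for the example.

```r
# Toy data: hypothetical sequences and labels, not taken from any package.
sequence <- c("ACGTAC", "ACGTTT", "GGGTAC")
taxon    <- c("A", "A", "B")
K        <- 2   # small word length so the matrix has only 4^2 = 16 columns
n.pseudo <- 1   # assumed here to be the total pseudo-count added per taxon

# All 4^K possible words over the DNA alphabet, in a fixed order.
bases <- c("A", "C", "G", "T")
grid  <- expand.grid(rep(list(bases), K), stringsAsFactors = FALSE)
words <- apply(grid, 1, function(w) paste(rev(w), collapse = ""))

# Hypothetical helper: count the overlapping K-mers of one sequence.
# Assumes the sequence has at least K characters.
count_kmers <- function(s, K, words) {
  starts <- seq_len(nchar(s) - K + 1)
  kmers  <- substring(s, starts, starts + K - 1)
  table(factor(kmers, levels = words))
}

# One row per sequence, then summed into one row per unique taxon.
per_seq <- t(sapply(sequence, count_kmers, K = K, words = words))
counts  <- rowsum(per_seq, group = taxon)

# Spread the pseudo-counts equally over all K-mers, then normalise each row.
pseudo <- counts + n.pseudo / ncol(counts)
prob   <- pseudo / rowSums(pseudo)
```

Each row of prob sums to 1 and contains no zeros as long as n.pseudo is positive, which matches the properties listed under Value.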

Adding the actual K-mers as column names (col.names=TRUE) will slow down the computations.

The relative taxon sizes are also computed, and may be used as an empirical prior in the classification step (see "prior" below).

Value

A list with two elements. The first element is Method, which is the text "multinom" in this case. The second element is Fitted, which is a matrix of probabilities with one row for each unique taxon and one column for each possible word of length K. The sum of each row is 1.0. No probabilities are 0 if n.pseudo>0.0.

The matrix Fitted has an attribute named "prior", accessible with attr(Fitted, "prior"), that contains the relative taxon sizes.
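As a rough illustration of how relative taxon sizes can be computed and carried as an attribute, the base R sketch below uses hypothetical labels and a placeholder matrix in place of a real trained model. Whether the package stores the prior as a named vector is an assumption of the example.

```r
# Hypothetical taxon labels, one per training sequence.
taxon <- c("A", "A", "B", "C", "C", "C")

# Relative taxon sizes: the fraction of sequences in each taxon.
tab   <- table(taxon)
prior <- as.numeric(tab) / length(taxon)
names(prior) <- names(tab)   # naming the entries is an assumption

# Placeholder for a trained matrix (K = 1, uniform probabilities),
# with one row per unique taxon.
Fitted <- matrix(0.25, nrow = length(prior), ncol = 4,
                 dimnames = list(names(prior), c("A", "C", "G", "T")))

# Attach the prior to the matrix and read it back.
attr(Fitted, "prior") <- prior
attr(Fitted, "prior")
```

Storing the prior on the matrix keeps it together with the fitted probabilities, so a later classification step can use it without needing the training labels again.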

Author(s)

Kristian Hovde Liland and Lars Snipen.

References

Vinje, H, Liland, KH, Almøy, T, Snipen, L. (2015). Comparing K-mer based methods for improved classification of 16S sequences. BMC Bioinformatics, 16:205.

Examples

# See examples for multinomClassify

[Package microclass version 1.2]