ngraMatrix {EnvNJ}    R Documentation

Compute n-Gram Frequencies Dataframe

Description

Computes the n-gram frequencies dataframe for the proteins and species provided.

Usage

ngraMatrix(data, k = 4, silent = FALSE)

Arguments

data

a dataframe with one column per species and one row per orthologous protein. The rows must be named after the proteins and the columns after the species.

k

a positive integer, between 1 and 5, indicating the length k of the words (k-mers) to be counted.
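To illustrate what counting words of length k means, here is a minimal base R sketch. The helper count_kmers() is hypothetical and is not part of EnvNJ; it assumes overlapping words over the 20 standard amino acids and ignores words containing any other letter. The package's own implementation may differ.

```r
# Hypothetical helper (not part of EnvNJ): count overlapping k-mers in a
# single protein sequence over the 20 standard amino acids.
count_kmers <- function(sequence, k = 2) {
  stopifnot(nchar(sequence) >= k)
  aa <- strsplit("ACDEFGHIKLMNPQRSTVWY", "")[[1]]

  # Build all 20^k possible words in alphabetical order
  grid <- expand.grid(rep(list(aa), k), stringsAsFactors = FALSE)
  words <- do.call(paste0, unname(as.list(grid[, k:1, drop = FALSE])))

  # Slide a window of width k along the sequence
  starts <- seq_len(nchar(sequence) - k + 1)
  kmers <- substring(sequence, starts, starts + k - 1)

  # Tabulate against the full word list so absent words get a zero count;
  # words with non-standard letters become NA and are dropped by table()
  counts <- table(factor(kmers, levels = words))
  setNames(as.integer(counts), words)
}

x <- count_kmers("MKVLAAGIV", k = 2)
length(x)   # one entry per possible 2-mer
x[["AA"]]   # occurrences of the word "AA"
```

A sequence of length L contributes L - k + 1 overlapping words, so the counts for the 9-residue toy sequence above add up to 8.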

silent

logical; set to FALSE to avoid loneliness, that is, to let the function print messages while it runs.

Details

The argument data can be obtained using orth() and orth.seq().

Value

A list with two dataframes. The first one has nsp * npr columns (nsp: number of species, npr: number of proteins per species) and npe rows (npe: number of possible peptides, that is, 20^k: 20 for k = 1, 400 for k = 2, 8000 for k = 3 and 160000 for k = 4). Each entry is the number of times the indicated peptide has been counted in the given protein. Orthologous proteins are in consecutive columns, so the first nsp columns are the orthologs of protein 1, the next nsp columns are the orthologs of protein 2, and so on. The second dataframe contains the Species Vector Sums (each vector describes one species).
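The layout described above can be sketched with simulated counts. Everything in this block is illustrative: the dimensions, the helper ortholog_cols(), and the random data are assumptions. It also assumes that species appear in the same order within each protein block and that a species vector sum is the sum of that species' protein columns, which is one plausible reading of the description, not a statement of how ngraMatrix() computes it.

```r
# Toy dimensions, assumed for illustration only
nsp <- 3          # number of species
npr <- 2          # number of proteins per species
k   <- 2          # word length
npe <- 20^k       # number of rows: all possible peptides of length k

# Hypothetical helper: columns holding the orthologs of protein j
ortholog_cols <- function(j, nsp) ((j - 1) * nsp + 1):(j * nsp)

# Simulated counts with the documented shape: npe rows, nsp * npr columns
set.seed(1)
counts <- matrix(rpois(npe * nsp * npr, lambda = 1),
                 nrow = npe, ncol = nsp * npr)

ortholog_cols(1, nsp)   # columns for protein 1
ortholog_cols(2, nsp)   # columns for protein 2

# Assumed species vector sums: for species s, add its column from every
# protein block (columns s, s + nsp, s + 2 * nsp, ...)
species_sums <- sapply(seq_len(nsp), function(s) {
  rowSums(counts[, seq(s, nsp * npr, by = nsp), drop = FALSE])
})
dim(species_sums)       # npe rows, one column per species
```

Under these assumptions each column of species_sums is a single vector of length npe describing one species.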

References

Stuart et al. Bioinformatics 2002; 18:100-108.

See Also

ngram(), svdgram()

Examples

ngraMatrix(bovids[,1:3], k = 2)

[Package EnvNJ version 0.1.3 Index]