ngraMatrix {EnvNJ} | R Documentation |
Compute n-Gram Frequencies Dataframe
Description
Computes the n-gram frequencies dataframe for the proteins and species provided.
Usage
ngraMatrix(data, k = 4, silent = FALSE)
Arguments
data |
a dataframe with as many columns as species and one row per orthologous protein. The rows and columns must be named accordingly. |
k |
a positive integer, between 1 and 5, giving the length k of the words (k-mers) to be counted. |
silent |
logical; set to FALSE (the default) to have the function report on its progress while it runs. |
Details
The argument data can be obtained using orth() and orth.seq().
Value
A list with two dataframes. The first has nsp * npr columns (nsp: number of species, npr: number of proteins per species) and npe rows (npe: number of possible peptides, 20^k, that is 20 for k = 1, 400 for k = 2, 8000 for k = 3 and 160000 for k = 4). Each entry is the number of times the indicated peptide was counted in the given protein. Orthologous proteins occupy consecutive columns, so the first nsp columns are the orthologs of protein 1, and so on. The second dataframe contains the Species Vector Sums (each vector describes one species).
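The layout described above can be illustrated with a minimal base R sketch. This is not the EnvNJ implementation: the helper count_kmers(), the toy sequences and the names toy, freq and species_sums are all hypothetical, and the sketch assumes that words are counted in overlapping windows and that a species vector sum is the sum of that species' protein columns.

```r
# Illustrative sketch only: base R, not the EnvNJ implementation.
aa <- c("A", "C", "D", "E", "F", "G", "H", "I", "K", "L",
        "M", "N", "P", "Q", "R", "S", "T", "V", "W", "Y")

# Count the k-mers of one sequence over the full 20^k vocabulary.
# Overlapping windows are an assumption of this sketch.
count_kmers <- function(sequence, k) {
  stopifnot(nchar(sequence) >= k)
  grid <- expand.grid(rep(list(aa), k), stringsAsFactors = FALSE)
  words <- do.call(paste0, unname(as.list(grid)))
  starts <- seq_len(nchar(sequence) - k + 1)
  kmers <- substring(sequence, starts, starts + k - 1)
  counts <- table(factor(kmers, levels = words))
  setNames(as.integer(counts), words)
}

# Hypothetical input shaped like 'data': species in columns, proteins in rows.
toy <- data.frame(sp1 = c("MKVLAAGIV", "MSTNPKPQR"),
                  sp2 = c("MKVLSAGIV", "MSTNPKAQR"),
                  row.names = c("prot1", "prot2"),
                  stringsAsFactors = FALSE)

# First dataframe: orthologs of protein 1 first, then those of protein 2.
cols <- list()
for (p in rownames(toy)) {
  for (s in colnames(toy)) {
    cols[[paste(p, s, sep = ".")]] <- count_kmers(toy[p, s], k = 2)
  }
}
freq <- as.data.frame(do.call(cbind, cols))
dim(freq)  # 400 rows (20^2) and 4 columns (nsp * npr)

# Second dataframe (assumed meaning): one summed vector per species.
nsp <- ncol(toy)
species_sums <- sapply(seq_len(nsp), function(j) {
  rowSums(freq[, seq(j, ncol(freq), by = nsp), drop = FALSE])
})
colnames(species_sums) <- colnames(toy)
```

With k = 2, each toy sequence of 9 residues contributes 8 overlapping words, so every column of freq sums to 8.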
References
Stuart et al. Bioinformatics 2002; 18:100-108.
See Also
ngram(), svdgram()
Examples
ngraMatrix(bovids[,1:3], k = 2)
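The returned list can be inspected as sketched below. This assumes the EnvNJ package is installed and that the bovids dataset is available once the package is attached, as the example above implies; the name res is introduced here only for illustration.

```r
# Hedged usage sketch: runs only if EnvNJ is installed.
if (requireNamespace("EnvNJ", quietly = TRUE)) {
  library(EnvNJ)
  res <- ngraMatrix(bovids[, 1:3], k = 2)
  print(length(res))    # the Value section describes a list with two dataframes
  print(dim(res[[1]]))  # per-protein peptide counts
  print(dim(res[[2]]))  # species vector sums
}
```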