panMatrix {micropan}    R Documentation
Computing the pan-matrix for a set of gene clusters
Description
A pan-matrix has one row for each genome and one column for each gene cluster, and cell ‘[i,j]’ indicates how many members genome ‘i’ has in gene cluster ‘j’.
Usage
panMatrix(clustering)
Arguments
clustering    A named vector of integers.
Details
The pan-matrix is a central data structure for pan-genomic analysis. It is a matrix with one row for each genome in the study, and one column for each gene cluster. Cell ‘[i,j]’ contains an integer indicating how many members genome ‘i’ has in cluster ‘j’.
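To make the meaning of cell ‘[i,j]’ concrete, the base-R sketch below cross-tabulates a small hypothetical set of sequences whose genome and cluster are given as two parallel vectors (`genome`, `cluster`, both made up for illustration). It only shows what each cell counts; it is not the micropan implementation.

```r
# Toy data (hypothetical): five sequences from two genomes, three clusters
genome  <- c("GID1", "GID1", "GID1", "GID2", "GID2")
cluster <- c(1L, 1L, 2L, 1L, 3L)

# Cross-tabulate genome against cluster: cell [i, j] counts how many
# members genome i has in cluster j
counts <- table(genome, cluster)

# Store the counts as a plain integer matrix (one row per genome,
# one column per cluster)
pm <- matrix(as.integer(counts), nrow = nrow(counts),
             dimnames = dimnames(counts))
print(pm)
```

Here GID1 has two members in cluster 1, one in cluster 2 and none in cluster 3.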
The input ‘clustering’ must be a named integer vector with one element for each sequence in the study, typically produced by either bClust or dClust. The name of each element is a text identifying the sequence. The value of each element indicates the cluster, i.e. sequences with identical values are in the same cluster. IMPORTANT: The name of each sequence must contain the ‘genome_id’ of its genome, i.e. the names must be of the form ‘GID111_seq1’, ‘GID111_seq2’,... where the ‘GIDxxx’ part indicates which genome the sequence belongs to. See panPrep for details.
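The naming requirement can be checked before calling panMatrix. The sketch below assumes that a ‘genome_id’ is the literal prefix ‘GID’ followed by digits, as in the names shown above, and uses base R `grepl()` and `regexpr()` to validate and extract it. The sequence names are made up for illustration.

```r
# Hypothetical sequence names in the 'GIDxxx_seqy' form described above
seq_names <- c("GID111_seq1", "GID111_seq2", "GID222_seq1")

# Assumption: a genome_id is 'GID' followed by one or more digits
gid_pattern <- "GID[0-9]+"

# Stop early if any name lacks a genome_id
has_gid <- grepl(gid_pattern, seq_names)
stopifnot(all(has_gid))

# Extract the genome_id part of each name
genome_id <- regmatches(seq_names, regexpr(gid_pattern, seq_names))
print(genome_id)  # "GID111" "GID111" "GID222"
```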
The rows of the pan-matrix are named by the ‘genome_id’ of each genome. The columns are named ‘Cluster_x’, where ‘x’ is the integer cluster value copied from ‘clustering’.
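A base-R sketch of how these row and column names could be produced from a clustering vector follows. It assumes the ‘GID’-plus-digits genome_id format, the toy `clustering` vector is hypothetical, and the column order used here (sorted cluster values) may differ from the order panMatrix returns.

```r
# Hypothetical clustering vector: names carry the genome_id,
# values are cluster numbers
clustering <- c(GID1_seq1 = 4L, GID1_seq2 = 7L, GID2_seq1 = 4L)

# Assumption: the genome_id is the 'GIDxxx' prefix of each name
genome_id <- regmatches(names(clustering),
                        regexpr("GID[0-9]+", names(clustering)))

# Count members per genome and cluster, then name rows by genome_id
# and columns 'Cluster_x' after the cluster values
counts <- table(genome_id, clustering)
pm <- matrix(as.integer(counts), nrow = nrow(counts),
             dimnames = list(rownames(counts),
                             paste0("Cluster_", colnames(counts))))
print(pm)
```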
Value
An integer matrix with a row for each genome and a column for each sequence cluster. The input vector ‘clustering’ is attached as the attribute ‘clustering’.
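Because the input is kept as an attribute, it can be recovered from the matrix alone, e.g. with `attr(panmat, "clustering")` on the result of panMatrix. The sketch below mimics this on a small hand-typed integer matrix so that it runs without micropan; both the matrix and the vector are hypothetical.

```r
# Hypothetical input: GID1 has one sequence in cluster 1,
# GID2 has one sequence in each of clusters 1 and 2
clustering <- c(GID1_seq1 = 1L, GID2_seq1 = 1L, GID2_seq2 = 2L)

# The matching pan-matrix, typed in by hand (filled column by column)
pm <- matrix(c(1L, 1L, 0L, 1L), nrow = 2,
             dimnames = list(c("GID1", "GID2"), c("Cluster_1", "Cluster_2")))

# Attach the input vector, as described for the return value
attr(pm, "clustering") <- clustering

# The original clustering can later be read back from the matrix
identical(attr(pm, "clustering"), clustering)  # TRUE
```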
Author(s)
Lars Snipen and Kristian Hovde Liland.
See Also
bClust, dClust, distManhattan, distJaccard, fluidity, chao, binomixEstimate, heaps, rarefaction.
Examples
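# Loading clustering data in this package
library(micropan)
data(xmpl.bclst)
# Pan-matrix based on the clustering
panmat <- panMatrix(xmpl.bclst)
## Not run:
# Plotting cluster distribution: the number of clusters found in
# exactly k genomes, for k = 1, ..., number of genomes
library(dplyr)    # provides tibble() and the %>% pipe
library(ggplot2)
n.genomes <- nrow(panmat)
genomes.per.cluster <- colSums(panmat > 0)
tibble(Clusters = as.integer(table(factor(genomes.per.cluster, levels = 1:n.genomes))),
       Genomes = 1:n.genomes) %>%
  ggplot(aes(x = Genomes, y = Clusters)) +
  geom_col()
## End(Not run)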
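The plotted quantity can also be computed without ggplot2. This sketch uses a small hypothetical pan-matrix and counts, for each k from 1 to the number of genomes, how many clusters are present in exactly k genomes; extra copies within a genome are ignored because only `pm > 0` is used.

```r
# Toy pan-matrix (hypothetical): 3 genomes, 4 clusters, filled column by column
pm <- matrix(c(1L, 1L, 1L,   # Cluster_1: present in all 3 genomes
               2L, 0L, 1L,   # Cluster_2: present in 2 genomes (2 copies in GID1)
               0L, 0L, 1L,   # Cluster_3: present in 1 genome
               1L, 1L, 1L),  # Cluster_4: present in all 3 genomes
             nrow = 3,
             dimnames = list(paste0("GID", 1:3), paste0("Cluster_", 1:4)))

# Number of genomes in which each cluster is present
genomes.per.cluster <- colSums(pm > 0)

# Number of clusters found in exactly k genomes, k = 1, ..., nrow(pm);
# the explicit levels keep zero counts for unobserved k
cluster.dist <- table(factor(genomes.per.cluster, levels = 1:nrow(pm)))
as.integer(cluster.dist)  # 1 1 2
```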