panMatrix {micropan} R Documentation

Computing the pan-matrix for a set of gene clusters

Description

A pan-matrix has one row for each genome and one column for each gene cluster, and cell ‘[i,j]’ indicates how many members genome ‘i’ has in gene cluster ‘j’.

Usage

panMatrix(clustering)

Arguments

clustering

A named vector of integers.

Details

The pan-matrix is a central data structure for pan-genomic analysis. It is a matrix with one row for each genome in the study, and one column for each gene cluster. Cell ‘[i,j]’ contains an integer indicating how many members genome ‘i’ has in cluster ‘j’.

The input clustering must be a named integer vector with one element for each sequence in the study, typically produced by either bClust or dClust. The name of each element is a text string identifying the sequence. The value of each element indicates the cluster, i.e. sequences with identical values are in the same cluster. IMPORTANT: The name of each sequence must contain the ‘genome_id’ of its genome, i.e. the names must be of the form ‘GID111_seq1’, ‘GID111_seq2’,... where the ‘GIDxxx’ part indicates which genome the sequence belongs to. See panPrep for details.
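
As a minimal sketch of the expected input, the following builds a small clustering vector by hand. The sequence names and cluster numbers are invented for illustration, and the regular expression GID[0-9]+ is an assumption based on the name form shown above, not a statement of how the package parses names.

```r
# Hypothetical toy input: five sequences from two genomes, three clusters
clustering <- c(GID1_seq1 = 1L, GID1_seq2 = 2L, GID1_seq3 = 2L,
                GID2_seq1 = 1L, GID2_seq2 = 3L)

# Recover the genome_id part of each sequence name (assumed pattern)
gid <- regmatches(names(clustering), regexpr("GID[0-9]+", names(clustering)))

# Sequences sharing a value belong to the same cluster
split(names(clustering), clustering)
```

A vector of this shape, with real sequence names, is what bClust or dClust would normally supply.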

The rows of the pan-matrix are named by the ‘genome_id’ of each genome. The columns are named ‘Cluster_x’ where ‘x’ is the integer cluster value copied from ‘clustering’.
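
The layout described above can be illustrated in base R with table(). This is only a sketch of the resulting structure under stated assumptions: the toy vector is invented, the GID[0-9]+ pattern is assumed from the name form, and the code is not the package's own implementation.

```r
# Hypothetical toy clustering: two genomes, three clusters
clustering <- c(GID1_seq1 = 1L, GID1_seq2 = 2L, GID1_seq3 = 2L,
                GID2_seq1 = 1L, GID2_seq2 = 3L)

# Genome of each sequence, taken from its name (assumed pattern)
gid <- regmatches(names(clustering), regexpr("GID[0-9]+", names(clustering)))

# Count the members each genome has in each cluster
counts <- table(gid, clustering)

# Store as an integer matrix with genome_id rows and Cluster_x columns
toy_panmat <- matrix(as.integer(counts), nrow = nrow(counts),
                     dimnames = list(rownames(counts),
                                     paste0("Cluster_", colnames(counts))))
toy_panmat
```

Here genome GID1 has two members in Cluster_2 and none in Cluster_3, so the cells hold counts rather than presence/absence.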

Value

An integer matrix with a row for each genome and a column for each sequence cluster. The input vector ‘clustering’ is attached as the attribute ‘clustering’.
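
Because the input vector travels with the matrix as an attribute, the sequences behind any column can be looked up later. The sketch below uses a hand-made stand-in matrix (all names and values are hypothetical) so that it runs without the package; with micropan installed, the matrix would instead be the result of panMatrix().

```r
# Hypothetical stand-in for a pan-matrix and its clustering attribute
clustering <- c(GID1_seq1 = 1L, GID2_seq1 = 1L, GID2_seq2 = 2L)
toy_panmat <- matrix(c(1L, 1L, 0L, 1L), nrow = 2,
                     dimnames = list(c("GID1", "GID2"),
                                     c("Cluster_1", "Cluster_2")))
attr(toy_panmat, "clustering") <- clustering

# Retrieve the original cluster assignments from the matrix
cl <- attr(toy_panmat, "clustering")

# Names of the sequences that make up Cluster_1
names(cl)[cl == 1L]
```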

Author(s)

Lars Snipen and Kristian Hovde Liland.

See Also

bClust, dClust, distManhattan, distJaccard, fluidity, chao, binomixEstimate, heaps, rarefaction.

Examples

# Loading clustering data in this package
data(xmpl.bclst)

# Pan-matrix based on the clustering
panmat <- panMatrix(xmpl.bclst)

## Not run: 
# Plotting cluster distribution: number of clusters found in 1, 2, ... genomes
library(dplyr)    # provides tibble() and %>%
library(ggplot2)
n.genomes <- nrow(panmat)
tibble(Clusters = as.integer(table(factor(colSums(panmat > 0), levels = 1:n.genomes))),
       Genomes = 1:n.genomes) %>%
  ggplot(aes(x = Genomes, y = Clusters)) +
  geom_col()

## End(Not run)


[Package micropan version 2.1 Index]