msc.matrix {rKOMICS}R Documentation

Build cluster matrix

Description

The msc.matrix function reads the output of clustering analyses (UC file) for specified minimum percent identity (MPI) values and organizes the data into a matrix format. This matrix represents the presence or absence of Minicircle Sequence Classes (MSCs) in each sample. The resulting matrix simplifies downstream analyses and visualizations by eliminating the need for manual data manipulation and reformatting.

Usage

msc.matrix(files, samples, groups)

Arguments

files

a character vector containing the names of the UC files generated by the VSEARCH tool. Each file represents the output of clustering analysis for a specific minimum percent identity (MPI), such as all.minicircles.circ.id70.uc, all.minicircles.circ.id80.uc, and so on. Please ensure that your file names end with 'idxx.uc' for this function to work properly.

samples

a character vector containing the sample names.

groups

a vector of the same length as the samples, specifying the groups (e.g., species) to which the samples belong.

Value

a a list that contains one cluster matrix per percent identity. Each matrix represents the presence or absence of MSCs in each sample. In the cluster matrix, a value of 0 indicates that the MSC is not present in the sample, while a value higher than 0 indicates that the MSC is found at least once in the sample.

Examples

data(exData)

### run function

matrices <- msc.matrix(files = system.file("extdata", exData$ucs, package="rKOMICS"), 
                      samples = exData$samples, 
                      groups = exData$species)


### or: 
data(matrices)

### show matrix with id 95%
matrices[["id95"]]
rowSums(matrices[["id95"]]) # --> frequency of MSC across all samples
colSums(matrices[["id95"]]) # --> number of MSC per sample


[Package rKOMICS version 1.3 Index]