ClusterRowCoverage {BiBitR}    R Documentation

Row Coverage Plots

Description

Plotting function to be used with the BiBitWorkflow output. It plots the number of clusters (cut from the hierarchical tree) against the row coverage (as a count or a percentage) and against the number of final biclusters (see Details for more information).

Usage

ClusterRowCoverage(result, matrix, maxCluster = 20, noise = 0.1,
  noise_select = 0, plots = c(1:3), verbose = TRUE,
  plot.type = "device", filename = "RowCoverage")

Arguments

result

A BiBitWorkflow Object.

matrix

Accompanying binary data matrix which was used to obtain result.

maxCluster

Maximum number of clusters to cut the tree at (default=20).

noise

The allowed noise level when growing the rows on the merged patterns after cutting the tree. (default=0.1, namely allow 10% noise.)

  • noise=0: No noise allowed.

  • 0<noise<1: The noise parameter will be a noise percentage. The number of allowed 0's in a row in the bicluster will depend on the column size of the bicluster. More specifically zeros_allowed = ceiling(noise * columnsize). For example for noise=0.10 and a bicluster column size of 5, the number of allowed 0's would be 1.

  • noise>=1: The noise parameter will be the number of allowed 0's in a row in the bicluster independent from the column size of the bicluster. In this noise option, the noise parameter should be an integer.
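The three noise regimes above can be sketched in a few lines of base R. This is only an illustration of the rule as described here, not code taken from BiBitR; the helper name zeros_allowed() and its argument names are assumptions made for this example.

```r
# Hypothetical helper (not part of BiBitR): translate the `noise` argument
# into the number of zeros tolerated in one bicluster row.
zeros_allowed <- function(noise, columnsize) {
  stopifnot(noise >= 0, columnsize >= 1)
  if (noise < 1) {
    # Fractional noise (including 0): scales with the bicluster column size.
    ceiling(noise * columnsize)
  } else {
    # Absolute noise: a fixed count, independent of the column size.
    as.integer(noise)
  }
}

zeros_allowed(0,    5)  # noise = 0: no zeros allowed
zeros_allowed(0.10, 5)  # ceiling(0.10 * 5), the example from the text
zeros_allowed(2,    5)  # noise >= 1: fixed count of 2
```

Because of the ceiling, any nonzero fractional noise allows at least one zero per row, even for very narrow biclusters.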

noise_select

Should the allowed noise level be automatically selected for each pattern? (Uses an ad hoc method to find the elbow/kink in the Noise Scree plots.)

  • noise_select=0: Do NOT automatically select the noise levels. Use the noise level given in the noise parameter (default).

  • noise_select=1: Using the Noise Scree plot (with 'Added Rows' on the y-axis), find the noise level where the current number of added rows at this noise level is larger than the mean of 'added rows' at the lower noise levels. After locating this noise level, lower the noise level by 1. This is your automatically selected elbow/kink and therefore your noise level.

  • noise_select=2: Applies the same steps as for noise_select=1, but instead of decreasing the noise level by only 1, keep decreasing the noise level until the number of added rows isn't decreasing anymore either.
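One possible reading of the noise_select=1 rule can be sketched as follows. The function name select_noise_elbow(), the choice to stop at the first noise level that exceeds the running mean, and the fallback when no such level exists are all assumptions made for this illustration; the package's internal implementation may differ.

```r
# Hypothetical helper (not part of BiBitR). `added_rows[i]` is the number of
# rows added at absolute noise level `noise_levels[i]` (sorted ascending).
select_noise_elbow <- function(added_rows,
                               noise_levels = seq_along(added_rows) - 1) {
  stopifnot(length(added_rows) == length(noise_levels))
  for (i in seq_along(added_rows)[-1]) {
    # Compare the current count with the mean over all lower noise levels.
    if (added_rows[i] > mean(added_rows[seq_len(i - 1)])) {
      # Step back one level: the last level before the jump.
      return(noise_levels[i - 1])
    }
  }
  # Assumption: if no jump is found, fall back to the highest level examined.
  noise_levels[length(noise_levels)]
}

# Made-up scree values for noise levels 0, 1, 2, 3, 4.
select_noise_elbow(c(50, 10, 8, 40, 60))
```

In this made-up example, the count at noise level 3 (40) is the first to exceed the mean of the counts at lower levels, so the sketch steps back to level 2.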

plots

Vector for which plots to draw:

  1. Number of Clusters versus Row Coverage Percentage

  2. Number of Clusters versus Number of Covered Rows

  3. Number of Clusters versus Final Number of Biclusters

verbose

Logical value indicating whether the progress bar of merging/growing the biclusters should be shown. (default=TRUE)

plot.type

Output Type

  • "device": All plots are outputted to new R graphics devices (default).

  • "file": All plots are saved in external files. Plots are joined together in a single .pdf file.

  • "other": All plots are outputted to the current graphics device, but will overwrite each other. Use this if you want to include one or more plots in a sweave/knitr file or if you want to export a single plot in a format of your own choosing.

filename

Base filename (with/without directory) for the plots if plot.type="file" (default="RowCoverage").

Details

The plot of the number of chosen tree clusters against the final row coverage can help you decide how many clusters to choose in the hierarchical tree. The more clusters you choose, the smaller (albeit more similar) the patterns are and the more rows will fit your patterns (i.e. more row coverage).
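As an illustration of the quantities being plotted, the sketch below computes row coverage for a given set of biclusters. It assumes that a row counts as covered when it belongs to at least one final bicluster; the helper row_coverage(), its input format, and the column names are hypothetical and are not the ones returned by ClusterRowCoverage.

```r
# Hypothetical helper (not part of BiBitR). `bicluster_rows` is a list with
# one integer vector of row indices per final bicluster.
row_coverage <- function(bicluster_rows, n_rows) {
  # A row is counted once, even if several biclusters contain it.
  covered <- unique(unlist(bicluster_rows))
  data.frame(
    n_biclusters = length(bicluster_rows),
    n_covered    = length(covered),
    pct_covered  = 100 * length(covered) / n_rows
  )
}

# Three overlapping biclusters in a matrix with 5000 rows.
cov <- row_coverage(list(1:200, 150:300, 400:450), n_rows = 5000)
cov
```

Repeating such a computation for each candidate number of clusters, up to maxCluster, would give one row per cut of the tree, which is the kind of table the plots summarize.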

Value

A data frame containing the number of clusters and the corresponding number of covered rows, percentage of row coverage, and number of final biclusters.

Author(s)

Ewoud De Troyer

Examples

## Not run: 
## Prepare some data ##
set.seed(254)
mat <- matrix(sample(c(0,1),5000*50,replace=TRUE,prob=c(1-0.15,0.15)),
              nrow=5000,ncol=50)
mat[1:200,1:10] <- matrix(sample(c(0,1),200*10,replace=TRUE,prob=c(1-0.9,0.9)),
                          nrow=200,ncol=10)
mat[300:399,6:15] <- matrix(sample(c(0,1),100*10,replace=TRUE,prob=c(1-0.9,0.9)),
                            nrow=100,ncol=10)
mat[400:599,21:30] <- matrix(sample(c(0,1),200*10,replace=TRUE,prob=c(1-0.9,0.9)),
                             nrow=200,ncol=10)
mat[700:799,29:38] <- matrix(sample(c(0,1),100*10,replace=TRUE,prob=c(1-0.9,0.9)),
                             nrow=100,ncol=10)
mat <- mat[sample(1:5000,5000,replace=FALSE),sample(1:50,50,replace=FALSE)]

## Apply BiBitWorkflow ##
out <- BiBitWorkflow(matrix=mat,minr=50,minc=5,noise=0.2,cut_type="number",cut_pm=10)
# Make ClusterRowCoverage Plots
ClusterRowCoverage(result=out,matrix=mat,maxCluster=20,noise=0.2)

## End(Not run)

[Package BiBitR version 0.3.1 Index]