bibit3 {BiBitR}    R Documentation

The BiBit Algorithm with Noise Allowance guided by Provided Patterns.

Description

Works like bibit2, but only aims to discover biclusters that contain the provided patterns, sub patterns of them, or their combinations.

Usage

bibit3(matrix = NULL, minr = 1, minc = 2, noise = 0,
  pattern_matrix = NULL, subpattern = TRUE, pattern_combinations = FALSE,
  arff_row_col = NULL, extend_columns = "none", extend_mincol = 1,
  extend_limitcol = 1, extend_noise = noise, extend_contained = FALSE)

Arguments

matrix

The binary input matrix.

minr

The minimum number of rows of the Biclusters. (Note that, in contrast to bibit and bibit2, this can be set to 1 since we are looking for additional rows to the provided pattern.)

minc

The minimum number of columns of the Biclusters.

noise

Noise parameter which determines the number of zeros allowed in the bicluster (i.e. in the extra rows added to the starting row pair).

  • noise=0: No noise allowed. This gives the same result as using the bibit function. (default)

  • 0<noise<1: The noise parameter is interpreted as a noise percentage. The number of allowed 0's in an (extra) row of the bicluster depends on the column size of the bicluster: zeros_allowed = ceiling(noise * columnsize). For example, for noise=0.10 and a bicluster column size of 5, the number of allowed 0's is 1.

  • noise>=1: The noise parameter is the number of allowed 0's in an (extra) row of the bicluster, independent of the column size of the bicluster. With this option, the noise parameter should be an integer.
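
The three noise cases above can be sketched in a few lines of R. This is only an illustration of the rule as documented here, not the package's internal code; the helper name zeros_allowed() is hypothetical.

```r
# Hypothetical helper (not part of BiBitR): number of zeros tolerated in an
# extra row, following the three noise cases described above.
zeros_allowed <- function(noise, columnsize) {
  stopifnot(noise >= 0, columnsize >= 1)
  if (noise == 0) {
    0                             # no noise allowed
  } else if (noise < 1) {
    ceiling(noise * columnsize)   # percentage of the bicluster column size
  } else {
    noise                         # fixed count, independent of column size
  }
}

zeros_allowed(0.10, 5)  # the documented example: 1 zero allowed
zeros_allowed(2, 5)     # fixed count: 2 zeros allowed
```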

pattern_matrix

Matrix (Number of Patterns x Number of Data Columns) containing the patterns of interest.

subpattern

Boolean value indicating whether sub patterns are of interest as well (default=TRUE).

pattern_combinations

Boolean value indicating whether the pairwise combinations of patterns (the intersecting 1's) should also be used as starting points (default=FALSE).

arff_row_col

Same argument as in bibit and bibit2. However, only 1 pattern can be provided with this option. For bibit3 to work, the pattern has to be added twice at the top of the matrix (= identical first 2 rows).

extend_columns

Column Extension Parameter
Can be one of the following: "none", "naive", "recursive" which will apply either a naive or recursive column extension procedure. (See Details Section for more information.)
Based on the extension, additional biclusters will be created in the Biclust object which can be seen in the column and row names of the RowxNumber and NumberxCol slots ("_Ext" suffix).
The info slot will also contain some additional information. Inside this slot, BC.Extended contains info on which original biclusters were extended, how many columns were added, and in how many extra extended biclusters this resulted.

Warning: Using a percentage-based extend_noise (or noise by default) in combination with the recursive procedure will result in a large number of biclusters and considerably increase the computation time. When using recursive in combination with a noise percentage, it is advised to keep the percentage reasonably small (e.g. 10%), depending on the data. Another remedy is to sufficiently increase extend_limitcol, either as a percentage or as an integer, to limit the pool of candidate columns.

extend_mincol

Column Extension Parameter
A minimum number of columns that a bicluster should be able to be extended with before saving the result. (Default=1)

extend_limitcol

Column Extension Parameter
The number (extend_limitcol>=1) or percentage (0<extend_limitcol<1) of 1's that a column (subsetted on the BC rows) should at least contain for it to be a candidate to be added to the bicluster as an extension. (Default=1) (Increase this parameter if the recursive extension takes too long. Limiting the pool of candidates will decrease computation time, but restrict the results more.)
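
The candidate filter described for extend_limitcol can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's implementation: candidate_columns() is a hypothetical helper, and a percentage is assumed to be taken relative to the number of bicluster rows.

```r
# Hypothetical helper (not a BiBitR function): columns outside the bicluster
# that contain enough 1's, within the bicluster rows, to be extension candidates.
candidate_columns <- function(mat, bc_rows, bc_cols, extend_limitcol = 1) {
  ones <- colSums(mat[bc_rows, , drop = FALSE])
  # Assumption: a percentage refers to the number of bicluster rows.
  needed <- if (extend_limitcol < 1) {
    extend_limitcol * length(bc_rows)
  } else {
    extend_limitcol
  }
  setdiff(which(ones >= needed), bc_cols)
}

# Toy binary matrix (illustrative values only); bicluster = rows 1-3, columns 1-2.
toy <- rbind(c(1, 1, 1, 0, 1),
             c(1, 1, 1, 0, 0),
             c(1, 1, 1, 0, 0),
             c(0, 0, 1, 1, 0))

loose  <- candidate_columns(toy, bc_rows = 1:3, bc_cols = 1:2, extend_limitcol = 1)
strict <- candidate_columns(toy, bc_rows = 1:3, bc_cols = 1:2, extend_limitcol = 0.5)
loose   # columns 3 and 5 each have at least one 1 in the bicluster rows
strict  # only column 3 has 1's in at least half of the bicluster rows
```

A stricter limit shrinks the candidate pool, which is why increasing extend_limitcol reduces the work of the recursive extension.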

extend_noise

Column Extension Parameter
The maximum allowed noise (in each row) when extending the columns of the bicluster. Can take the same values as the noise parameter. By default this is the same value as noise.

extend_contained

Column Extension Parameter
Logical value indicating whether extended results should be checked for containing each other (and deleted if this is the case). Default = FALSE. This can be a lengthy procedure for a large number of biclusters (>1000).

Details

The goal of the bibit3 function is to provide one or multiple patterns in order to only find those biclusters exhibiting those patterns. Multiple patterns can be given in matrix format, pattern_matrix, and their pairwise combinations can automatically be added to this matrix by setting pattern_combinations=TRUE. All discovered biclusters are still subject to the provided noise level.
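
As an illustration of what a pairwise combination means here (the intersecting 1's of two patterns), the following sketch builds the combinations for a small toy pattern matrix. It only mirrors the description above; how bibit3 constructs, names, or filters these rows internally is not shown in this documentation, so the row labels are hypothetical.

```r
# Toy pattern matrix: 3 patterns over 6 columns (illustrative values only).
patterns <- rbind(c(1, 1, 1, 0, 0, 0),
                  c(0, 1, 1, 1, 0, 0),
                  c(0, 0, 1, 1, 1, 0))

# All pairs of pattern indices, one pair per column.
pairs <- combn(nrow(patterns), 2)

# A combination keeps only the positions where both patterns are 1.
combos <- t(apply(pairs, 2, function(p) patterns[p[1], ] * patterns[p[2], ]))
rownames(combos) <- apply(pairs, 2, paste, collapse = "_")  # hypothetical labels

combos
```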

Three types of Biclusters can be discovered:

Full Pattern:

Bicluster which overlaps completely (within allowed noise levels) with the provided pattern. The column size of this bicluster is always equal to the number of 1's in the pattern.

Sub Pattern:

Biclusters which overlap with a part of the provided pattern within allowed noise levels. Will only be given if subpattern=TRUE (default). Setting this option to FALSE decreases computation time.

Extended:

Starting from the biclusters resulting from the full and sub patterns, an attempt is made to add other columns to the biclusters while keeping the noise as low as possible (the number of rows in the BC stays constant). This can be done with extend_columns equal to either "naive" or "recursive". More info on the difference can be found in the Details Section of bibit2.
Naturally, the artificially added pattern rows are not taken into account for the noise levels, as they are 0 in every other column.
The question which is attempted to be answered here is 'Do the rows, which overlap partly or fully with the given pattern, have other similarities outside the given pattern?'
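
To make the Full Pattern type concrete, the sketch below selects the rows of a toy matrix that match a pattern within a percentage noise level. It is a plain-R illustration of the definition given above, not the BiBit search itself; all object names and values are made up for the example.

```r
# Toy binary data: 5 rows, 6 columns (illustrative values only).
toy <- rbind(c(1, 1, 1, 1, 0, 0),
             c(1, 1, 0, 1, 0, 1),
             c(1, 0, 0, 1, 1, 0),
             c(1, 1, 1, 1, 1, 1),
             c(0, 0, 0, 0, 1, 1))
pattern <- c(1, 1, 1, 1, 0, 0)

# The column size of a Full Pattern bicluster equals the number of 1's in the pattern.
pattern_cols <- which(pattern == 1)

# Percentage noise: zeros tolerated per row within the pattern columns.
noise <- 0.25
allowed <- ceiling(noise * length(pattern_cols))

# Rows whose number of zeros inside the pattern columns stays within the allowance.
zeros_per_row <- rowSums(toy[, pattern_cols, drop = FALSE] == 0)
full_pattern_rows <- which(zeros_per_row <= allowed)
full_pattern_rows
```

With these toy values one zero per row is allowed, so rows 1, 2 and 4 qualify while rows 3 and 5 do not.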

How?
The BiBit algorithm is applied to a data matrix that contains 2 identical artificial rows at the top which contain the given pattern. The default algorithm is then slightly altered to only start from this artificial row pair (=Full Pattern) or from 1 artificial row and 1 other row (=Sub Pattern).
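
The augmented matrix described above can be sketched as follows. bibit3 builds it internally when pattern_matrix is supplied; doing it by hand, as here, would only be relevant when preparing data for the arff_row_col route. The row names are hypothetical and chosen for readability.

```r
# Toy binary data and a single pattern (illustrative values only).
toy <- rbind(c(1, 1, 0, 1),
             c(0, 1, 1, 1),
             c(1, 1, 0, 0))
pattern <- c(1, 1, 0, 0)

# Put the pattern twice on top of the data: two identical artificial rows.
augmented <- rbind(pattern, pattern, toy)
rownames(augmented) <- c("Art1", "Art2", paste0("Row", seq_len(nrow(toy))))

# The first two rows must be identical for the pattern-guided search.
stopifnot(identical(augmented[1, ], augmented[2, ]))
augmented
```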

Note 1 - Large Data:
The arff_row_col can still be provided in case of large data matrices, but the .arff file should already contain the pattern of interest in the first two rows. Consequently not more than 1 pattern at a time can be investigated with a single call of bibit3.

Note 2 - Viewing Results:
Print and summary methods have been implemented for the output object of bibit3. They give an overview of the number of discovered biclusters and their dimensions.
Additionally, the bibit3_patternBC function can extract a Bicluster and add the artificial pattern rows to investigate the results.

Value

An S3 list object of class "bibit3", in which each element (apart from the last one) corresponds with a provided pattern or combination thereof.
Each element is a list containing:

Number:

Number of Initially found BC's by applying BiBit with the provided pattern.

Number_Extended:

Number of additional discovered BC's by extending the columns.

FullPattern:

Biclust S4 Class Object containing the Bicluster with the Full Pattern.

SubPattern:

Biclust S4 Class Object containing the Biclusters showing parts of the pattern.

Extended:

Biclust S4 Class Object containing the additional Biclusters after extending the biclusters (column wise) of the full and sub patterns.

info:

Contains Time_Min element which includes the elapsed time of parts and the full analysis.

The last element in the list is a matrix containing all the investigated patterns.

Author(s)

Ewoud De Troyer

References

Domingo S. Rodriguez-Baena, Antonia J. Perez-Pulido and Jesus S. Aguilar-Ruiz (2011), "A biclustering algorithm for extracting bit-patterns from binary datasets", Bioinformatics

Examples

## Not run:  
set.seed(1)
data <- matrix(sample(c(0,1),100*100,replace=TRUE,prob=c(0.9,0.1)),nrow=100,ncol=100)
data[1:10,1:10] <- 1 # BC1
data[11:20,11:20] <- 1 # BC2
data[21:30,21:30] <- 1 # BC3
colsel <- sample(1:ncol(data),ncol(data))
data <- data[sample(1:nrow(data),nrow(data)),colsel]

pattern_matrix <- matrix(0,nrow=3,ncol=100)
pattern_matrix[1,1:7] <- 1
pattern_matrix[2,11:15] <- 1
pattern_matrix[3,13:20] <- 1

pattern_matrix <- pattern_matrix[,colsel]


out <- bibit3(matrix=data,minr=2,minc=2,noise=0.1,pattern_matrix=pattern_matrix,
              subpattern=TRUE,extend_columns="naive",pattern_combinations=TRUE)
out  # OR print(out) OR summary(out)


bibit3_patternBC(result=out,matrix=data,pattern=c(1),type=c("full","sub","ext"),BC=c(1,2))

## End(Not run)

[Package BiBitR version 0.3.1 Index]