bibit2 {BiBitR}    R Documentation

The BiBit Algorithm with Noise Allowance

Description

Same function as bibit, with an additional noise parameter that allows 0's in the discovered biclusters (see Details for more information).

Usage

bibit2(matrix = NULL, minr = 2, minc = 2, noise = 0,
  arff_row_col = NULL, output_path = NULL, extend_columns = "none",
  extend_mincol = 1, extend_limitcol = 1, extend_noise = noise,
  extend_contained = FALSE)

Arguments

matrix

The binary input matrix.

minr

The minimum number of rows of the biclusters.

minc

The minimum number of columns of the biclusters.

noise

Noise parameter that determines the number of 0's allowed in the bicluster (i.e. in the extra rows added to the starting row pair).

  • noise=0: No noise allowed. This gives the same result as using the bibit function. (default)

  • 0<noise<1: The noise parameter is interpreted as a percentage. The number of 0's allowed in an (extra) row of the bicluster depends on the column size of the bicluster. More specifically, zeros_allowed = ceiling(noise * columnsize). For example, for noise=0.10 and a bicluster column size of 5, the number of allowed 0's would be 1.

  • noise>=1: The noise parameter is the number of 0's allowed in an (extra) row of the bicluster, independent of the column size of the bicluster. In this case, the noise parameter should be an integer.
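The three noise cases above can be summarised in a small R function. This is only a sketch of the documented rule; zeros_allowed is a hypothetical helper, not a function exported by BiBitR, and the package's Java code may treat edge cases differently.

```r
# Hypothetical helper (not exported by BiBitR): number of 0's allowed in an
# added row, following the documented noise rules.
zeros_allowed <- function(noise, columnsize) {
  if (noise == 0) {
    0                            # no noise: behaves like bibit
  } else if (noise < 1) {
    ceiling(noise * columnsize)  # percentage of the bicluster column size
  } else {
    noise                        # fixed number of 0's per row
  }
}

zeros_allowed(0.10, 5)  # 1, as in the example above
zeros_allowed(3, 5)     # 3, independent of the column size
```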

arff_row_col

To bypass the internal R function that converts the matrix to .arff format, provide the path of the .arff file here. Additionally, provide two .csv files containing the row names and the column names, respectively. These two files should not contain a header or quotes around the names: each is simply 1 column of names.
(Example: arff_row_col=c("...\\data\\matrix.arff","...\\data\\rownames.csv","...\\data\\colnames.csv"))
Note: These files can be generated with the make_arff_row_col function.
Warning: Should you use the write.arff function from the foreign package, remember to transpose the matrix first.
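If you prepare these files yourself, the two name files can be written with base R's write.table. This is a minimal sketch: the matrix and the temporary file paths are made up for illustration, and the .arff file itself is not created here (use make_arff_row_col or, after transposing, write.arff from the foreign package).

```r
# Sketch: writing the two name files expected by arff_row_col with base R.
# The matrix and the temporary file paths are illustrative only.
mat <- matrix(c(1, 0, 1, 1), nrow = 2,
              dimnames = list(c("gene1", "gene2"), c("cond1", "cond2")))
rowfile <- tempfile(fileext = ".csv")
colfile <- tempfile(fileext = ".csv")

# One name per line: no header, no quotes, no row numbers
write.table(rownames(mat), rowfile, quote = FALSE,
            row.names = FALSE, col.names = FALSE)
write.table(colnames(mat), colfile, quote = FALSE,
            row.names = FALSE, col.names = FALSE)

readLines(rowfile)  # "gene1" "gene2"
```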

output_path

If the original .txt output of the Java code is desired, provide the output path here (without extension). In this case, the function skips the transformation to a Biclust class object and simply returns NULL.
(Example: output_path="...\\out\\bibitresult")
(Output description: for every generated bicluster, the output file lists the number of rows, the number of columns, the row names and the column names.)

extend_columns

Column Extension Parameter
Can be one of "none", "naive" or "recursive"; the latter two apply a naive or a recursive column extension procedure, respectively. (See the Details section for more information.)
Based on the extension, additional biclusters are added to the Biclust object. They can be recognized by the "_Ext" suffix in the column and row names of the RowxNumber and NumberxCol slots, respectively.
The info slot also contains additional information. Inside this slot, BC.Extended records which original biclusters were extended, how many columns were added, and how many extra extended biclusters this resulted in.

Warning: Using a percentage-based extend_noise (which defaults to noise) in combination with the recursive procedure can result in a large number of biclusters and greatly increase computation time. Depending on the data, when combining "recursive" with a noise percentage, it is advised to keep the percentage reasonably small (e.g. 10%). Another remedy is to increase extend_limitcol sufficiently (as a percentage or an integer) to limit the pool of candidate columns.

extend_mincol

Column Extension Parameter
The minimum number of columns by which a bicluster must be extended for the extension to be saved. (Default=1)

extend_limitcol

Column Extension Parameter
The minimum number (extend_limitcol>=1) or percentage (0<extend_limitcol<1) of 1's that a column, restricted to the bicluster rows, must contain to be a candidate for extending the bicluster. (Default=1) (Increase this parameter if the recursive extension takes too long. Limiting the pool of candidates decreases computation time, but also restricts the results further.)
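The candidate filter can be sketched as follows. The sketch assumes, for illustration, that a percentage is taken relative to the number of bicluster rows and that a column qualifies once its count of 1's reaches the threshold; candidate_columns is a hypothetical helper, not part of BiBitR.

```r
# Hypothetical helper (not part of BiBitR): columns outside the bicluster
# whose number of 1's on the bicluster rows reaches extend_limitcol.
candidate_columns <- function(data, bc_rows, bc_cols, extend_limitcol = 1) {
  other <- setdiff(seq_len(ncol(data)), bc_cols)
  ones <- colSums(data[bc_rows, other, drop = FALSE])
  # Assumption: a percentage is taken relative to the number of bicluster rows
  threshold <- if (extend_limitcol < 1) {
    extend_limitcol * length(bc_rows)
  } else {
    extend_limitcol
  }
  other[ones >= threshold]
}

toy <- cbind(c(1, 1, 1, 0), c(1, 1, 1, 0), c(1, 1, 0, 0),
             c(1, 0, 0, 1), c(0, 0, 0, 1))
candidate_columns(toy, bc_rows = 1:3, bc_cols = 1:2, extend_limitcol = 1)  # 3 4
candidate_columns(toy, bc_rows = 1:3, bc_cols = 1:2, extend_limitcol = 2)  # 3
```

Raising extend_limitcol shrinks the returned pool, which is what reduces the work of the recursive extension.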

extend_noise

Column Extension Parameter
The maximum allowed noise (in each row) when extending the columns of the bicluster. Accepts the same kinds of values as the noise parameter. By default, this is the same value as noise.

extend_contained

Column Extension Parameter
Logical value indicating whether extended results should be checked for containment in each other (contained biclusters are deleted). Default = FALSE. This can be a lengthy procedure for a large number of biclusters (>1000).
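A containment check of this kind can be sketched as a pairwise comparison of row and column sets, which also shows why it becomes slow for many biclusters. This is an illustration only; is_contained and drop_contained are hypothetical helpers, and BiBitR's own check may be implemented differently.

```r
# Hypothetical sketch (not BiBitR's implementation). Each bicluster is a list
# with integer vectors `rows` and `cols`.
is_contained <- function(a, b) {
  all(a$rows %in% b$rows) && all(a$cols %in% b$cols)
}

# Remove every bicluster contained in another one. All pairs are compared,
# so the cost grows quadratically with the number of biclusters.
drop_contained <- function(bcs) {
  n <- length(bcs)
  keep <- rep(TRUE, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      # keep[j] check ensures one copy of identical biclusters survives
      if (i != j && keep[j] && is_contained(bcs[[i]], bcs[[j]])) {
        keep[i] <- FALSE
        break
      }
    }
  }
  bcs[keep]
}

bcs <- list(list(rows = 1:3, cols = 1:3), list(rows = 1:2, cols = 1:2),
            list(rows = 4:5, cols = 1:2))
length(drop_contained(bcs))  # 2: the second bicluster lies inside the first
```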

Value

A Biclust S4 Class object.

Details - General

bibit2 follows the same steps as described in the Details section of bibit.
Following the general steps of the BiBit algorithm, the noise allowance is incorporated into the original algorithm as follows:

  1. Binary data is encoded in bit words.

  2. Take a pair of rows as your starting point.

  3. Find the maximal overlap of 1's between these two rows and save this as a pattern/motif. You now have a bicluster of 2 rows and N columns in which N is the number of 1's in the motif.

  4. Check whether each remaining row matches this motif, allowing a number of 0's in the match as defined by the noise parameter. Rows that match completely or within the allowed noise level are added to the bicluster.

  5. Go back to Step 2 and repeat for all possible row pairs.
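Steps 2 to 5 can be illustrated with a plain R loop over all row pairs. This is a simplified sketch, not BiBitR's Java implementation: it skips the bit-word encoding of step 1 and the containment filter from the note below, and bibit_noise_sketch is a hypothetical name.

```r
# Simplified sketch of steps 2-5 on a plain 0/1 matrix. Not BiBitR's Java
# implementation: no bit-word encoding (step 1) and no containment filter.
bibit_noise_sketch <- function(data, minr = 2, minc = 2, noise = 0) {
  allowed <- function(ncols) {
    if (noise == 0) 0 else if (noise < 1) ceiling(noise * ncols) else noise
  }
  results <- list()
  n <- nrow(data)
  for (i in seq_len(max(n - 1, 0))) {
    for (j in (i + 1):n) {
      # Step 3: the motif is the set of columns where both rows are 1
      cols <- which(data[i, ] == 1 & data[j, ] == 1)
      if (length(cols) < minc) next
      # Step 4: count 0's per row inside the motif; the starting pair has none
      zeros <- rowSums(data[, cols, drop = FALSE] == 0)
      rows <- which(zeros <= allowed(length(cols)))
      if (length(rows) >= minr) {
        results[[length(results) + 1]] <- list(rows = rows, cols = cols)
      }
    }
  }
  unique(results)  # different pairs can yield the same bicluster
}

toy <- rbind(c(1, 1, 1, 0, 0),
             c(1, 1, 1, 0, 1),
             c(1, 1, 0, 0, 0),
             c(0, 0, 0, 1, 1))
bibit_noise_sketch(toy, noise = 0)  # rows 1-2 x cols 1-3, rows 1-3 x cols 1-2
bibit_noise_sketch(toy, noise = 1)  # row 3 (one 0) joins the pair (1, 2) result
```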

Note: Biclusters are only saved if they satisfy the minr and minc parameter settings and if the bicluster is not already contained completely within another bicluster.

The result is a set of biclusters that do not necessarily consist of 1's only: the 2 rows of the starting pair contain only 1's, while the other rows may contain 0's (= noise).

Note: Because of the extra checks involved in the noise allowance, using noise may slightly increase computation time.

Details - Column Extension

Column Extension is an optional procedure that can be applied after the BiBit algorithm (with noise). It adds extra columns to a BiBit bicluster while taking into account the allowed extend_noise level in each row. The primary goal is, after applying BiBit with noise, to also allow some noise in the 2 initial 'perfect' rows. The extend_mincol and extend_limitcol parameters further restrict which extensions are discovered.
Depending on the extend_columns parameter, this procedure is carried out either naively (fast) or recursively (slower and more thorough).

"naive"

Restricted to the bicluster rows, the candidate columns are ordered by their number of 1's (most first). Then, in this order, each column is checked and added if the resulting BC still satisfies the row noise levels.
This has 2 major consequences:

  • If 2 columns are identical, the first in the dataset is added, while the second may not be (depending on the noise level allowed per row).

  • If 2 non-identical columns can both be added (within the row noise levels), the column with the most 1's is added first. Afterwards, the second column might no longer be viable.

Note that this method always results in at most 1 extended bicluster per original bicluster.
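The naive procedure can be sketched in a few lines of R. This is a hypothetical illustration, not BiBitR's code: it assumes the row noise is counted over all bicluster columns, and it takes the per-row allowance as a function allowed(k) of the column count k (standing in for extend_noise).

```r
# Hypothetical sketch of the "naive" extension (not BiBitR's code).
# `allowed(k)` returns the 0's permitted per row for a bicluster with k columns.
naive_extend <- function(data, rows, cols, allowed) {
  candidates <- setdiff(seq_len(ncol(data)), cols)
  ones <- colSums(data[rows, candidates, drop = FALSE])
  # Most 1's first; ties are broken by position in the dataset
  candidates <- candidates[order(-ones, candidates)]
  for (cand in candidates) {
    trial <- c(cols, cand)
    zeros <- rowSums(data[rows, trial, drop = FALSE] == 0)
    if (all(zeros <= allowed(length(trial)))) cols <- trial  # keep the column
  }
  cols
}

toy <- rbind(c(1, 1, 1, 1, 0, 0),
             c(1, 1, 0, 1, 1, 1),
             c(1, 1, 1, 1, 1, 0))
naive_extend(toy, rows = 1:3, cols = 1:2, allowed = function(k) 1)
# columns 1 2 4 3 5: column 6 no longer fits once column 5 has used
# row 1's single allowed 0
```

The greedy order is what makes the method fast, and also why it yields at most one extension per bicluster.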

"recursive"

Within the allowed row noise level, every possible combination of candidate columns that can be added to the bicluster is checked. Only the resulting biclusters with the highest number of extra columns are saved. This can result in multiple extensions for 1 bicluster if several combinations reach the maximum number of added columns.
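A brute-force version of this search can be written with combn. The sketch below is an assumption-laden illustration, not BiBitR's recursive implementation: max_extensions and the allowed(k) argument are hypothetical, and it returns the extra columns of every feasible combination of maximal size.

```r
# Hypothetical brute-force sketch of the "recursive" extension (not BiBitR's
# code). Tries candidate subsets from largest to smallest and returns every
# feasible subset of the largest feasible size (as vectors of extra columns).
max_extensions <- function(data, rows, cols, allowed) {
  candidates <- setdiff(seq_len(ncol(data)), cols)
  fits <- function(extra) {
    trial <- c(cols, extra)
    all(rowSums(data[rows, trial, drop = FALSE] == 0) <= allowed(length(trial)))
  }
  for (k in rev(seq_along(candidates))) {
    # combn() over indices avoids its special case for a single number
    combos <- combn(length(candidates), k,
                    FUN = function(idx) candidates[idx], simplify = FALSE)
    ok <- Filter(fits, combos)
    if (length(ok) > 0) return(ok)
  }
  list()
}

toy <- rbind(c(1, 1, 1, 1, 0, 0),
             c(1, 1, 0, 1, 1, 1),
             c(1, 1, 1, 1, 1, 0))
max_extensions(toy, rows = 1:3, cols = 1:2, allowed = function(k) 1)
# two maximal extensions: columns 3, 4, 5 and columns 3, 4, 6
```

The number of subsets grows exponentially with the number of candidates, which is why shrinking the candidate pool with extend_limitcol matters for this procedure.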

Note: Both procedures are followed by a fast check for duplicate biclusters among the extensions; any duplicates are deleted from the final result.

Author(s)

Ewoud De Troyer

References

Domingo S. Rodriguez-Baena, Antonia J. Perez-Pulido and Jesus S. Aguilar-Ruiz (2011), "A biclustering algorithm for extracting bit-patterns from binary datasets", Bioinformatics

Examples

## Not run: 
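# 100 x 100 binary background with roughly 10% 1's
data <- matrix(sample(c(0,1),100*100,replace=TRUE,prob=c(0.9,0.1)),nrow=100,ncol=100)
# Plant three 10 x 10 biclusters of 1's
data[1:10,1:10] <- 1 # BC1
data[11:20,11:20] <- 1 # BC2
data[21:30,21:30] <- 1 # BC3
# Shuffle rows and columns so the biclusters are not contiguous
data <- data[sample(1:nrow(data),nrow(data)),sample(1:ncol(data),ncol(data))]

# Percentage noise: at most ceiling(0.2 * column size) 0's per added row
result1 <- bibit2(data,minr=5,minc=5,noise=0.2)
result1
MaxBC(result1,top=1)

# Fixed noise: at most 3 0's per added row
result2 <- bibit2(data,minr=5,minc=5,noise=3)
result2
MaxBC(result2,top=2)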

## End(Not run)

[Package BiBitR version 0.3.1 Index]