Parallel Biclustering

Parallel Biclustering is an embarrassingly parallel method for finding coherent evolution biclusters. After an initial paper on the Parallel Biclustering Algorithm, the method was extended into a complete system for finding all biclusters. This system is called the Binary Indexed Chaining Parallel Biclustering System.


The Binary Indexed Chaining Parallel Biclustering System (BICPBS) is an extension of the Parallel Biclustering Algorithm (PBA) developed by Tewfik et al. to find all biclusters within a gene expression matrix. The purpose of BICPBS is to extend PBA to data sets of hundreds of genes by hundreds of conditions, which were previously too large to bicluster. BICPBS shows marked improvements in execution time and time complexity. These improvements came from porting the algorithm from MATLAB to Python and from restructuring how larger biclusters are constructed. Because larger biclusters are built by combining smaller ones, expensive calculations can be reused or avoided entirely.
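To make the idea of building larger biclusters from smaller ones concrete, here is a minimal sketch in Python. It is not the BICPBS implementation: the function names `pair_biclusters` and `extend_chains` are hypothetical, and the sketch assumes a simple order-based notion of coherent evolution in which a gene belongs to a bicluster when its expression strictly increases along an ordered chain of conditions. Under that assumption, a chain ending in condition b can be extended with a stored pair (b, c) by intersecting the two gene sets, so no expression value has to be compared a second time.

```python
def pair_biclusters(expr):
    """Map each ordered condition pair (a, b) to the genes that rise from a to b.

    expr is a list of rows, one row of expression values per gene.
    """
    n_conds = len(expr[0])
    pairs = {}
    for a in range(n_conds):
        for b in range(n_conds):
            if a == b:
                continue
            genes = frozenset(g for g, row in enumerate(expr) if row[a] < row[b])
            if genes:
                pairs[(a, b)] = genes
    return pairs


def extend_chains(chains, pairs, min_genes=2):
    """Grow every chain by one condition, reusing the stored gene sets.

    A chain (c0, ..., ck) with gene set G is extended with a pair (ck, b)
    by intersecting G with that pair's gene set. The intersection is exactly
    the set of genes that increase along (c0, ..., ck, b).
    """
    longer = {}
    for conds, genes in chains.items():
        for (a, b), pair_genes in pairs.items():
            if a == conds[-1] and b not in conds:
                shared = genes & pair_genes
                if len(shared) >= min_genes:
                    longer[conds + (b,)] = shared
    return longer


if __name__ == "__main__":
    # Hypothetical toy matrix: 3 genes (rows) by 3 conditions (columns).
    expr = [[1, 2, 3],
            [2, 3, 4],
            [3, 2, 1]]
    pairs = pair_biclusters(expr)
    print(extend_chains(pairs, pairs))
```

On this toy matrix the only chain of three conditions that keeps at least two genes is (0, 1, 2), shared by genes 0 and 1. Each pair in `pairs` is independent of the others, which is the kind of work that could be distributed across processes in an embarrassingly parallel fashion.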

BICPBS offers a complete system for finding all biclusters and analyzing their size. It saves results to disk, so it is not limited by the available physical RAM. Results can be reloaded later, either to analyze them or to resume work that was suspended. For a given bicluster width, the analysis reports the distribution of the number of genes and the largest and smallest bicluster.
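The save, reload, and analyze workflow described above could look roughly like the following sketch. The on-disk format BICPBS actually uses is not described here, so the JSON-lines file and the function names `save_biclusters`, `iter_biclusters`, and `summarize_by_width` are assumptions made for illustration only. The sketch also assumes that "width" means the number of conditions in a bicluster.

```python
import json
import os
import tempfile
from collections import Counter, defaultdict


def save_biclusters(path, biclusters):
    """Write one JSON record per line (an assumed format, not the BICPBS one)."""
    with open(path, "w", encoding="utf-8") as fh:
        for conds, genes in biclusters:
            record = {"conditions": list(conds), "genes": sorted(genes)}
            fh.write(json.dumps(record) + "\n")


def iter_biclusters(path):
    """Stream records back one at a time, so the file never has to fit in RAM."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            record = json.loads(line)
            yield tuple(record["conditions"]), frozenset(record["genes"])


def summarize_by_width(biclusters):
    """For each width (number of conditions), count biclusters by gene count."""
    sizes = defaultdict(Counter)
    for conds, genes in biclusters:
        sizes[len(conds)][len(genes)] += 1
    return {
        width: {
            "distribution": dict(counts),  # gene count -> number of biclusters
            "smallest": min(counts),       # fewest genes at this width
            "largest": max(counts),        # most genes at this width
        }
        for width, counts in sizes.items()
    }


if __name__ == "__main__":
    # Hypothetical results: (conditions, genes) pairs.
    found = [((0, 1), {0, 1, 2}), ((1, 2), {0, 1}), ((0, 1, 2), {0, 1})]
    path = os.path.join(tempfile.mkdtemp(), "biclusters.jsonl")
    save_biclusters(path, found)
    print(summarize_by_width(iter_biclusters(path)))
```

Because `iter_biclusters` is a generator, the summary is computed in a single pass over the file, and only the per-width counters are held in memory.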

If you'd like to learn more you can read the paper:

OpenDocument: Binary Indexed Chaining Parallel Biclustering System.odt
PDF: Binary Indexed Chaining Parallel Biclustering System.pdf

Or you can jump right in and download the source:

Version 0.2.1
tar.bz2: BICPBS-0.2.1.tar.bz2 (2.1 MB)
tar.gz: BICPBS-0.2.1.tar.gz (2.9 MB)
zip: BICPBS-0.2.1.zip (2.9 MB)

Version 0.2
tar.bz2: BICPBS-0.2.tar.bz2 (4.5 MB)
tar.gz: BICPBS-0.2.tar.gz (7.4 MB)
zip: BICPBS-0.2.zip (7.4 MB)
