Parallel Biclustering is an embarrassingly parallel method for finding coherent evolution biclusters. After an initial paper on the Parallel Biclustering Algorithm, it was extended to form a complete system to find all biclusters. This system is called the Binary Indexed Chaining Parallel Biclustering System.
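The sketch below illustrates why the method can be called embarrassingly parallel. It assumes that a coherent evolution bicluster is a set of genes whose expression moves in the same direction (up or down) between conditions, and that each pair of conditions can be scored independently of every other pair. The function names `pair_masks` and `all_pair_masks`, the use of integer bitmasks for gene sets, and the choice to count unchanged values as neither up nor down are illustrative assumptions. This is not the BICPBS code.

```python
from functools import partial
from itertools import combinations


def pair_masks(matrix, pair):
    """Hypothetical helper: bitmasks of genes rising / falling from condition i to condition j.

    Bit g of a mask is set when gene (row) g belongs to that set; genes whose
    value is unchanged are left out of both masks (an assumption).
    """
    i, j = pair
    up = down = 0
    for g, row in enumerate(matrix):
        if row[j] > row[i]:
            up |= 1 << g
        elif row[j] < row[i]:
            down |= 1 << g
    return pair, (up, down)


def all_pair_masks(matrix, mapper=map):
    """Score every condition pair independently (hypothetical helper).

    No pair depends on any other, so `mapper` can be the built-in map or a
    parallel one such as concurrent.futures.ProcessPoolExecutor().map.
    """
    n_conditions = len(matrix[0])
    pairs = combinations(range(n_conditions), 2)
    # partial (rather than a lambda) keeps the task picklable for process pools.
    return dict(mapper(partial(pair_masks, matrix), pairs))
```

For example, inside a `with ProcessPoolExecutor() as pool:` block, calling `all_pair_masks(matrix, pool.map)` spreads the pairs across worker processes. The results are identical to the sequential call because no pair reads another pair's output.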
The Binary Indexed Chaining Parallel Biclustering System (BICPBS) is an extension of the Parallel Biclustering Algorithm (PBA) developed by Tewfik et al. It is designed to find all biclusters within a gene expression matrix. The purpose of BICPBS is to extend PBA to data sets of hundreds of genes by hundreds of conditions, which were previously too large to bicluster.
BICPBS shows marked improvements over PBA in execution time and time complexity. These improvements were brought about by porting the algorithm from MATLAB to Python and by restructuring how larger biclusters are constructed. Because larger biclusters are built by combining smaller ones, expensive calculations can be reused or avoided entirely.
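The chaining idea described above can be sketched as follows. This is an illustration built on assumptions and is not the BICPBS implementation. Gene sets are stored as integer bitmasks (one plausible reading of "binary indexed"), conditions are chained in increasing index order, and the names `direction_masks`, `chain_biclusters`, `min_genes` and `min_conditions` are hypothetical. Each longer bicluster is obtained by AND-ing the shorter bicluster's gene mask with one more pair mask, so earlier work is reused. Any chain that falls below `min_genes` is dropped without being extended, which avoids further work: intersecting can only remove genes, so no extension of that chain could become large enough again.

```python
from itertools import combinations


def direction_masks(matrix):
    """Map each condition pair (i, j), i < j, to {+1: rising genes, -1: falling genes} as bitmasks."""
    masks = {}
    for i, j in combinations(range(len(matrix[0])), 2):
        up = down = 0
        for g, row in enumerate(matrix):
            if row[j] > row[i]:
                up |= 1 << g
            elif row[j] < row[i]:
                down |= 1 << g
        masks[(i, j)] = {+1: up, -1: down}
    return masks


def popcount(mask):
    """Number of genes in a bitmask."""
    return bin(mask).count("1")


def chain_biclusters(matrix, min_genes=2, min_conditions=3):
    """Grow biclusters one condition at a time by AND-ing bitmasks (hypothetical sketch).

    Returns (conditions, directions, genes) tuples, where every gene moves in
    direction directions[k] between conditions[k] and conditions[k + 1].
    """
    n_genes, n_conditions = len(matrix), len(matrix[0])
    masks = direction_masks(matrix)
    # Seeds: every two-condition chain that already has enough genes.
    frontier = [((i, j), (d,), m)
                for (i, j), by_dir in masks.items()
                for d, m in by_dir.items()
                if popcount(m) >= min_genes]
    found = []
    while frontier:
        next_frontier = []
        for conds, dirs, genes in frontier:
            if len(conds) >= min_conditions:
                found.append((conds, dirs, [g for g in range(n_genes) if genes >> g & 1]))
            last = conds[-1]
            for k in range(last + 1, n_conditions):
                for d, m in masks[(last, k)].items():
                    # Reuse: the longer chain's genes = shorter chain's mask AND one pair mask.
                    extended = genes & m
                    # Avoid: AND only drops genes, so chains below min_genes are never extended.
                    if popcount(extended) >= min_genes:
                        next_frontier.append((conds + (k,), dirs + (d,), extended))
        frontier = next_frontier
    return found
```

With a five-gene, three-condition matrix such as `[[1, 2, 3], [3, 2, 1], [1, 3, 2], [2, 2, 2], [0, 5, 9]]`, only genes 0 and 4 rise across all three conditions. The sketch reports them as a single three-condition bicluster, and each mask intersection is computed only once.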
BICPBS offers a complete system for finding all biclusters and analyzing their sizes. Because BICPBS saves results to disk, it is not limited by the available physical RAM. Results can easily be reloaded later, so analysis of suspended work can be resumed. Results can be analyzed for the distribution of the number of genes per bicluster and for the largest and smallest bicluster of a given width.
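Saving results per width and summarizing them could look roughly like the sketch below. Several details are assumptions made for illustration: one JSON file per width, file names of the form `width_<n>.json`, "width" meaning the number of conditions, and all of the function names. The actual BICPBS file format is not described here. Because each width has its own file, a summary only needs to load that width's file. `completed_widths` lets a resumed run skip widths that are already on disk.

```python
import json
from collections import Counter
from pathlib import Path


def save_width(results_dir, width, biclusters):
    """Write every bicluster of one width (number of conditions) to its own JSON file.

    `biclusters` is a list of (conditions, genes) pairs. Data goes to a
    temporary file first and is then renamed, so an interrupted run does not
    leave a half-written width_<n>.json behind.
    """
    directory = Path(results_dir)
    directory.mkdir(parents=True, exist_ok=True)
    records = [{"conditions": list(c), "genes": list(g)} for c, g in biclusters]
    tmp = directory / f"width_{width}.json.tmp"
    tmp.write_text(json.dumps(records))
    tmp.replace(directory / f"width_{width}.json")


def completed_widths(results_dir):
    """Widths already saved, so a suspended run can skip them when resumed."""
    directory = Path(results_dir)
    if not directory.exists():
        return set()
    return {int(f.stem.split("_")[1]) for f in directory.glob("width_*.json")}


def load_width(results_dir, width):
    """Reload one width's biclusters as (conditions, genes) tuples; [] if absent."""
    path = Path(results_dir) / f"width_{width}.json"
    if not path.exists():
        return []
    return [(tuple(r["conditions"]), tuple(r["genes"])) for r in json.loads(path.read_text())]


def summarize_width(results_dir, width):
    """Gene-count distribution plus the largest and smallest bicluster of one width."""
    biclusters = load_width(results_dir, width)
    if not biclusters:
        return None
    return {
        "gene_count_distribution": dict(Counter(len(g) for _, g in biclusters)),
        "largest": max(biclusters, key=lambda b: len(b[1])),
        "smallest": min(biclusters, key=lambda b: len(b[1])),
    }
```

When several biclusters tie for largest or smallest, `max` and `min` return the first one in file order. A fuller tool might report all of them.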
If you'd like to learn more, you can read the paper:
- OpenDocument: Binary Indexed Chaining Parallel Biclustering System.odt
- PDF: Binary Indexed Chaining Parallel Biclustering System.pdf
Or you can jump right in and download the source:
- 0.2.1
  - tar.bz2: BICPBS-0.2.1.tar.bz2 (2.1 MB)
  - tar.gz: BICPBS-0.2.1.tar.gz (2.9 MB)
  - zip: BICPBS-0.2.1.zip (2.9 MB)
- 0.2
  - tar.bz2: BICPBS-0.2.tar.bz2 (4.5 MB)
  - tar.gz: BICPBS-0.2.tar.gz (7.4 MB)
  - zip: BICPBS-0.2.zip (7.4 MB)