Downloading the entire database or associated scripts

Current and previous releases of the PhyLoTA Browser database can be downloaded in a format suitable for rebuilding the database locally in MySQL (or an equivalent). Each release is exported as a set of SQL statements using the 'mysqldump' utility. The file is named in the following format:

pb.bu.rel###.date.gz

where rel### is the GenBank release on which the database was built, and date is the date on which the database was exported. Because this file is very large, it is split into several 250 MB pieces named as above, but with the suffix 'partxx' in place of '.gz'. The pieces can be downloaded separately. To join them again under Unix/Linux, the command would be (for example):

cat pb.bu.rel168.8.18.2009.part* > pb.bu.rel168.8.18.2009.gz

Once the file is uncompressed, it can be imported into MySQL directly. The files are plain text, so they can also be parsed easily by other software.
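
The join and import steps described above can be sketched as two small shell functions. This is only a sketch under stated assumptions, not the project's own procedure: the function names join_parts and import_dump are made up for illustration, the database name and MySQL user are placeholders, and the target database is assumed to exist already.

```shell
# Join the downloaded pieces back into one gzip file and verify it.
# Usage (hypothetical): join_parts pb.bu.rel168.8.18.2009
join_parts() {
    base=$1
    # The shell expands part* in sorted order, so the pieces are
    # concatenated in sequence as long as their suffixes sort correctly.
    cat "$base".part* > "$base.gz" || return 1
    # gzip -t tests the integrity of the joined file without extracting it.
    gzip -t "$base.gz"
}

# Stream the uncompressed dump into an existing MySQL database.
# Usage (hypothetical): import_dump pb.bu.rel168.8.18.2009.gz phylota myuser
# Arguments: dump file, database name, MySQL user (all placeholders).
import_dump() {
    # -p with no value makes the mysql client prompt for the password.
    gzip -dc "$1" | mysql -u "$3" -p "$2"
}
```

Streaming the dump through a pipe avoids writing the uncompressed file to disk, which may matter for a dump of this size. A missing or truncated piece should cause the gzip -t check to fail before any import is attempted.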

Go to the download directory


Notes on the database schema, and scripts used to build the database and run the browser

The database schema

We use a relational schema implemented in MySQL with a small number of tables. The 'seqs' table is used across all releases of the database. Other tables have a suffix consisting of '_xx' indicating the GenBank release number.

The 'nodes_xx' table is constructed in part from NCBI's taxonomy flatfiles and in part from calculations and summaries built by us. The 'seqs' table contains data taken directly from GenBank sequence flatfiles. The 'clusters_xx' table contains summary information obtained by the clustering pipeline, and information about individual clusters is stored in 'cigi_xx'. Summary statistics on the entire cluster set are calculated and stored in 'summary_stats'.
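
To check which tables, and therefore which release suffixes, a downloaded dump actually contains, the table names can be read from the dump itself without importing it. The following is a minimal sketch; the function name list_dump_tables is hypothetical, and it assumes the dump uses mysqldump's default output, in which each table definition begins with a CREATE TABLE line and the table name is quoted in backticks.

```shell
# Print the name of every table defined in a gzip-compressed mysqldump file.
# Usage (hypothetical): list_dump_tables pb.bu.rel168.8.18.2009.gz
list_dump_tables() {
    # Keep only lines that start with CREATE TABLE and print the
    # backtick-quoted name that follows.
    gzip -dc "$1" | sed -n 's/^CREATE TABLE `\([^`]*\)`.*/\1/p'
}
```

Because the dump is plain text, the same approach can be used with any other line-oriented tool.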

Source code for the scripts that implement the clustering pipeline is provided below. Note, however, that it relies on additional software installations, including BLAST (available from NCBI) and our PhyLoTA project programs blink and blast2blink, which can be downloaded from the PhyLoTA project software page.

The BLAST search scripts can be distributed across cluster architectures and are written in Perl as Sun Grid Engine compliant code. However, it is straightforward to modify these scripts to run on single workstations.

The browser

The web interface seen by the user is implemented as a set of Perl CGI scripts. These write dynamic web pages in response to user queries of various kinds. They access the MySQL database via the Perl DBI module. The source code includes a configuration file that can be modified to set local MySQL login parameters.


Perl scripts for the clustering pipeline (gzip compressed)
Perl CGI scripts for the browser interface (gzip compressed)