Procedures Manual

 

Mirroring the CABRI site, Technical Annex

Introduction

This technical annex only describes the procedures needed to mirror the main CABRI site. Interested parties will receive the specific details (such as the actual addresses, the directories involved and the parameters to be defined) only after the CABRI Technical Committee has agreed to the setting up of the mirror.

Software needed for the mirroring

The mirror must be set up and updated using the RSYNC software, which was written by A. Tridgell and P. Mackerras and is available under the GNU General Public License. RSYNC uses an original algorithm that provides a very fast method for bringing remote files into sync: it sends only the differences between the files across the link.

Although RSYNC could work alone (by running as a daemon on the main site), access to the main site is currently allowed only through the SSH secure shell, so RSYNC reaches the main site via SSH. SSH version 1.2.26 is known to work, and any version supporting protocol version 1.5 should work too. A login account for each mirror site has to be set up on the main site.
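As an illustration of RSYNC running over SSH rather than against a daemon, the following minimal Python sketch builds such a command line. The account name, host name and directories are hypothetical placeholders (the real ones are only communicated after approval), and the option set shown is one reasonable choice, not necessarily the one used by CABRI.

```python
import shlex

def build_rsync_command(user, host, remote_dir, local_dir):
    """Return the argument list for an rsync pull tunnelled through ssh."""
    return [
        "rsync",
        "-a",          # archive mode: recurse, preserve times, permissions, links
        "-z",          # compress data in transit
        "-e", "ssh",   # use ssh as the remote shell instead of an rsync daemon
        f"{user}@{host}:{remote_dir}/",   # source on the main site
        f"{local_dir}/",                  # destination on the mirror
    ]

if __name__ == "__main__":
    # All four values below are placeholders, not real CABRI settings.
    cmd = build_rsync_command(
        "mirror-account", "main.example.org",
        "/path/on/main/site", "/path/on/mirror",
    )
    print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run` once the real account and paths are known.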

Software needed for running the mirror

A Web server is needed that either supports virtual hosts or is exclusively devoted to the mirror. Currently, the APACHE server is preferred, but the choice, together with any related licence issues, is left to the mirror manager. Some parameters must be defined at run time in the configuration files (see below).

SRS (Sequence Retrieval System) from Lion Bioscience, Cambridge UK. The SRS version adopted by CABRI can be retrieved through the mirroring procedure, but it has to be compiled locally on the mirror (a C compiler is needed). Version 5.1.0 is currently used. An upgrade to SRS 6 is not planned.

The Perl interpreter is needed both for SRS and for some of CABRI's own scripts, such as the trolley. Version 5.005 or later is currently needed.

Mirror setting up procedure

If not already available, the Web server, Perl, SSH and RSYNC must be retrieved and installed.

The server must be set up as a virtual host for the assigned address within the cabri.org domain.

A personal pair of secure keys for SSH must be generated and the public key must be uploaded to the main site, under the mirror account.

RSYNC must be run a first time against the main site in order to retrieve all the needed files: the HTML pages and related GIF images, the data and indexes for the catalogues, and the SRS search engine.

SRS must then be installed by compiling a few files and defining all the local parameters. The data and indexes are already adequate: no further indexing is needed.

After the installation, the configuration files of the web server must be modified to include aliases for accessing SRS, while some parameter files must be modified to enable the trolley cart locally.

Mirror maintenance procedures

In order to keep the mirror up-to-date, a procedure must be created and launched periodically (on Unix machines, this is typically done with the cron facility).

The procedure uses RSYNC to check whether the HTML pages and images, the data and indexes of the collections and the scripts for the trolley cart have been modified. It does not check for changes to the SRS distribution or to locally modified configuration files.

The mirroring procedure should only be run at night, thus avoiding the inconsistent data and indexes that could result from mirroring while the main site is re-indexing some catalogues.
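The maintenance procedure described above can be sketched in Python as follows. This is only an outline under stated assumptions: the subtree names, the night window (01:00 to 05:00) and the account and path arguments are all hypothetical, since the real values depend on the main site. Restricting the update to the listed subtrees is one possible way of leaving the SRS distribution and local configuration files untouched.

```python
from datetime import time

# Hypothetical names for the subtrees that are kept in sync; the SRS
# distribution and local configuration files are deliberately not listed.
SYNCED_SUBTREES = ["html", "data", "indexes", "trolley"]

def in_night_window(now, start=time(1, 0), end=time(5, 0)):
    """Return True if `now` lies in [start, end); windows may cross midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end

def update_commands(user, host, remote_root, local_root):
    """Return one rsync-over-ssh argument list per synced subtree."""
    return [
        ["rsync", "-az", "-e", "ssh",
         f"{user}@{host}:{remote_root}/{sub}/",
         f"{local_root}/{sub}/"]
        for sub in SYNCED_SUBTREES
    ]

if __name__ == "__main__":
    from datetime import datetime
    if in_night_window(datetime.now().time()):
        # Placeholder arguments; print instead of executing.
        for cmd in update_commands("mirror-account", "main.example.org",
                                   "/remote/root", "/local/root"):
            print(" ".join(cmd))
```

A script of this kind would normally be started by cron at a fixed night hour; the window check is then only a safeguard against manual runs during the day.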

Mirror size

The home page and related files are quite small and only occupy ca. 0.5 Mbytes.

SRS occupies ca. 25 Mbytes, while the data directories occupy more than 63 Mbytes and the indexes 35 Mbytes. SRS also requires a temporary directory, where all query sessions are recorded: its size is variable (and normally grows).

A mirror site therefore starts at about 130 Mbytes.
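The figures above can be added up in a short Python sketch. The component labels are illustrative; the reading that the 130 Mbytes starting figure allows some margin for data growth and for the SRS temporary directory is an assumption, not something the text states explicitly.

```python
# Approximate sizes in Mbytes, as quoted in this section.
components_mb = {
    "home page and related files": 0.5,
    "SRS": 25,
    "data directories (lower bound)": 63,
    "indexes": 35,
}

fixed_total = sum(components_mb.values())
print(fixed_total)  # 123.5

# The quoted starting size; the difference is the implied margin.
quoted_start_mb = 130
print(quoted_start_mb - fixed_total)  # 6.5
```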

______________________________________________

Guidelines prepared for CABRI by INRC
Page Layout by CERDIC
Copyright CABRI, 1999