ROBIN Help Page


Introduction

ROBIN is a web server for analyzing rearrangements of landmark orders between two chromosomal genomes by block-interchange events, where the landmarks can be, for example, genes. A block-interchange event, a generalization of a transposition, acts on a chromosome by swapping two non-intersecting intervals of landmarks of any length; the swapped intervals need not be adjacent on the chromosome. ROBIN takes two or more linear or circular chromosomes as input. For every pair of input chromosomes, it computes the minimum number of block-interchanges needed to transform one chromosome into the other and determines an optimal scenario that uses exactly this number of rearrangements.

The input of ROBIN can be either bacterial-size sequence data or landmark-order data. If the input is sequence data, ROBIN automatically searches for identical landmarks, that is, homologous/conserved regions shared by all the input sequences. ROBIN represents these landmarks as LCBs (Locally Collinear Blocks). An LCB is a collinear (consistent) set of multi-MUMs, and its weight is the sum of the lengths of the multi-MUMs it contains. Multi-MUMs are exactly matching subsequences shared by all the considered genomes that occur only once in each genome and are bounded on either side by mismatched nucleotides. The weight of an LCB serves as a measure of confidence that it is a true homologous region rather than a random match. With a high minimum weight, ROBIN identifies the larger LCBs that are truly involved in genome rearrangement; with a lower minimum weight, ROBIN trades some specificity for sensitivity and also identifies smaller LCBs that are possibly involved in genome rearrangement. For details about LCBs, see Darling et al. (2004).
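The block-interchange operation defined above can be illustrated with a short Python sketch. The function name `block_interchange` and its 0-based, half-open index convention are assumptions made for this illustration only; they are not part of ROBIN.

```python
def block_interchange(order, i, j, k, l):
    """Swap the blocks order[i:j] and order[k:l] (0-based, half-open).

    Requires 0 <= i < j <= k < l <= len(order), so both blocks are
    non-empty and do not intersect. They need not be adjacent.
    """
    if not (0 <= i < j <= k < l <= len(order)):
        raise ValueError("blocks must be non-empty, ordered and non-intersecting")
    return order[:i] + order[k:l] + order[j:k] + order[i:j] + order[l:]


chromosome = [1, 2, 3, 4, 5, 6, 7]

# General block-interchange: swap [2, 3] with [6]; [4, 5] stays in between.
print(block_interchange(chromosome, 1, 3, 5, 6))  # [1, 6, 4, 5, 2, 3, 7]

# Transposition as a special case: the swapped blocks are adjacent (j == k).
print(block_interchange(chromosome, 1, 3, 3, 5))  # [1, 4, 5, 2, 3, 6, 7]
```

The second call shows why a transposition is a special case: when the two blocks touch, swapping them is the same as moving one block past the other.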

Usage of ROBIN:

Input:

Users can run ROBIN on either sequence data or landmark-order data. For sequence data, users can copy and paste two or more sequences in FASTA format into the top field (1) of the web interface, or upload a plain text file of FASTA-format sequences prepared in advance (2). Next, users can specify a minimum multi-MUM length (3), whose default is log2 n, and a minimum LCB weight (4), whose default is 3*(minimum multi-MUM length), so that all the multi-MUMs and LCBs identified by ROBIN meet these minimum length and weight requirements. Finally, users need to select the chromosome type (5), linear or circular, that matches their chromosomal sequences before submitting the data. Sequence data can be processed immediately (the default) or in batch mode, which is suitable for large-scale sequences. For batch processing, users need to select the check box in front of "Enter your email address" and enter their email address in the blank field (6). (Please make sure that the email address is correct.) In this case, users will be notified of the output by email when the job is finished. For landmark-order data, users can copy and paste two or more unsigned integer sequences in FASTA-like format into the bottom field (7) of the web interface, where each unsigned integer represents an identical landmark on all input chromosomes. After selecting the chromosome type (8), users can submit their data to run ROBIN.
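As a rough illustration of how the two default parameters relate, the following Python sketch computes them for a given n. This help page does not define n or say how the values are rounded, so the function `default_parameters` is hypothetical: it takes n from the caller and applies no rounding.

```python
import math


def default_parameters(n):
    """Return (minimum multi-MUM length, minimum LCB weight) for a given n.

    Assumption: n is supplied by the caller. The help page states the
    defaults as log2 n and 3 * (minimum multi-MUM length) without defining
    n or any rounding, so the raw values are returned.
    """
    if n <= 1:
        raise ValueError("n must be greater than 1")
    min_mum_length = math.log2(n)
    min_lcb_weight = 3 * min_mum_length
    return min_mum_length, min_lcb_weight


# Hypothetical value of n, chosen only to make the arithmetic easy to follow.
print(default_parameters(2 ** 20))  # (20.0, 60.0)
```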
Web Interface of ROBIN

Output:

If your input is sequence data, ROBIN first outputs the order of the computed common LCBs shared by all the input sequences and then outputs the computed block-interchange distance matrix. For each computed LCB order, clicking its associated link shows detailed information, such as the position (given by left and right end coordinates), length, and weight of each LCB, and the overall coverage of all LCBs on the genome. Note that an LCB whose left and right end coordinates are both negative is an inverted region on the opposite strand of the given sequence. In addition, you can see an optimal scenario of block-interchanges for any two input sequences by clicking the link associated with their computed distance in the block-interchange distance matrix. Here is an example of the output we obtained by inputting the chromosome II sequences of three vibrio species, namely V. cholerae, V. parahaemolyticus and V. vulnificus. If your input is landmark-order data, ROBIN outputs the computed block-interchange distance matrix and an optimal scenario of block-interchanges for any two input landmark orders. Here is such an example.
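To make the notion of a block-interchange distance matrix concrete, here is a minimal Python sketch for linear chromosomes only. It uses Christie's (1996) cycle-graph formula for unsigned linear permutations, d = (n + 1 - c) / 2, where c is the number of alternating cycles. This is a swapped-in textbook technique and not necessarily the algorithm ROBIN implements; it does not handle circular chromosomes and does not produce a scenario. All function names are hypothetical.

```python
def block_interchange_distance(source, target):
    """Minimum number of block-interchanges turning source into target.

    Both arguments are linear landmark orders over the same distinct landmarks.
    """
    if sorted(source) != sorted(target) or len(set(source)) != len(source):
        raise ValueError("orders must contain the same distinct landmarks")
    n = len(source)
    # Relabel so that target becomes 1..n, then frame the order with 0 and n + 1.
    rank = {landmark: r for r, landmark in enumerate(target, start=1)}
    pi = [0] + [rank[x] for x in source] + [n + 1]
    pos = {value: index for index, value in enumerate(pi)}

    # Cycle graph walk: from v take the grey edge to v + 1, then the black
    # edge from v + 1 to the element immediately to its left in pi.
    def step(v):
        return pi[pos[v + 1] - 1]

    seen, cycles = set(), 0
    for start in range(n + 1):
        if start in seen:
            continue
        cycles += 1
        v = start
        while v not in seen:
            seen.add(v)
            v = step(v)
    return (n + 1 - cycles) // 2


def distance_matrix(orders):
    """Pairwise distances for a dict mapping names to landmark orders."""
    names = list(orders)
    return {a: {b: block_interchange_distance(orders[a], orders[b]) for b in names}
            for a in names}


# Hypothetical landmark orders, used only to exercise the functions.
orders = {"A": [1, 2, 3, 4, 5], "B": [1, 4, 3, 2, 5], "C": [3, 4, 5, 1, 2]}
for name, row in distance_matrix(orders).items():
    print(name, row)
```

In this toy input, B differs from A by swapping landmarks 4 and 2, and C differs from A by swapping the blocks 1 2 and 3 4 5, so each is one block-interchange away from A.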

CPU Time Usage of ROBIN

Currently, the ROBIN system is installed on an IBM PC with a 1.26 GHz processor and 512 MB of RAM running Linux. In this hardware environment, ROBIN can handle at most about 4,800 landmarks if the input is landmark-order data. The following table shows the running time of ROBIN for processing two landmark orders as the number of landmarks increases.

The Running Time of ROBIN for Processing Two Landmark Orders of Different Sizes
Number of Landmarks    CPU Time Usage
100                    < 1 sec
1,000                  55 sec
2,000                  3.7 min
3,000                  9 min
4,000                  28 min
4,800                  35 min

The next table shows the running time of ROBIN for processing multiple landmark orders of 1,000 landmarks each as the number of landmark orders increases.

The Running Time of ROBIN for Processing Multiple Landmark Orders of 1,000 Landmarks
Number of Landmark Orders    CPU Time Usage
2                            51 sec
3                            144 sec
4                            289 sec
5                            472 sec

If the input is sequence data, the capacity of ROBIN depends greatly on the lengths and number of the input sequences, because additional time is needed to compute all the LCBs shared by all input sequences. Currently, ROBIN can handle sequences with a total length of up to 35 Mbp. The table below lists the running time of ROBIN for processing two sequences as the average sequence length increases.

The Running Time of ROBIN for Processing Two Sequences of Different Lengths
Average Sequence Length (Mbp)    CPU Time Usage
1.9                              15 min
3.4                              18 min
5                                46 min
8                                56 min

The following table shows the running time of ROBIN for processing sequences of length ranging from 4.6 Mbp to 5.5 Mbp as the number of sequences increases.

The Running Time of ROBIN for Processing Multiple Sequences
Number of Sequences    CPU Time Usage
2                      24 min
3                      53 min
4                      81 min
5                      157 min
6                      240 min
7                      290 min

Experimental Result

To test the ROBIN system, we reran the experiments conducted by Lin et al. (2005) for detecting the evolutionary relationships among three human vibrio pathogens: V. vulnificus, V. parahaemolyticus and V. cholerae. V. vulnificus is reported to be an etiologic agent of severe human infection acquired through wounds or contaminated seafood, and it shares morphological and biochemical characteristics with other human vibrio pathogens, including V. cholerae and V. parahaemolyticus (Chen et al., 2003). The genome of each of these three vibrio species consists of two circular chromosomes, and their genomic sequences have recently been determined. See the following table for their sequence information.

The Sequence Information of Three Pathogenic Vibrio Species, Each with Two Circular Chromosomes
Accession No.    Species                            Chromosome    Size (Mbp)
NC_005139        V. vulnificus YJ016                1 (VV1)       3.4
NC_005140        V. vulnificus YJ016                2 (VV2)       1.9
NC_004603        V. parahaemolyticus RIMD 2210633   1 (VP1)       3.3
NC_004605        V. parahaemolyticus RIMD 2210633   2 (VP2)       1.9
NC_002505        V. cholerae El Tor N16961          1 (VC1)       3.0
NC_002506        V. cholerae El Tor N16961          2 (VC2)       1.0

As more sequence information on vibrio species becomes available, a comparative genomics approach is needed to uncover the critical events leading to the functional uniqueness of vibrio species. To address how vibrio species evolved, Chen et al. (2003) conducted a chromosome-by-chromosome analysis of the V. vulnificus YJ016 sequence together with the V. cholerae El Tor N16961 and V. parahaemolyticus RIMD 2210633 sequences, comparing the relative positions of conserved genes and investigating the movement of genetic material within and between the two chromosomes of the vibrio species. Their comparative analysis revealed that, in both chromosomes, the gene organization of V. vulnificus is more conserved relative to V. parahaemolyticus than relative to V. cholerae, which implies that V. vulnificus is evolutionarily closer to V. parahaemolyticus than to V. cholerae. Chen et al. (2003) also compared the number, distribution, and position of gene family members in the V. vulnificus and V. cholerae genomes. The results indicated that duplication and transposition events appear to have occurred more frequently in the V. vulnificus genome. Since a transposition is a special case of a block-interchange, it seems reasonable to postulate that block-interchange rearrangements may also play a significant role in the evolution of vibrio genomes. To test this hypothesis, we conducted an experiment on these three human vibrio pathogens to see whether the evolutionary relationships determined solely from their pairwise block-interchange distances agree with those obtained by Chen et al. (2003).

In our previous experiments (Lin et al., 2005), we represented the identical landmarks by the common MUMs among these three vibrio genomes, which were computed in advance with a separate tool for finding consensuses or signatures. In the experiments reported here, we instead used as landmarks the LCBs that ROBIN computed automatically, with default parameters, from the three input vibrio genomic sequences. The experimental results are as follows.

  1. The experimental result of VV1, VP1 and VC1
  2. The experimental result of VV2, VP2 and VC2
In total, ROBIN identified 95 common LCBs for VV1, VP1, and VC1, and 20 common LCBs for VV2, VP2, and VC2. The computed block-interchange distance matrices are shown below.

The Block-Interchange Distances among VV1, VP1, and VC1
       VC1    VP1    VV1
VC1    -      37     38
VP1    37     -      17
VV1    38     17     -

The Block-Interchange Distances among VV2, VP2, and VC2
       VC2    VP2    VV2
VC2    -      8      9
VP2    8      -      5
VV2    9      5      -

As shown in the tables above, for both circular chromosomes the block-interchange distance between V. vulnificus and V. parahaemolyticus is smaller than both the distance between V. vulnificus and V. cholerae and the distance between V. parahaemolyticus and V. cholerae. These experimental results agree with those obtained by Lin et al. (2005) and by Chen et al. (2003).
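The comparison above can be checked mechanically. The short Python sketch below copies the distances from the two matrices and reports the closest pair for each chromosome; the helper name `closest_pair` is hypothetical.

```python
# Pairwise block-interchange distances copied from the two matrices above.
chromosome_1 = {("VC1", "VP1"): 37, ("VC1", "VV1"): 38, ("VP1", "VV1"): 17}
chromosome_2 = {("VC2", "VP2"): 8, ("VC2", "VV2"): 9, ("VP2", "VV2"): 5}


def closest_pair(distances):
    """Return the pair of chromosomes with the smallest distance."""
    return min(distances, key=distances.get)


print(closest_pair(chromosome_1))  # ('VP1', 'VV1')
print(closest_pair(chromosome_2))  # ('VP2', 'VV2')
```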

FASTA Format Description

A sequence in FASTA format starts with a single-line description, followed by lines of sequence data. The description line starts with a greater-than symbol (">"), which is usually followed by the sequence identifier and description. An example of a sequence in FASTA format is shown below.
>An example of a sequence in FASTA format
TGGAGTATTAACAGAAAATTGATACCAAACGAACAAAGTTAAGTATAAAAACCGCGTTTAAATAACCCAC
ATATTCTTCGATAAGGAGAAAACATTTTAAATATTACAGTGTCACTTATTTACAATGTAAAGCCACGTTT
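For readers who want to check their input before submitting it, the following is a minimal Python sketch of a reader for the FASTA layout described above. The function `parse_fasta` is a hypothetical helper, not part of ROBIN, and it performs no validation of the nucleotide characters.

```python
def parse_fasta(text):
    """Return a list of (description, sequence) pairs from FASTA text."""
    records = []
    description, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new description line closes the previous record, if any.
            if description is not None:
                records.append((description, "".join(chunks)))
            description, chunks = line[1:].strip(), []
        elif description is None:
            raise ValueError("sequence data found before a '>' description line")
        else:
            chunks.append(line)
    if description is not None:
        records.append((description, "".join(chunks)))
    return records


# Hypothetical two-line record, shortened for readability.
text = ">example record\nTGGAGT\nATATTC\n"
print(parse_fasta(text))  # [('example record', 'TGGAGTATATTC')]
```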

FASTA-like Format Description

A gene/landmark order in FASTA-like format starts with a single-line description, followed by lines of unsigned integers separated by one or more spaces, with each integer representing a homologous gene or identical landmark on all input chromosomes. The description line starts with a greater-than symbol (">"), which is usually followed by the chromosome identifier and description. An example of a gene/landmark order in FASTA-like format is shown below.
>Vibrio vulnificus chromosome II
1 19 2 20 15 3 17 16 9 18 13 7 5 12 6 8 10 4 11 14
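The FASTA-like layout above can be read with a few lines of Python. The sketch below is illustrative only: the helper names are hypothetical, and two of its checks are assumptions of this sketch rather than documented ROBIN behavior, namely that description lines are unique and that each landmark appears exactly once per chromosome.

```python
def parse_landmark_orders(text):
    """Return a dict mapping each description line to its list of integers."""
    orders = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].strip()
            if current in orders:  # assumption: descriptions must be unique
                raise ValueError(f"duplicate description: {current}")
            orders[current] = []
        elif current is None:
            raise ValueError("integers found before a '>' description line")
        else:
            for token in line.split():
                if not (token.isascii() and token.isdigit()):
                    raise ValueError(f"not an unsigned integer: {token}")
                orders[current].append(int(token))
    return orders


def check_same_landmarks(orders):
    """Raise ValueError unless every order uses the same distinct landmarks."""
    reference = None
    for name, order in orders.items():
        if len(set(order)) != len(order):  # assumption: no repeated landmark
            raise ValueError(f"repeated landmark in {name}")
        if reference is None:
            reference = set(order)
        elif set(order) != reference:
            raise ValueError(f"{name} does not share the same landmarks")


# First record is the example above; the second is a hypothetical identity order.
text = """>Vibrio vulnificus chromosome II
1 19 2 20 15 3 17 16 9 18 13 7 5 12 6 8 10 4 11 14
>hypothetical reference order
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
"""
orders = parse_landmark_orders(text)
check_same_landmarks(orders)
print({name: len(order) for name, order in orders.items()})
```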

Examples

Some examples of sequence data or gene-/landmark-order data for testing ROBIN are collected as follows.

References

  1. C. L. Lu, T. C. Wang, Y. C. Lin and C. Y. Tang (2005), ROBIN: A Tool for Genome Rearrangement of Block-Interchanges, Bioinformatics, In press.
  2. Y. C. Lin, C. L. Lu, H. Y. Chang and C. Y. Tang (2005), An Efficient Algorithm for Sorting by Block-Interchanges and Its Application to the Evolution of Vibrio Species, Journal of Computational Biology, Vol. 12, pp. 102-112.
  3. A. C. E. Darling, B. Mau, F. R. Blattner, and N. T. Perna (2004), Mauve: Multiple Alignment of Conserved Genomic Sequence With Rearrangement, Genome Research, Vol. 14, pp. 1394-1403.
  4. C. Y. Chen, K. M. Wu, Y. C. Chang and C. H. Chang (2003), Comparative Genome Analysis of Vibrio vulnificus, a Marine Pathogen, Genome Research, Vol. 13, pp. 2577-2587.