ROBIN Help Page
Introduction
ROBIN is a web server for analyzing rearrangements of landmark
orders between two chromosomal genomes using block-interchange events,
where the landmarks considered here can be genes.
A block-interchange event, a generalization of a transposition,
acts on a chromosome by swapping two non-intersecting intervals
of landmarks of any length,
where the swapped intervals are not necessarily adjacent on the chromosome.
ROBIN takes two or more linear/circular chromosomes as its input,
computes the minimum number of block-interchange rearrangements
needed to transform one chromosome into another
for every pair of input chromosomes, and also determines an optimal
scenario that uses this number of rearrangements.
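The effect of a single block-interchange on a landmark order can be sketched in a few lines of Python. This is only an illustration of the definition above, not ROBIN's own code; the function name block_interchange and its 0-based, half-open index arguments are assumptions made for this example.

```python
from typing import List


def block_interchange(order: List[int], i: int, j: int, k: int, l: int) -> List[int]:
    """Swap the two non-intersecting blocks order[i:j] and order[k:l].

    Indices are 0-based and half-open (an assumption of this sketch).
    The blocks must be non-empty and must satisfy i < j <= k < l.
    """
    if not (0 <= i < j <= k < l <= len(order)):
        raise ValueError("blocks must be non-empty, ordered and non-intersecting")
    # prefix + second block + middle part + first block + suffix
    return order[:i] + order[k:l] + order[j:k] + order[i:j] + order[l:]


# Swap the non-adjacent blocks [2, 3] and [6].
print(block_interchange([1, 2, 3, 4, 5, 6, 7], 1, 3, 5, 6))
# Adjacent blocks (j == k): the block-interchange is a transposition.
print(block_interchange([1, 2, 3, 4, 5, 6, 7], 1, 3, 3, 5))
```

When the two blocks are adjacent (j equals k), the middle part is empty and the operation reduces to a transposition, which is why a block-interchange generalizes a transposition.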
The input of ROBIN can be either bacterial-size sequence data
or landmark-order data.
If the input is sequence data, ROBIN will automatically search for the
identical landmarks that are the homologous/conserved regions shared by all the
input sequences.
In our ROBIN system, we adopt the so-called
LCBs (Locally Collinear Blocks) for representing
the landmarks in genomes.
An LCB is defined as a collinear (consistent) set of multi-MUMs,
and its weight is defined as the sum of the lengths of the multi-MUMs it contains,
where multi-MUMs are exactly matching subsequences shared by
all the considered genomes that occur only once in each
genome and that are bounded on either side by mismatched nucleotides.
The weight of an LCB can serve as
a measure of confidence that it is a true homologous region rather than
a random match.
By selecting a high minimum weight, ROBIN can identify the larger LCBs
that are truly involved in genome rearrangement, whereas by selecting a
lower minimum weight, ROBIN can trade some specificity for sensitivity to
identify the smaller LCBs that are possibly involved in genome rearrangement.
For details about LCBs, we refer the user to the paper by
Darling et al. (2004).
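The weight definition and the effect of the minimum-weight threshold can be illustrated with a small Python sketch. It is a simplified threshold filter under stated assumptions, not the LCB selection procedure that ROBIN actually uses (for that, see Darling et al., 2004); the names MultiMUM, lcb_weight and filter_lcbs are hypothetical.

```python
from typing import List, NamedTuple


class MultiMUM(NamedTuple):
    """A multi-MUM reduced to the two fields needed here (hypothetical type)."""
    start: int   # position in a reference genome, unused by the weight
    length: int  # number of exactly matching nucleotides


def lcb_weight(lcb: List[MultiMUM]) -> int:
    """Weight of an LCB: the sum of the lengths of its multi-MUMs."""
    return sum(mum.length for mum in lcb)


def filter_lcbs(lcbs: List[List[MultiMUM]], min_weight: int) -> List[List[MultiMUM]]:
    """Keep only the LCBs whose weight reaches the minimum weight."""
    return [lcb for lcb in lcbs if lcb_weight(lcb) >= min_weight]


lcbs = [
    [MultiMUM(0, 30), MultiMUM(50, 25)],  # weight 55
    [MultiMUM(200, 12)],                  # weight 12
]
print(len(filter_lcbs(lcbs, 40)))  # high threshold: more specific
print(len(filter_lcbs(lcbs, 10)))  # low threshold: more sensitive
```

Raising min_weight discards the small LCB that is more likely to be a random match, while lowering it keeps both LCBs.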
Usage of ROBIN:
Input:
Users can choose to input sequence data or
input landmark-order data to run ROBIN.
If users choose sequence data as the input,
they can copy and paste two or more sequences in FASTA format
in the top field (1) of the web interface,
or simply upload a plain text file of FASTA-formatted sequences
prepared in advance
(2).
Next, users can specify a minimum multi-MUM length (3),
whose default is log2(n),
and a minimum LCB weight (4),
whose default is 3 × (minimum multi-MUM length),
so that all the multi-MUMs and LCBs identified by ROBIN meet these
minimum length and weight thresholds.
Finally, users need to select the chromosome type (5)
according to whether their chromosomal sequences are linear or circular
before they submit their data to run ROBIN.
Note that submitted sequence data can be processed either
immediately (the default)
or in batch mode, which is suitable for large-scale sequences.
For batch processing,
users need to select the check box in front of
"Enter your email address"
and enter their email address in the blank
field (6).
(Please make sure that the email address is correct.)
In this case, users will be notified of the output via email
once the job is finished.
If users choose landmark-order data as the input,
they can copy and paste two or more unsigned integer sequences in
FASTA-like format in the bottom
field (7) of the web interface,
where each unsigned integer represents an identical landmark on
all input chromosomes.
After selecting the chromosome type (8), users can
submit their data to run ROBIN.
Output:
If your input is sequence data, ROBIN will first output
the order of the computed common LCBs that are shared by all the input sequences
and then output the computed block-interchange distance matrix.
In each of the computed LCB orders,
you can see some detailed information, such as
the position (denoted by left and right end coordinates),
length and weight of each LCB and
the overall coverage of all LCBs on the genome,
by clicking the link associated with it.
Note that an LCB whose left and right end coordinates are both negative
is an inverted region on the opposite strand of
the given sequence.
In addition,
you can see an optimal scenario of block-interchanges for any two input sequences
by clicking the link associated with their computed distance
in the block-interchange distance matrix.
Here
is an example of the output we obtained by inputting the chromosome II sequences of
three vibrio species, namely V. cholerae, V. parahaemolyticus and
V. vulnificus.
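The sign convention for LCB coordinates described above can be handled programmatically as in the following Python sketch. The function name lcb_strand and the returned (start, end, strand) tuple are assumptions made for illustration; ROBIN's output page does not define such a function, and rejecting mixed-sign coordinates is a choice of this sketch.

```python
from typing import Tuple


def lcb_strand(left: int, right: int) -> Tuple[int, int, str]:
    """Return (start, end, strand) for an LCB given its signed end coordinates.

    Two negative coordinates mark an inverted region on the opposite
    strand; two positive coordinates mark a region on the given strand.
    """
    if left < 0 and right < 0:
        strand = "-"
    elif left > 0 and right > 0:
        strand = "+"
    else:
        # Zero or mixed-sign coordinates are rejected in this sketch.
        raise ValueError("both coordinates must have the same non-zero sign")
    start, end = sorted((abs(left), abs(right)))
    return start, end, strand


print(lcb_strand(1500, 2300))
print(lcb_strand(-2300, -1500))
```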
If your input is landmark-order data, ROBIN will output
the computed block-interchange distance matrix and
an optimal scenario of block-interchanges for any two input landmark orders.
Here
is such an example.
CPU Time Usage of ROBIN
Currently, the ROBIN system is installed
on an IBM PC with a 1.26 GHz processor and 512 MB of RAM running Linux.
In this hardware environment,
ROBIN can handle only about 4,800 landmarks
if the input is landmark-order data.
The following table shows the running time of ROBIN for
processing two landmark orders when
the number of landmarks is increased.
The Running Time of ROBIN for Processing Two
Landmark Orders of Different Sizes
| Number of Landmarks | CPU Time Usage |
| --- | --- |
| 100 | < 1 sec |
| 1,000 | 55 sec |
| 2,000 | 3.7 min |
| 3,000 | 9 min |
| 4,000 | 28 min |
| 4,800 | 35 min |
The next table shows the running time of ROBIN for processing
multiple landmark orders with 1,000 landmarks
as the number of landmark orders increases.
The Running Time of ROBIN for Processing Multiple
Landmark Orders of 1,000 Landmarks
| Number of Landmark Orders | CPU Time Usage |
| --- | --- |
| 2 | 51 sec |
| 3 | 144 sec |
| 4 | 289 sec |
| 5 | 472 sec |
If the input is sequence data, the capacity of ROBIN depends largely
on the lengths and number of the input sequences,
because additional time is needed to compute
all the LCBs shared by all input sequences.
Currently, it can handle sequences with a total length
of up to 35 Mbp.
The table below lists the running time of ROBIN for processing
two sequences when the average length of sequences is increased.
The Running Time of ROBIN for Processing Two
Sequences of Different Lengths
| Average Sequence Length (Mbp) | CPU Time Usage |
| --- | --- |
| 1.9 | 15 min |
| 3.4 | 18 min |
| 5 | 46 min |
| 8 | 56 min |
The following table shows the running time of ROBIN for processing
sequences of length ranging from 4.6 Mbp to 5.5 Mbp
as the number of sequences increases.
The Running Time of ROBIN for Processing Multiple
Sequences
| Number of Sequences | CPU Time Usage |
| --- | --- |
| 2 | 24 min |
| 3 | 53 min |
| 4 | 81 min |
| 5 | 157 min |
| 6 | 240 min |
| 7 | 290 min |
Experimental Result
To test our ROBIN system, we reran the experiments conducted by
Lin et al. (2005) for detecting the evolutionary
relationships among three human vibrio pathogens:
V. vulnificus, V. parahaemolyticus and V. cholerae.
It is reported that V. vulnificus is an etiologic agent for severe
human infection acquired through wounds or contaminated seafood and shares
morphological and biochemical characteristics with other human vibrio pathogens,
including V. cholerae and V. parahaemolyticus
(Chen et al., 2003).
The genome of each of these three vibrio species consists of two
circular chromosomes, and their genomic sequences have recently been determined.
The following table gives their sequence information.
The Sequence Information of Three Pathogenic Vibrio Species,
Each with Two Circular Chromosomes
| Accession No. | Species | Chromosome | Size (Mbp) |
| --- | --- | --- | --- |
| NC_005139 | V. vulnificus YJ016 | 1 (VV1) | 3.4 |
| NC_005140 | V. vulnificus YJ016 | 2 (VV2) | 1.9 |
| NC_004603 | V. parahaemolyticus RIMD 2210633 | 1 (VP1) | 3.3 |
| NC_004605 | V. parahaemolyticus RIMD 2210633 | 2 (VP2) | 1.9 |
| NC_002505 | V. cholerae El Tor N16961 | 1 (VC1) | 3.0 |
| NC_002506 | V. cholerae El Tor N16961 | 2 (VC2) | 1.0 |
As more and more sequence information of vibrio species becomes available,
a comparative genomics approach is needed to uncover the critical events
leading to the functional uniqueness of vibrio species.
To address the issue of how vibrio species evolved, Chen et al. (2003)
conducted a chromosome-by-chromosome analysis of the
V. vulnificus YJ016 sequence along with the V. cholerae El Tor N16961
sequence and the V. parahaemolyticus RIMD 2210633 sequence to
compare relative positions of conserved genes and to investigate
the movement of genetic materials within and between the two chromosomes
in the vibrio species.
Their comparative analysis revealed that V. vulnificus showed a higher
degree of conservation in gene organization in the two chromosomes relative
to V. parahaemolyticus than to V. cholerae, which implies that
V. vulnificus is closer to V. parahaemolyticus than to
V. cholerae from the evolutionary viewpoint.
Chen et al. (2003) also conducted an analysis by comparing the number,
distribution, and position of gene family members in the V. vulnificus
and V. cholerae genomes.
The results indicated that duplication and transposition events
appear to have occurred more frequently in the V. vulnificus genome.
Since a transposition is a special case of a block-interchange, it seems
reasonable to postulate that block-interchange rearrangements may play
another significant role in the evolution of vibrio genomes.
To test this viewpoint, we conducted an experiment on these three human
vibrio pathogens to see whether the evolutionary relationships determined
solely from their pairwise block-interchange distances agree with those
obtained by Chen et al. (2003).
In our previous experiments (Lin et al., 2005),
we used the common MUMs, which were computed
in advance with another tool for finding consensuses or signatures,
among these three vibrio genomes to represent the identical landmarks.
In the experiments reported here, however,
we used as landmarks the LCBs that were
automatically computed by our ROBIN system with default parameters from
the three input vibrio genomic sequences.
The experimental results we obtained are as follows.
- The experimental result of VV1, VP1 and VC1
- The experimental result of VV2, VP2 and VC2
In total, ROBIN identified 95 (respectively, 20) common LCBs
for VV1, VP1, and VC1 (respectively, VV2, VP2, and VC2).
The computed block-interchange distance matrices are shown as follows.
The Block-Interchange Distances among VV1, VP1, and VC1
| | VC1 | VP1 | VV1 |
| --- | --- | --- | --- |
| VC1 | - | 37 | 38 |
| VP1 | 37 | - | 17 |
| VV1 | 38 | 17 | - |
The Block-Interchange Distances among VV2, VP2, and VC2
| | VC2 | VP2 | VV2 |
| --- | --- | --- | --- |
| VC2 | - | 8 | 9 |
| VP2 | 8 | - | 5 |
| VV2 | 9 | 5 | - |
As shown in the tables above, the block-interchange distance between
V. vulnificus and V. parahaemolyticus is smaller than that
between V. vulnificus and V. cholerae and that
between V. parahaemolyticus and V. cholerae
for both circular chromosomes.
These experimental results agree with those obtained by
Lin et al. (2005) and by Chen et al. (2003).
A sequence in FASTA format starts with a single-line description, followed
by lines of sequence data. The description line starts with a right angle
bracket (">") and is usually followed by the sequence identifiers and
description.
An example of a sequence in FASTA format is shown as follows.
>An example of a sequence in FASTA format
TGGAGTATTAACAGAAAATTGATACCAAACGAACAAAGTTAAGTATAAAAACCGCGTTTAAATAACCCAC
ATATTCTTCGATAAGGAGAAAACATTTTAAATATTACAGTGTCACTTATTTACAATGTAAAGCCACGTTT
A gene/landmark order in FASTA-like format starts with a single-line description,
followed by lines of unsigned integers, which are separated by space(s),
with each integer representing a homologous gene or identical landmark on
all input chromosomes.
The description line starts with a right angle bracket (">") and is usually
followed by the chromosome identifiers and description.
An example of a gene/landmark order in FASTA-like format is shown as follows.
>Vibrio vulnificus chromosome II
1 19 2 20 15 3 17 16 9 18 13 7 5 12 6 8 10 4 11 14
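For users who prepare landmark-order data with a script, the FASTA-like format described above can be read and checked with a short Python sketch such as the following. It reflects our reading of the format only; the function names parse_landmark_orders and check_same_landmarks are hypothetical and are not part of ROBIN.

```python
from typing import Dict, List


def parse_landmark_orders(text: str) -> Dict[str, List[int]]:
    """Parse FASTA-like text into {description: list of unsigned integers}."""
    orders: Dict[str, List[int]] = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            current = line[1:].strip()
            if current in orders:
                raise ValueError(f"duplicate description line: {current!r}")
            orders[current] = []
        elif current is None:
            raise ValueError("landmark data found before any '>' description line")
        else:
            # Landmarks may span several lines and are separated by space(s).
            for token in line.split():
                if not (token.isascii() and token.isdigit()):
                    raise ValueError(f"not an unsigned integer: {token!r}")
                orders[current].append(int(token))
    return orders


def check_same_landmarks(orders: Dict[str, List[int]]) -> None:
    """Raise ValueError unless every order uses the same landmarks exactly once."""
    reference = None
    for name, order in orders.items():
        if len(set(order)) != len(order):
            raise ValueError(f"repeated landmark in {name!r}")
        if reference is None:
            reference = set(order)
        elif set(order) != reference:
            raise ValueError(f"{name!r} does not share the common landmark set")


example = """>chromosome A
1 2 3
4 5
>chromosome B
5 4 3 2 1
"""
orders = parse_landmark_orders(example)
check_same_landmarks(orders)
print(orders)
```

The second function checks the requirement that each integer represents an identical landmark on all input chromosomes.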
Examples
Some examples of sequence data or gene-/landmark-order data for
testing ROBIN are collected as follows.
- Sequence Data
- Landmark-Order Data
- The landmark orders of three vibrio species (chromosome I)
>Vibrio cholerae chromosome I
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37
38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88
89 90 91 92 93 94 95
>Vibrio parahaemolyticus chromosome I
1 95 6 11 3 4 94 8 89 90 84 20 80 22 78 24 76 26 74
28 71 72 29 68 63 60 37 56 38 59 58 61 36 54 57 40
41 44 48 50 53 52 49 47 51 46 45 42 39 55 62 65 32
35 34 31 66 64 30 69 33 67 70 73 27 75 25 77 23 79
21 81 83 82 19 87 91 18 86 85 17 43 16 15 88 92 12
10 9 14 5 13 93 7 2
>Vibrio vulnificus chromosome I
1 95 6 11 3 5 4 94 8 84 83 20 80 22 78 24 76 26 74
28 71 72 29 68 63 64 66 31 34 39 42 45 32 65 62 61
36 54 57 40 48 47 49 51 52 53 41 44 50 46 35 55 58
59 38 56 37 60 10 30 69 33 67 70 73 27 75 25 77 23
79 21 81 82 19 87 91 90 89 18 86 85 17 43 16 15 88
92 12 9 14 13 93 7 2
- The landmark orders of three vibrio species (chromosome II)
>Vibrio cholerae chromosome II
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
>Vibrio parahaemolyticus chromosome II
1 19 9 18 16 14 13 7 5 12 6 8 4 10 11 20 17 3 15 2
>Vibrio vulnificus chromosome II
1 19 2 20 15 3 17 16 9 18 13 7 5 12 6 8 10 4 11 14
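The chromosome II landmark orders above can be checked against the distance matrix reported in the Experimental Result section with the Python sketch below. It uses a cycle-counting formula for circular chromosomes, distance = (n - f) / 2, where n is the number of landmarks and f is the number of disjoint cycles (fixed points included) of the permutation obtained by composing the successor map of one chromosome with the predecessor map of the other. This is our understanding of the permutation-group formulation of Lin et al. (2005); the code is an independent illustration, it is not ROBIN's implementation, it does not cover linear chromosomes, and it does not produce an optimal scenario. The function name circular_bi_distance is hypothetical.

```python
from typing import List


def circular_bi_distance(a: List[int], b: List[int]) -> int:
    """Block-interchange distance between two circular landmark orders."""
    n = len(a)
    if len(set(a)) != n or sorted(a) != sorted(b):
        raise ValueError("orders must be permutations of the same landmarks")
    # Predecessor of each landmark on circular chromosome a (a[-1] wraps around).
    pred_a = {a[i]: a[i - 1] for i in range(n)}
    # Successor of each landmark on circular chromosome b.
    succ_b = {b[i]: b[(i + 1) % n] for i in range(n)}
    # sigma = (successor on b) after (predecessor on a).
    sigma = {x: succ_b[pred_a[x]] for x in a}
    # Count the disjoint cycles of sigma, including fixed points.
    seen = set()
    cycles = 0
    for x in a:
        if x in seen:
            continue
        cycles += 1
        while x not in seen:
            seen.add(x)
            x = sigma[x]
    return (n - cycles) // 2


vc2 = list(range(1, 21))
vp2 = [1, 19, 9, 18, 16, 14, 13, 7, 5, 12, 6, 8, 4, 10, 11, 20, 17, 3, 15, 2]
vv2 = [1, 19, 2, 20, 15, 3, 17, 16, 9, 18, 13, 7, 5, 12, 6, 8, 10, 4, 11, 14]

print(circular_bi_distance(vc2, vp2))  # 8, as in the VC2/VP2 entry
print(circular_bi_distance(vc2, vv2))  # 9, as in the VC2/VV2 entry
print(circular_bi_distance(vp2, vv2))  # 5, as in the VP2/VV2 entry
```

Because the chromosomes are treated as circular, rotating an order does not change the distance, and the distance between an order and itself is 0.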
Contact information
- To whom correspondence should be addressed:
Dr. Chin Lung Lu
(Email address: cllu@mail.nctu.edu.tw)
- The ROBIN Software is developed by Tsui Ching Wang
(Email address: jingjing.bi92g@nctu.edu.tw)
References
- C. L. Lu, T. C. Wang, Y. C. Lin and C. Y. Tang (2005), ROBIN: A Tool for
Genome Rearrangement of Block-Interchanges, Bioinformatics, In press.
- Y. C. Lin, C. L. Lu, H. Y. Chang and C. Y. Tang (2005), An
Efficient Algorithm for Sorting by Block-Interchanges and Its Application to
the Evolution of Vibrio Species,
Journal of Computational Biology, Vol. 12, pp. 102-112.
- A. C. E. Darling, B. Mau, F. R. Blattner and N. T. Perna (2004),
Mauve: Multiple Alignment of Conserved Genomic
Sequence With Rearrangement, Genome Research, Vol. 14, pp. 1394-1403.
- C. Y. Chen, K. M. Wu, Y. C. Chang and C. H. Chang (2003),
Comparative Genome Analysis of Vibrio Vulnificus, a Marine Pathogen,
Genome Research, Vol. 13, pp. 2577-2587.