MuSiC Help Page
Introduction:
MuSiC is a web server that performs constrained alignment of a set of
sequences, such that user-specified residues/nucleotides are
aligned with each other.
The input of the MuSiC system consists of a set of
protein/DNA/RNA sequences and a set of user-specified constraints,
each of which is a fragment of residues/nucleotides that appears
(approximately) in all input sequences.
The output of MuSiC is a constrained multiple sequence alignment (MSA)
in which the fragments of the input sequences
whose residues/nucleotides exhibit
a given degree of similarity to a constraint
are aligned together.
The current MuSiC system is implemented in Java and
can be accessed via a simple web interface.
Usage of MuSiC:
Input:
To run MuSiC, users first enter their protein/DNA/RNA sequences
in FASTA format in the top blank field (1).
They must then select a suitable scoring matrix
from a list of predefined matrices (2).
MuSiC penalizes gaps with an affine gap penalty function,
so users must also enter two real values:
the gap open penalty (3)
and the gap extension penalty (4).
Alternatively, the user may simply choose "Protein" or "DNA/RNA"
(5) to adopt the default values provided
by the system, where "Protein" and "DNA/RNA" indicate that
the sequences considered by the user are protein and
DNA/RNA sequences, respectively.
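To make the role of the two gap values concrete, here is a minimal Python sketch of an affine gap penalty. It assumes the common convention in which a gap of length k costs open + (k - 1) * extension; this page does not state which convention MuSiC uses or what its default values are, so the function name and the numbers below are purely illustrative.

```python
def affine_gap_penalty(length, gap_open, gap_extend):
    """Penalty of one contiguous gap spanning `length` positions.

    Assumed convention (not confirmed for MuSiC): the first gap position
    costs `gap_open` and every further position costs `gap_extend`.
    """
    if length < 0:
        raise ValueError("gap length must be non-negative")
    if length == 0:
        return 0.0
    return gap_open + (length - 1) * gap_extend


# Illustrative values only; these are not MuSiC's defaults.
for k in (1, 2, 5):
    print(k, affine_gap_penalty(k, gap_open=10.0, gap_extend=0.5))
```

With an extension value smaller than the open value, one long gap is penalized less than several short gaps of the same total length, which is the usual reason for choosing an affine function.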
Finally, in the "Constraints" field (6),
users can enter constrained sequences,
delimited by non-alphabetic characters such as
commas, slashes, vertical bars and plus signs. They may also select
the "Approximate" option and enter a
real number R, with 0 <= R < 1, in
the "Ratio" field (7).
This means that at most l(Pi) * R
mismatches are allowed between a constrained sequence Pi and
the aligned fragments of the input sequences,
where l(Pi) denotes the length of Pi.
Note that the "Constraints" field need not be filled.
If this field is left blank, MuSiC outputs
an ordinary MSA without constraints.
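The constraint syntax and the "Ratio" bound described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the bound is applied to each candidate fragment separately, l(Pi) * R is rounded down to a whole number of mismatches, and only substitutions are counted. It is not MuSiC's alignment algorithm.

```python
import math
import re


def parse_constraints(field):
    """Split the "Constraints" field on runs of non-alphabetic characters."""
    return [c for c in re.split(r"[^A-Za-z]+", field) if c]


def max_mismatches(constraint, ratio):
    """Largest whole number of mismatches not exceeding l(Pi) * R."""
    if not 0 <= ratio < 1:
        raise ValueError("ratio must satisfy 0 <= R < 1")
    return math.floor(len(constraint) * ratio)


def approximate_hits(sequence, constraint, ratio):
    """Start positions where a fragment matches within the mismatch bound."""
    limit = max_mismatches(constraint, ratio)
    n = len(constraint)
    hits = []
    for start in range(len(sequence) - n + 1):
        window = sequence[start:start + n]
        # Count positions where the fragment differs from the constraint.
        mismatches = sum(a != b for a, b in zip(window, constraint))
        if mismatches <= limit:
            hits.append(start)
    return hits


print(parse_constraints("HKR, GPST/ILMV"))
print(approximate_hits("ACGTACGA", "ACGT", 0.25))
```

An empty field yields an empty list of constraints, which corresponds to the unconstrained case. With R = 0 the bound is zero, so only exact occurrences of a constraint qualify.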
Output:
In the resulting multiple sequence alignment, the constrained columns
are colored black and their corresponding constraints are displayed
beneath them.
The colors of the characters in the non-constrained columns of the
resulting multiple sequence alignment
follow the rules of Clustal X, which are defined as follows.
In the case of protein sequences, the default colors are as follows:
Color     Residue Code
----------------------
ORANGE    GPST
RED       HKR
BLUE      FWY
GREEN     ILMV
In the case of DNA/RNA sequences, the default colors are as follows:
Color     Base Code
-------------------
ORANGE    A
RED       C
BLUE      T
GREEN     G
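The two color tables above amount to a simple lookup from character to color. The Python sketch below encodes them as listed; the names are hypothetical, and returning None for characters that appear in neither table is an assumption, since this page does not say how such characters are displayed.

```python
# Color groups copied from the tables above.
PROTEIN_COLORS = {"ORANGE": "GPST", "RED": "HKR", "BLUE": "FWY", "GREEN": "ILMV"}
NUCLEOTIDE_COLORS = {"ORANGE": "A", "RED": "C", "BLUE": "T", "GREEN": "G"}


def build_lookup(groups):
    """Invert a color -> characters table into a character -> color table."""
    return {ch: color for color, chars in groups.items() for ch in chars}


PROTEIN_LOOKUP = build_lookup(PROTEIN_COLORS)
NUCLEOTIDE_LOOKUP = build_lookup(NUCLEOTIDE_COLORS)


def color_of(symbol, lookup):
    """Color listed for `symbol`, or None if the tables do not list it."""
    return lookup.get(symbol.upper())


print([color_of(ch, PROTEIN_LOOKUP) for ch in "GKWM"])
print([color_of(ch, NUCLEOTIDE_LOOKUP) for ch in "ACGT"])
```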
Scoring Matrices:
The scoring matrices currently predefined in MuSiC
are the BLOSUM 45, 62 and 80 matrices and the PAM 20, 60, 70 and 120
matrices for protein sequences,
and the identity, BLAST and transition/transversion matrices for
DNA/RNA sequences.
- Identity Matrix of DNA/RNA Sequences:
     A   C   G   T
A   10   0   0   0
C    0  10   0   0
G    0   0  10   0
T    0   0   0  10
- Blast Matrix of DNA/RNA Sequences:
     A   C   G   T
A    5  -4  -4  -4
C   -4   5  -4  -4
G   -4  -4   5  -4
T   -4  -4  -4   5
- Transition/Transversion Matrix of DNA/RNA Sequences:
     A   C   G   T
A    1  -5  -1  -5
C   -5   1  -5  -1
G   -1  -5   1  -5
T   -5  -1  -5   1
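Each of the three DNA/RNA matrices above follows a simple rule, which the Python sketch below makes explicit. The function names are hypothetical. The transition/transversion rule relies on the standard classification of A and G as purines and C and T as pyrimidines: a substitution within a class (a transition) scores -1, and one across classes (a transversion) scores -5, which reproduces the table.

```python
BASES = "ACGT"
PURINES = {"A", "G"}  # C and T are pyrimidines


def identity_score(a, b):
    """Identity matrix: 10 for a match, 0 otherwise."""
    return 10 if a == b else 0


def blast_score(a, b):
    """Blast matrix: 5 for a match, -4 for any mismatch."""
    return 5 if a == b else -4


def transition_transversion_score(a, b):
    """1 for a match, -1 for a transition, -5 for a transversion."""
    if a == b:
        return 1
    same_class = (a in PURINES) == (b in PURINES)
    return -1 if same_class else -5


def build_matrix(score):
    """Tabulate a pairwise scoring function over the four bases."""
    return {a: {b: score(a, b) for b in BASES} for a in BASES}


matrix = build_matrix(transition_transversion_score)
for a in BASES:
    print(a, [matrix[a][b] for b in BASES])
```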
Method:
For details of the algorithms used in MuSiC, please refer to the
METHOD page.
Examples:
Some examples of protein/DNA/RNA sequences for testing MuSiC are collected
on the EXAMPLE page.
Contact information:
- To whom correspondence should be addressed:
Dr. Chin Lung Lu (Email: cllu@mail.nctu.edu.tw)
and Prof. Chuan Yi Tang (Email: cytang@cs.nthu.edu.tw)
- The MuSiC software was developed by Yen Pin Huang (Email: icefx.bi91g@nctu.edu.tw)
Acknowledgements:
This work was supported in part by the National Science Council
of the Republic of China under grants NSC92-2213-E-009-089,
NSC92-3112-B-009-002 and NSC93-2321-B-007-001.
References:
- Y.T. Tsai, Y.P. Huang, C.T. Yu and C.L. Lu (2004),
MuSiC: A Tool for Multiple Sequence Alignment with Constraints,
Bioinformatics, Vol. 20, pp. 2309-2311.
- C.Y. Tang, C.L. Lu, M.D.T. Chang, Y.T. Tsai, Y.J. Sun, K.M. Chao,
J.M. Chang, Y.H. Chiou, C.M. Wu, H.T. Chang and W.I. Chou (2003),
Constrained Multiple Sequence Alignment Tool Development and Its
Application to RNase Family Alignment,
Journal of Bioinformatics and Computational Biology,
Vol. 1, pp. 267-287.