MuSiC Help Page


Introduction:

MuSiC is a web server that performs constrained alignment of a set of sequences, such that user-specified residues/nucleotides are aligned with each other. The input consists of a set of protein/DNA/RNA sequences and a set of user-specified constraints, each of which is a fragment of residues/nucleotides that appears (approximately) in all input sequences. The output is a constrained multiple sequence alignment in which the fragments of the input sequences whose residues/nucleotides exhibit a given degree of similarity to a constraint are aligned together. The current MuSiC system is implemented in Java and can be accessed via a simple web interface.

Usage of MuSiC:

Input:

To run MuSiC, users first enter their protein/DNA/RNA sequences in FASTA format in the top blank field (1). They must then select a suitable scoring matrix from a list of predefined matrices (2). MuSiC penalizes gaps using an affine gap penalty function, so users must also enter two real values: the gap open penalty (3) and the gap extension penalty (4). Alternatively, users may simply choose "Protein" or "DNA/RNA" (5) to adopt the default values provided by the system, where "Protein" and "DNA/RNA" indicate that the input sequences are protein and DNA/RNA sequences, respectively.

Finally, in the "Constraints" field (6), users can enter constrained sequences, delimited by non-alphabetic characters such as commas, slashes, vertical bars, and plus signs. They may also select the "Approximate" option and enter a real number 0 <= R < 1 in the "Ratio" field (7), which means that at most (l(Pi) * R) mismatches are allowed between a constrained sequence Pi and the fragments of the input sequences aligned with it, where l(Pi) denotes the length of Pi. The "Constraints" field may be left blank, in which case MuSiC outputs an ordinary multiple sequence alignment without constraints.
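
To illustrate what the gap open and gap extension penalties control, here is a minimal Python sketch of an affine gap penalty. The function name, the sample values 10.0 and 0.5, and the convention that the first gap character costs the open penalty and each further one costs the extension penalty are all assumptions made for illustration. This help page does not state which convention or default values MuSiC uses.

```python
def affine_gap_penalty(length: int, gap_open: float, gap_extend: float) -> float:
    """Penalty for one run of `length` consecutive gap characters.

    Assumed convention (not confirmed for MuSiC): the first gap character
    costs `gap_open` and every additional one costs `gap_extend`.
    """
    if length < 0:
        raise ValueError("gap length must be non-negative")
    if length == 0:
        return 0.0
    return gap_open + (length - 1) * gap_extend


# Hypothetical penalties, chosen only for the example.
one_long_gap = affine_gap_penalty(4, gap_open=10.0, gap_extend=0.5)
four_short_gaps = 4 * affine_gap_penalty(1, gap_open=10.0, gap_extend=0.5)
print(one_long_gap, four_short_gaps)
```

Under this convention a single long gap is penalized less than several separate short gaps of the same total length, which is the usual motivation for an affine function.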
Figure: Example of MuSiC input (fields numbered 1 to 7 as referenced above)
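
The handling of the "Constraints" and "Ratio" fields described above can be sketched in Python as follows. This is an illustration under stated assumptions, not MuSiC's implementation: the function names are hypothetical, mismatches are assumed to be counted position by position (Hamming distance), and the ratio is read as text so that l(Pi) * R can be computed exactly.

```python
import re
from fractions import Fraction
from math import floor


def parse_constraints(field: str) -> list[str]:
    """Split the "Constraints" field on runs of non-alphabetic characters."""
    return [part for part in re.split(r"[^A-Za-z]+", field) if part]


def mismatch_budget(constraint: str, ratio: str = "0") -> int:
    """Largest whole number of mismatches not exceeding l(Pi) * R."""
    r = Fraction(ratio)  # exact arithmetic, avoids float rounding
    if not 0 <= r < 1:
        raise ValueError("Ratio must satisfy 0 <= R < 1")
    return floor(len(constraint) * r)


def matching_fragments(sequence: str, constraint: str, ratio: str = "0") -> list[int]:
    """Start offsets of fragments within the mismatch budget.

    Assumption: mismatches are counted as Hamming distance between the
    constraint and an equally long fragment of the sequence.
    """
    budget = mismatch_budget(constraint, ratio)
    width = len(constraint)
    hits = []
    for start in range(len(sequence) - width + 1):
        fragment = sequence[start:start + width]
        mismatches = sum(a != b for a, b in zip(fragment, constraint))
        if mismatches <= budget:
            hits.append(start)
    return hits


print(parse_constraints("HKR,GPST/FWY|IL+MV"))
print(matching_fragments("TTACCTATT", "ACGTA"))          # exact matching
print(matching_fragments("TTACCTATT", "ACGTA", "0.2"))   # one mismatch allowed
```

With R = 0.2 and a constraint of length 5, one mismatch is allowed, so the fragment ACCTA qualifies for the constraint ACGTA even though it does not match exactly.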

Output:

In the resulting multiple sequence alignment, the constrained columns are colored black, and the corresponding constraints are displayed beneath them.
Figure: Example of MuSiC output

The colors of the characters in the non-constrained columns of the resulting multiple sequence alignment follow the rules of Clustal X. In the case of protein sequences, the default colors are as follows:

        Color           Residue Code
        ----------------------------
        ORANGE          GPST
        RED             HKR
        BLUE            FWY
        GREEN           ILMV

In the case of DNA/RNA sequences, the default colors are as follows:

        Color           Base Code
        -------------------------
        ORANGE          A
        RED             C
        BLUE            T
        GREEN           G
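
The two color tables above can be expressed as a simple lookup, sketched here in Python. The names are hypothetical, and two behaviors are assumptions rather than documented MuSiC behavior: symbols are upper-cased before lookup, and any symbol not listed in the tables yields None.

```python
# Color tables copied from the lists above.
PROTEIN_COLORS = {"ORANGE": "GPST", "RED": "HKR", "BLUE": "FWY", "GREEN": "ILMV"}
NUCLEOTIDE_COLORS = {"ORANGE": "A", "RED": "C", "BLUE": "T", "GREEN": "G"}


def build_lookup(table: dict[str, str]) -> dict[str, str]:
    """Invert a color -> codes table into a code -> color lookup."""
    return {code: color for color, codes in table.items() for code in codes}


PROTEIN_LOOKUP = build_lookup(PROTEIN_COLORS)
NUCLEOTIDE_LOOKUP = build_lookup(NUCLEOTIDE_COLORS)


def color_of(symbol: str, is_protein: bool):
    """Return the default color name for a symbol, or None if it is not listed."""
    lookup = PROTEIN_LOOKUP if is_protein else NUCLEOTIDE_LOOKUP
    return lookup.get(symbol.upper())


print(color_of("G", is_protein=True))   # glycine in a protein sequence
print(color_of("G", is_protein=False))  # guanine in a DNA/RNA sequence
```

The same letter can map to different colors depending on the sequence type, which is why the lookup takes the type as an explicit argument.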

Scoring Matrices:

The predefined scoring matrices of the MuSiC system currently consist of the BLOSUM 45, 62, and 80 matrices and the PAM 20, 60, 70, and 120 matrices for protein sequences, and the identity, BLAST, and transition/transversion matrices for DNA/RNA sequences.
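
To show how the identity and transition/transversion matrices for DNA differ in principle, here is a small Python sketch. The score values are hypothetical placeholders and are not the values used by MuSiC; only the classification of substitutions (purine to purine or pyrimidine to pyrimidine is a transition, anything else is a transversion) is standard.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def _check(base: str) -> str:
    """Validate and normalize a single DNA base (A, C, G or T)."""
    base = base.upper()
    if base not in PURINES | PYRIMIDINES:
        raise ValueError(f"unsupported base: {base!r}")
    return base


def identity_score(a: str, b: str, match: int = 1, mismatch: int = 0) -> int:
    """Identity scoring: every mismatch is scored the same (placeholder values)."""
    return match if _check(a) == _check(b) else mismatch


def transition_transversion_score(a: str, b: str, match: int = 1,
                                  transition: int = -1, transversion: int = -2) -> int:
    """Score transitions more mildly than transversions (placeholder values)."""
    a, b = _check(a), _check(b)
    if a == b:
        return match
    same_class = ({a, b} <= PURINES) or ({a, b} <= PYRIMIDINES)
    return transition if same_class else transversion


print(identity_score("A", "G"), transition_transversion_score("A", "G"))
print(identity_score("A", "C"), transition_transversion_score("A", "C"))
```

An identity matrix treats A/G and A/C substitutions alike, whereas a transition/transversion matrix distinguishes them.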

Method:

For details of the algorithms used by MuSiC, please refer to the METHOD page.

Examples:

Some examples of protein/DNA/RNA sequences for testing MuSiC are collected on the EXAMPLE page.

Contact information:

Acknowledgements:

This work was supported in part by the National Science Council of the Republic of China under grants NSC92-2213-E-009-089, NSC92-3112-B-009-002 and NSC93-2321-B-007-001.

References:

  1. Y.T. Tsai, Y.P. Huang, C.T. Yu and C.L. Lu (2004), MuSiC: A Tool for Multiple Sequence Alignment with Constraints, Bioinformatics, Vol. 20, pp. 2309-2311.
  2. C.Y. Tang, C.L. Lu, M.D.T. Chang, Y.T. Tsai, Y.J. Sun, K.M. Chao, J.M. Chang, Y.H. Chiou, C.M. Wu, H.T. Chang and W.I. Chou (2003), Constrained Multiple Sequence Alignment Tool Development and Its Application to RNase Family Alignment, Journal of Bioinformatics and Computational Biology, Vol. 1, pp. 267-287.