Computational tools for the alignment and superposition of protein structures are essential instruments in structural biology. TopMatch-web provides an easy-to-use interface to a suite of techniques for protein structure alignments called TopMatch (Sippl & Wiederstein, 2008; Sippl & Wiederstein, 2012). Given a pair of protein structures, TopMatch calculates a list of alignments ordered by structural similarity. The five best alignments are reported in a table. The corresponding superpositions can be explored in a 3D molecule viewer which highlights the structurally equivalent parts of the proteins. The sequence alignments resulting from the structure comparison are provided on-line and in PDF format. Coordinates of the input structures after superposition are available for download.
System Requirements
TopMatch requires Java to run the Jmol plugin. Because this setup is platform-independent, TopMatch should run on most operating systems and in all common browsers. If you experience any trouble using TopMatch, please feel free to contact us.
Input
TopMatch-web requires the atomic coordinates of the two protein structures that are to be compared. We term the first structure query and the second structure target. Users can supply coordinates either by uploading files in PDB format or by entering strings that refer to protein structures available from the latest version of PDB, SCOP (release 1.75), CATH (version 3.2.0) or COPS (concurrent with latest version of PDB). The syntax of this string allows quick access to major protein structure repositories and is defined as follows:
- <4-letter code>(,<chain IDs>)
- Refers to one or more protein chains of a PDB file, specified by its 4-letter code and one or more chain identifiers. The tilde symbol (~) is used to denote chains with a blank chain identifier.
  Examples:
  - 1aqk,L
  - 1aqk,LH
  - 3fgf,~
- <4-letter code>,<chain IDs>,<model number>
- For structures determined by NMR a model number can be specified in addition. If omitted, TopMatch-web uses the first model found in the given PDB chain(s).
  Examples:
  - 1kdx,A,10
- <4-letter code>
- Refers to the entirety of all protein chains contained in a PDB file.
  Examples:
  - 1igt
- <4-letter code>,,<model number>
- As for single chains, specific NMR models can be specified by their model number.
  Examples:
  - 1kdx,,5
- <4-letter code>@<number of biological assembly>
- Refers to a specific macromolecular assembly thought to be a functional form of the protein (also sometimes referred to as the biological unit).
  Examples:
  - 3f1w@2
- <4-letter code>,<chain fragments>(,<model number>)
- Fragments of a chain can be specified by the PDB residue numbers of its start and end, separated by a colon (':') and put in parentheses. It is valid to specify structures consisting of more than one fragment in this way. An optional model number may be added for NMR structures.
  Examples:
  - 2hhb,A(20:70)
  - 1kdx,A(587:640),10
  - 1aqk,H(1:40)L(35:80)
  - 1bxz,A(1:139)A(314:352) (corresponds to SCOP domain d1bxza1, see below)
- d<4-letter code><chain ID><domain number>
- (=SCOP SID). Refers to a domain of a PDB structure as defined in the SCOP classification.
  Examples:
  - d1sdwa2
  - d1sdf__
- e<4-letter code><chain ID><2-digit domain number>
- (=CATH Domain ID). Refers to a domain of a PDB structure as defined in the CATH classification.
  Examples:
  - e1sdwA01
  - e1sdf000
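The syntax above can be illustrated with a small parser. The following Python sketch is not the parser used by TopMatch-web; it only mirrors the forms listed in this section. The names StructureSpec and parse_spec are hypothetical, and the character classes are assumptions (a PDB code is taken to be a digit followed by three alphanumeric characters, and residue numbers are taken to be plain integers without insertion codes).

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

PDB_CODE = r"[0-9][A-Za-z0-9]{3}"   # assumed shape of a 4-letter PDB code
CHAIN = r"[A-Za-z0-9~]"             # '~' stands for a blank chain identifier
FRAGMENT = re.compile(rf"({CHAIN})\((-?\d+):(-?\d+)\)")


@dataclass
class StructureSpec:                # hypothetical container, not part of TopMatch
    kind: str                       # "pdb", "assembly", "scop" or "cath"
    code: str
    chains: List[str] = field(default_factory=list)
    fragments: List[Tuple[str, int, int]] = field(default_factory=list)
    model: Optional[int] = None
    assembly: Optional[int] = None
    domain: Optional[str] = None


def parse_spec(text: str) -> StructureSpec:
    """Parse one structure string of the forms listed above."""
    text = text.strip()

    # SCOP SID: d<code><chain><domain>, e.g. d1sdwa2 or d1sdf__
    m = re.fullmatch(rf"d({PDB_CODE})([A-Za-z0-9_.])([A-Za-z0-9_])", text)
    if m:
        return StructureSpec("scop", m.group(1), chains=[m.group(2)], domain=m.group(3))

    # CATH domain ID: e<code><chain><2-digit domain>, e.g. e1sdwA01
    m = re.fullmatch(rf"e({PDB_CODE})([A-Za-z0-9])(\d\d)", text)
    if m:
        return StructureSpec("cath", m.group(1), chains=[m.group(2)], domain=m.group(3))

    # Biological assembly: <code>@<number>, e.g. 3f1w@2
    m = re.fullmatch(rf"({PDB_CODE})@(\d+)", text)
    if m:
        return StructureSpec("assembly", m.group(1), assembly=int(m.group(2)))

    # Remaining forms: <code>[,<chains or fragments>[,<model>]]
    parts = text.split(",")
    if len(parts) > 3 or not re.fullmatch(PDB_CODE, parts[0]):
        raise ValueError(f"not a recognised structure string: {text!r}")
    spec = StructureSpec("pdb", parts[0])
    selection = parts[1] if len(parts) > 1 else ""
    if "(" in selection:
        # One or more fragments such as H(1:40)L(35:80)
        if not re.fullmatch(rf"(?:{CHAIN}\(-?\d+:-?\d+\))+", selection):
            raise ValueError(f"malformed fragment list: {selection!r}")
        spec.fragments = [(c, int(a), int(b)) for c, a, b in FRAGMENT.findall(selection)]
    elif selection:
        # Plain chain identifiers such as LH or ~
        if not re.fullmatch(rf"{CHAIN}+", selection):
            raise ValueError(f"malformed chain list: {selection!r}")
        spec.chains = list(selection)
    if len(parts) == 3:
        spec.model = int(parts[2])
    return spec
```

For instance, parse_spec("1kdx,A(587:640),10") yields one fragment of chain A together with model number 10, while parse_spec("1kdx,,5") selects model 5 with no chain restriction.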
The following options affect the results calculated by TopMatch:
- Permutations
- If switched on, allow structure alignments that contain permutations of the target sequence relative to the query.
- Composite
- If switched on, allow composite structure alignments. Composite alignments are built by combining individual basic alignments that cover distinct regions of the molecules. For details, see Sippl & Wiederstein (2012).
- Match | Fuzzy | Precise
- The 'Match' button triggers a TopMatch calculation with standard parameters. Use the 'Fuzzy' or 'Precise' button to allow for larger or smaller spatial deviations between query and target in the construction of alignments, respectively.
Output
Superposition
TopMatch-web visualizes the 3D structures of the superimposed input
proteins using the molecule viewer Jmol (Figure 1a). The query structure
is shown in blue, the target structure is shown in green. Pairs of
residues that are structurally equivalent are colored orange (query) or
red (target).
For structures that consist only of C-alpha atoms, the cartoon
representation can leave the widget apparently empty. Switching to a
different Display Mode resolves this.
Transformed atomic coordinates for query and target after
superposition can be downloaded in PDB format or as a PyMOL script,
which also includes commands for coloring the equivalent parts of the
structures. Furthermore, the transformation matrix used to
transform the target coordinates is available in plain text format
(Figure 1b).
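As an illustration of what such a transformation does, the following Python sketch applies a rigid-body transformation to a list of coordinates. The layout of the downloadable plain-text file is not described here, so the sketch assumes the transformation is given as a 3x3 rotation matrix and a translation vector, applied as x' = R x + t; the function name transform is hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def transform(coords: Sequence[Point],
              rotation: Sequence[Sequence[float]],
              translation: Sequence[float]) -> List[Point]:
    """Apply x' = R x + t to every point (assumed convention, see above)."""
    moved = []
    for x, y, z in coords:
        moved.append(tuple(
            row[0] * x + row[1] * y + row[2] * z + t
            for row, t in zip(rotation, translation)
        ))
    return moved


# Made-up example: rotate by 90 degrees about the z axis, then shift.
R = [[0.0, -1.0, 0.0],
     [1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0]]
t = [1.0, 2.0, 3.0]
moved = transform([(1.0, 0.0, 0.0)], R, t)
```

If the file uses a different convention (for example, translating before rotating), the order of operations has to be adapted accordingly.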
The widget can be enlarged by clicking on the
magnifying glass to examine the structure and its
superposition in more detail. Query and target can
selectively be hidden to facilitate the
analysis. Furthermore, the Jmol widget is capable of
displaying ligands and highlighting sequence
identities in the structure alignment.
Aligned sequences
The sequence alignment that results from a selected structure alignment
is shown in a box below the Jmol applet (Figure 1c). In accordance with
the 3D visualization, the query sequence is colored blue, the target
sequence is colored green and pairs of structurally equivalent residues
are colored orange (query) or red (target). We emphasize that in
structure based sequence alignments of this kind only the orange/red
parts show meaningful information and that the positions of the blue and
green residues are of no relevance. Two rulers showing the PDB residue
numbers found in the input structures facilitate navigation.
The alignment is also typeset using the TeXshade package and provided
as a PDF file and in FASTA format (Figure 1b). In the PDF file, structurally equivalent residues are marked by
bars in orange (query) or red (target) and are shown
in bold if they have the same amino acid type. The FASTA file is plain text; structurally equivalent residues are shown as CAPITAL letters, and blanks indicate permuted blocks.
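Because equivalent residues are written in capital letters, the FASTA file can be post-processed with a few lines of code. The Python sketch below counts the capital letters per record. It assumes conventional FASTA records with header lines starting with '>'; the record names and sequences in the example are made up, and the helper names read_fasta and equivalent_count are hypothetical.

```python
from typing import Dict


def read_fasta(text: str) -> Dict[str, str]:
    """Collect sequences by header name; blanks inside lines are kept."""
    records: Dict[str, str] = {}
    name = None
    for line in text.splitlines():
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def equivalent_count(sequence: str) -> int:
    """Number of residues written in capital letters."""
    return sum(1 for ch in sequence if ch.isupper())


# Made-up alignment, for illustration only.
example = """>query
mkTAYIAkqr
>target
gsSAYLAkd
"""
records = read_fasta(example)
counts = {name: equivalent_count(seq) for name, seq in records.items()}
```

For a single alignment, both records should give the same count, since every structurally equivalent pair contributes one residue to each sequence.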
Table of alternative alignments
TopMatch reports a ranked list of alignments (Figure 1d). The alignments are characterized by a small set of parameters:
- T
- Type of alignment. Basic structure alignments are denoted by 'b', composite alignments are denoted by 'c', and sequence alignments are denoted by 'q'.
- R
- Rank of alignment.
- S
- Measure of structural similarity based on Gaussian functions (see Sippl & Wiederstein (2012)). If the structurally equivalent parts in query and target match perfectly, S is equal to L. With increasing spatial deviation of the aligned residues, S approaches 0.
- Sq
- Query cover based on similarity score, expressed in percent (= 100 x S/Qn, where Qn is the length of the query sequence).
- St
- Target cover based on similarity score, expressed in percent (= 100 x S/Tn, where Tn is the length of the target sequence).
- L
- Number of residue pairs that are structurally equivalent (= alignment length).
- Lq
- Query cover based on alignment length, expressed in percent (= 100 x L/Qn, where Qn is the length of the query sequence).
- Lt
- Target cover based on alignment length, expressed in percent (= 100 x L/Tn, where Tn is the length of the target sequence).
- Sr
- Typical distance error. For details on the construction of this per-residue measure of structural similarity, see Sippl & Wiederstein (2012).
- Er
- Root-mean-square error of superposition in Ångström, calculated using all structurally equivalent C-alpha atoms.
- Is
- Sequence identity of query and target in the equivalent regions, expressed in percent.
- P
- Number of permutations in the alignment.
A particular alignment can be selected by clicking on its rank in the alignment table. The superposition corresponding to the selected alignment is then shown in the Jmol applet window.
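The cover values in the table follow directly from S, L, Qn and Tn, and Er is a root-mean-square error over the equivalent C-alpha atoms. The Python sketch below reproduces these formulas with made-up numbers. It is not TopMatch code: the function names are hypothetical, and rmsd assumes that the two coordinate lists are already superimposed and paired in the order of the alignment.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def cover_percent(value: float, length: int) -> float:
    """Cover in percent, e.g. Sq = 100 x S/Qn or Lt = 100 x L/Tn."""
    return 100.0 * value / length


def rmsd(query_ca: Sequence[Point], target_ca: Sequence[Point]) -> float:
    """Root-mean-square error over paired, already superimposed C-alpha atoms."""
    if len(query_ca) != len(target_ca) or not query_ca:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum((a - b) ** 2
                for p, q in zip(query_ca, target_ca)
                for a, b in zip(p, q))
    return math.sqrt(total / len(query_ca))


# Made-up values for illustration: S = 80, L = 100, Qn = 200, Tn = 125.
S, L, Qn, Tn = 80.0, 100, 200, 125
Sq, St = cover_percent(S, Qn), cover_percent(S, Tn)   # 40.0 and 64.0
Lq, Lt = cover_percent(L, Qn), cover_percent(L, Tn)   # 50.0 and 80.0

# Two atom pairs, each displaced by 1 Angstrom along z.
Er = rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
          [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)])          # 1.0
```

Since S cannot exceed L, the score-based covers Sq and St are never larger than the length-based covers Lq and Lt.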
References
Sippl & Wiederstein (2012). Detection of correlations in protein structures and molecular complexes. Structure 20, 718-728.
Slater, Castellanos, Sippl & Melo (2012). Towards the development of standardized methods for comparison, ranking and evaluation of structure alignments. Bioinformatics 29, 47-53.
Sippl & Wiederstein (2008). A note on difficult structure alignment problems. Bioinformatics 24, 426-427.
Sippl (2008). On distance and similarity in fold space. Bioinformatics 24, 872-873.
Jmol: an open-source Java viewer for chemical structures in 3D.
Beitz (2000). TeXshade: shading and labeling of multiple sequence alignments using LaTeX2e. Bioinformatics 16, 135-139.