Computational tools for the alignment and superposition of macromolecular structures are essential instruments in structural biology. TopMatch-web provides an easy-to-use interface to a suite of techniques for protein structure alignments called TopMatch (Sippl & Wiederstein, 2008; Sippl & Wiederstein, 2012). Given a pair of structures, TopMatch calculates a list of alignments ordered by structural similarity. The best alignments are reported in a table. The corresponding superpositions can be explored in a 3D molecule viewer which highlights the structurally equivalent parts of the proteins. The sequence alignments resulting from the structure comparison are provided on-line and in PDF format. Coordinates of the input structures after superposition are available for download.

Input

TopMatch-web requires the atomic coordinates of the two structures that are to be compared. We call the first structure the query and the second structure the target. Users can supply coordinates either by uploading files in mmCIF format or by entering strings that refer to protein and/or nucleic acid structures available from the latest version of the PDB. The syntax of these strings allows quick access to PDB structures and is defined as follows:

<4-letter code>
Refers to the entirety of all protein chains contained in a PDB file.
    Examples:
  • 1aqk
<4-letter code>_<chain ID>(_<chain ID>...)
Refers to one or more protein/nucleic acid chain(s) of a PDB file, specified by its 4-letter code and one or more chain identifiers (author-provided).
    Examples:
  • 1aqk_L
  • 1aqk_L_H
<4-letter code>,<chain ID>(,<chain ID>...)
Refers to one or more protein/nucleic acid chain(s) of a PDB file, specified by its 4-letter code and one or more chain identifiers (PDB-provided).
    Examples:
  • 1aqk,A
  • 1aqk,A,B
<4-letter code>@<ID of biological assembly>
Refers to a specific macromolecular assembly thought to be a functional form of the protein (also sometimes referred to as the biological unit; usually referred to by a number).
    Examples:
  • 3f1w@2
<4-letter code>;<model number>
<4-letter code>_<chain ID>;<model number>
<4-letter code>,<chain ID>;<model number>
For structures determined by NMR a model number can be specified. If omitted, TopMatch-web uses the first model found in the given PDB entries.
    Examples:
  • 1kdx_A;10
  • 1kdx;10
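
The string syntax above can be illustrated with a small parser. This is only a sketch of how such strings might be split into their parts, not the code TopMatch-web uses; the names StructureSpec and parse_spec are hypothetical. It is also more permissive than the forms listed: it accepts a model number after several chains or after an assembly ID, which the description above does not explicitly cover.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StructureSpec:
    """Parts of an input string (hypothetical container, for illustration)."""
    code: str                                   # 4-letter PDB code
    chains: List[str] = field(default_factory=list)
    chain_naming: Optional[str] = None          # "author" for "_", "pdb" for ","
    assembly: Optional[str] = None              # ID following "@"
    model: Optional[int] = None                 # number following ";"


# code, then optionally "_chain..." OR ",chain..." OR "@assembly",
# then optionally ";model"
_SPEC = re.compile(
    r"^(?P<code>[0-9A-Za-z]{4})"
    r"(?:(?P<author>(?:_[0-9A-Za-z]+)+)"
    r"|(?P<pdb>(?:,[0-9A-Za-z]+)+)"
    r"|@(?P<assembly>[0-9A-Za-z]+))?"
    r"(?:;(?P<model>[0-9]+))?$"
)


def parse_spec(text: str) -> StructureSpec:
    """Split an input string such as '1aqk_L_H' or '1kdx;10' into its parts."""
    match = _SPEC.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized structure string: {text!r}")
    spec = StructureSpec(code=match.group("code"))
    if match.group("author"):
        # "_L_H" -> ["", "L", "H"]; drop the empty leading element
        spec.chains = match.group("author").split("_")[1:]
        spec.chain_naming = "author"
    elif match.group("pdb"):
        spec.chains = match.group("pdb").split(",")[1:]
        spec.chain_naming = "pdb"
    spec.assembly = match.group("assembly")
    if match.group("model"):
        spec.model = int(match.group("model"))
    return spec
```

For instance, parse_spec("1kdx_A;10") yields the code 1kdx, the author-provided chain A and model number 10, while parse_spec("3f1w@2") yields the code 3f1w and assembly ID 2.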

Sample input is also available from the TopMatch-web start page.

The following parameters affect the results calculated by TopMatch:

Multiple solutions
By default, TopMatch reports the match with the highest structure similarity. However, in general there is more than one interesting match. If "Multiple solutions" is switched on, TopMatch lists all relevant matches where the corresponding alignments are ranked by similarity.

Output

Superposition

TopMatch-web visualizes the superimposed input structures in 3D using the interactive molecule viewer NGL (Figure 1a). The query structure is shown in blue, the target structure is shown in green. Pairs of residues that are structurally equivalent are colored orange (query) or red (target).
Transformed atomic coordinates for query and target after superposition can be downloaded in mmCIF format or as a PyMOL script, which also includes commands for coloring the equivalent parts of the structures. Furthermore, the transformation matrix used to transform the target coordinates is available in plain text format (Figure 1b).
The representation mode of query and target can be controlled via the "Options" menu (Figure 1c). From there, query and target can also be selectively hidden to facilitate the analysis. Further options include the display of ligands, the background color, the opacity of unmatched parts, and the size of the NGL widget.
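
The downloadable transformation matrix can be applied to coordinates outside of TopMatch-web. The sketch below assumes the transformation consists of a 3x3 rotation matrix R and a translation vector t that are applied to a column vector as x' = R x + t. The layout of the plain text file is not described here, so this convention and the function name transform are assumptions that should be checked against the downloaded file.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, float, float]


def transform(coords: Sequence[Vector],
              rotation: Sequence[Sequence[float]],
              translation: Sequence[float]) -> List[Vector]:
    """Apply x' = R x + t to every coordinate (assumed convention)."""
    moved = []
    for x, y, z in coords:
        # Each row of R, together with the matching component of t,
        # produces one component of the transformed point.
        moved.append(tuple(
            row[0] * x + row[1] * y + row[2] * z + shift
            for row, shift in zip(rotation, translation)
        ))
    return moved


# Example: rotate by 90 degrees about the z axis, then shift by 1 along x.
rotation = [[0.0, -1.0, 0.0],
            [1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0]]
translation = [1.0, 0.0, 0.0]
moved = transform([(1.0, 0.0, 0.0)], rotation, translation)
```

If the transformed target coordinates produced this way do not coincide with the downloaded superimposed coordinates, the file most likely uses a different convention (for example the transposed matrix).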

Aligned sequences

The sequence alignment that results from a selected structure alignment is shown in a box below the NGL widget (Figure 1d). By default, the alignment is displayed schematically; it can be viewed in more detail by toggling the "Show alignment details" button. In accordance with the 3D visualization, the query sequence is colored blue, the target sequence is colored green, and pairs of structurally equivalent residues are colored orange (query) or red (target). We emphasize that in structure-based sequence alignments of this kind only the orange/red parts carry meaningful information; the positions of the blue and green residues are of no relevance. Two rulers showing the PDB chain IDs and residue numbers found in the input structures facilitate navigation.
The alignment is also typeset using the TeXshade package (Beitz, 2000) and provided as a PDF file and in FASTA format (Figure 1b). In the PDF file, structurally equivalent residues are marked by bars in orange (query) or red (target) and are shown in bold if they are the same amino acid/nucleotide. The FASTA file is plain text; structurally equivalent residues are shown as CAPITAL letters, and blanks indicate permuted blocks.

Table of alternative alignments

TopMatch reports a ranked list of alignments (Figure 1e). The alignments are characterized by a small set of parameters:

LEN
Number of residue pairs that are structurally equivalent
(= alignment length).
QC%
Query cover based on similarity score, expressed in percent
(= 100 x SCORE/Qn, where Qn is the length of the query sequence).
TC%
Target cover based on similarity score, expressed in percent
(= 100 x SCORE/Tn, where Tn is the length of the target sequence).
SCORE
Measure of structural similarity based on Gaussian functions (see Sippl & Wiederstein (2012)). If the structurally equivalent parts in query and target match perfectly, SCORE is equal to LEN. With increasing spatial deviation of the aligned residues, SCORE approaches 0.
RMS
Root-mean-square error of superposition in Ångström, calculated using all structurally equivalent C-alpha (proteins) or P (nucleic acids) atoms.
SI%
Sequence identity of query and target in the equivalent regions, expressed in percent.
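
The derived parameters can be recomputed from an alignment as a sanity check. The sketch below follows the formulas given above for QC% and TC% and takes SCORE as given. RMS is computed as the usual root-mean-square deviation over the equivalent atom pairs after superposition, and SI% as the fraction of equivalent pairs with identical residues relative to LEN; both are assumptions about details not spelled out here. All function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Vector = Tuple[float, float, float]


def cover_percent(score: float, length: int) -> float:
    """QC% or TC%: 100 x SCORE / sequence length (Qn or Tn)."""
    return 100.0 * score / length


def rms(query_atoms: Sequence[Vector], target_atoms: Sequence[Vector]) -> float:
    """Root-mean-square deviation over equivalent atom pairs.

    Both lists hold superimposed coordinates (C-alpha or P atoms) in the
    same order, one entry per structurally equivalent residue pair.
    """
    if len(query_atoms) != len(target_atoms) or len(query_atoms) == 0:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(
        (qa - ta) ** 2
        for q, t in zip(query_atoms, target_atoms)
        for qa, ta in zip(q, t)
    )
    return math.sqrt(total / len(query_atoms))


def sequence_identity(query_res: str, target_res: str) -> float:
    """SI%: percentage of equivalent pairs with identical residues.

    Assumes the denominator is LEN, the number of equivalent pairs.
    """
    if len(query_res) != len(target_res) or len(query_res) == 0:
        raise ValueError("need two non-empty residue strings of equal length")
    same = sum(1 for q, t in zip(query_res, target_res) if q == t)
    return 100.0 * same / len(query_res)
```

As an example, a SCORE of 50 for a query of 200 residues gives cover_percent(50.0, 200) = 25.0, i.e. QC% = 25.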

A particular alignment can be selected by clicking on the respective line in the alignment table. The superposition corresponding to the selected alignment is then shown in the NGL widget.

Figure 1: Screenshot of TopMatch-web.

System Requirements

TopMatch-web should run on most operating systems with all common browsers. We have successfully used TopMatch-web in the following environments:

OS            Version  Chrome      Firefox     Safari               Microsoft Edge  Opera
Linux Debian  8        57.0        60.4        n/a                  n/a             not tested
Linux Ubuntu  18.04    not tested  64.0        n/a                  n/a             not tested
MacOS         10.12.6  not tested  62.0        12.0                 n/a             not tested
MacOS         10.9.5   65.0        not tested  9.1.3 (not working)  n/a             49.0
Windows       10       71.0        64.0        n/a                  42.17134        not tested

If you experience any problems using TopMatch, please feel free to contact us.

References

Sippl & Wiederstein (2012) Detection of correlations in protein structures and molecular complexes. Structure 20, 718-728.

Slater, Castellanos, Sippl & Melo (2012) Towards the development of standardized methods for comparison, ranking and evaluation of structure alignments. Bioinformatics 29, 47-53.

Sippl & Wiederstein (2008) A note on difficult structure alignment problems. Bioinformatics 24, 426-427.

Sippl (2008) On distance and similarity in fold space. Bioinformatics 24, 872-873.

Rose, Bradley, Valasatava, Duarte, Prlić & Rose (2018) NGL viewer: web-based molecular graphics for large complexes. Bioinformatics 34, 3755-3758.

Beitz (2000) TeXshade: shading and labeling of multiple sequence alignments using LaTeX2ε. Bioinformatics 16, 135-139.