ConSurf-DB is a database of pre-calculated ConSurf conservation profiles covering, in essence, all protein structures in the PDB [1]. We present here a new release of the database, which now covers 80,479 protein structures. Table 1 provides ConSurf-DB statistics. A flowchart of the updated ConSurf-DB methodology is shown in Figure 1. A four-step procedure was used to construct ConSurf-DB:

1. Scanning the PDB: the PDB repository was scanned to generate a protein sequence list according to PDB entry and chain ID. Non-redundant structures were extracted from the list using the PISCES web server [2].

2. Building the MSA: a unique procedure was used to build an MSA for each protein, balancing the need for sequence diversity against the need to avoid non-homologues as much as possible. To that end we relied as much as possible on the SWISSPROT database [3], a small curated database of annotated proteins, and turned to the larger and noisier UniRef90 database [4] only when necessary. Initially, a CS-BLAST [5] search against the SWISSPROT database was conducted with the goal of detecting at least 50 unique hits. When this threshold was not met, we searched the UniRef90 database, first with CS-BLAST and then with CSI-BLAST using 3 iterations. The list of collected homologues was subsequently filtered by coverage (minimum 80%) and sequence identity (between 60% and 95%). The remaining homologues were filtered again using CD-HIT with a 90% sequence identity clustering threshold [6]. The decision on whether to continue the search for homologues or to stop and move on to the next step was based on the number of sequences remaining after filtration. An MSA of the homologues was constructed using MAFFT [7].

3. Conservation calculation: the MSA was used to build a phylogenetic tree using the neighbor-joining algorithm [8] as implemented in the Rate4Site [9] program. Position-specific conservation scores were computed using the Bayesian algorithm [10] and the JTT evolutionary substitution model [11].

4. Results formatting: the continuous conservation scores were divided into a discrete scale of nine grades for visualization, from the most variable positions (grade 1), colored turquoise, through intermediately conserved positions (grade 5), colored white, to the most conserved positions (grade 9), colored maroon. Finally, the conservation scores were projected onto the protein structure and the MSA for visualization.
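The grading in the results-formatting step can be illustrated with a short Python sketch. It is not the ConSurf-DB implementation: it assumes that the input is a list of per-position scores in which lower values mean stronger conservation (as for evolutionary rates), and it uses nine equal-width bins over the observed range, which is a simplification chosen for illustration. The function name to_grades and the ANCHOR_COLORS mapping are hypothetical; only the three colors named above are included.

```python
from typing import List, Sequence

N_GRADES = 9

# Only the three anchor colors named in the text; intermediate shades are omitted.
ANCHOR_COLORS = {1: "turquoise", 5: "white", 9: "maroon"}


def to_grades(scores: Sequence[float]) -> List[int]:
    """Map continuous scores to grades 1 (most variable) .. 9 (most conserved).

    Assumption: lower score = more conserved. Equal-width binning is a
    simplification and not necessarily the scheme used by ConSurf-DB.
    """
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # No spread in the data: assign the middle grade everywhere.
        return [5] * len(scores)
    width = (hi - lo) / N_GRADES
    grades = []
    for score in scores:
        # Bin 0 holds the lowest (most conserved) scores; clamp the maximum
        # score into the last bin.
        bin_index = min(int((score - lo) / width), N_GRADES - 1)
        grades.append(N_GRADES - bin_index)
    return grades
```

With this convention, the lowest score in a protein always receives grade 9 and the highest receives grade 1, so grades are relative to the protein being analyzed rather than absolute.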

Figure 1. A flowchart of the process used to construct ConSurf-DB. A four-step procedure was used: scanning the PDB, building the MSA, calculating the conservation scores, and formatting the results.
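The homologue-collection logic of the MSA-building step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the ConSurf-DB code: the Hit record, filter_hits and collect_homologues are hypothetical names, each search stage is represented by a caller-supplied function rather than a real CS-BLAST or CSI-BLAST call, uniqueness is approximated by exact sequence identity between hits, the CD-HIT clustering step is omitted, and capping the list at 300 hits by simple truncation is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

MIN_HITS = 50     # minimum number of unique homologues
MAX_HITS = 300    # maximum number of unique homologues
MIN_COVERAGE = 0.80
MIN_IDENTITY, MAX_IDENTITY = 0.60, 0.95


@dataclass(frozen=True)
class Hit:
    seq_id: str
    sequence: str
    coverage: float   # fraction of the query covered by the hit
    identity: float   # fractional sequence identity to the query


def filter_hits(hits: Sequence[Hit]) -> List[Hit]:
    """Keep unique hits that pass the coverage and identity thresholds."""
    seen = set()
    kept = []
    for hit in hits:
        if hit.sequence in seen:
            continue  # duplicate sequence (simplified uniqueness test)
        if hit.coverage < MIN_COVERAGE:
            continue
        if not (MIN_IDENTITY <= hit.identity <= MAX_IDENTITY):
            continue
        seen.add(hit.sequence)
        kept.append(hit)
    return kept


# A stage is a (label, search function) pair; the function is hypothetical
# and stands in for an external search tool.
Stage = Tuple[str, Callable[[str], Sequence[Hit]]]


def collect_homologues(query: str, stages: Sequence[Stage]) -> Optional[Tuple[str, List[Hit]]]:
    """Try each search stage in order; stop at the first with enough hits."""
    for label, search in stages:
        hits = filter_hits(search(query))
        if len(hits) >= MIN_HITS:
            return label, hits[:MAX_HITS]
    return None  # fewer than MIN_HITS homologues: no calculation
```

In use, the stages would be passed in the order SWISSPROT with CS-BLAST, UniRef90 with CS-BLAST, and UniRef90 with CSI-BLAST, so that the curated database is always tried first.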

Table 1. Build statistics for the updated version of ConSurf-DB (January 2013)

Total number of non-redundant chains processed: 56,849 chains

Total number of chains located within 80,479 protein structures: 209,072 chains

First step - CS-BLAST on the SWISSPROT database generated: 19,834 MSAs

Second step - CS-BLAST on the UniRef90 database generated: 28,536 MSAs

Third step - CSI-BLAST (3 iterations) on the UniRef90 database generated: 2,418 MSAs

Number of chains left with less than 50 unique homologues (no calculations): 3,721 chains

The median number of unique homologs collected:

Minimum and maximum number of unique homologs were set to: 50 and 300
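The relative contribution of each search stage can be worked out directly from the MSA counts in Table 1. The short Python sketch below does only that arithmetic; the variable names are illustrative, and the stage labels are shortened versions of the table rows.

```python
# MSA counts per search stage, copied from Table 1.
msas_per_stage = {
    "CS-BLAST vs SWISSPROT": 19_834,
    "CS-BLAST vs UniRef90": 28_536,
    "CSI-BLAST (3 iterations) vs UniRef90": 2_418,
}

# Total number of MSAs built across the three stages.
total_msas = sum(msas_per_stage.values())

# Fraction of all MSAs contributed by each stage.
shares = {stage: count / total_msas for stage, count in msas_per_stage.items()}

for stage, share in shares.items():
    print(f"{stage}: {share:.1%}")
```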



ConSurf-DB provides the biologist with pre-calculated conservation profiles of proteins of interest, allowing instantaneous initial evaluation of the results. An advanced homologue-selection process was used, designed to improve on ordinary ConSurf runs with default parameters. This makes ConSurf-DB a preferred tool for the initial investigation of proteins. Additionally, ConSurf-DB is linked to other databases and interactive tools. One example is Proteopedia [12], where the ConSurf-DB colored structure can be visualized interactively in Jmol on the same page as the title and abstract of the structure publication, the identification of ligands and non-standard residues, and other information. Other examples are PDBsum [13] and MarkUs [14], a server for navigating sequence-structure-function space. Please note: convenient as ConSurf-DB is, it is important to remember that the results for a particular protein of interest can often be improved further by using tailor-made procedures for homologue detection, manual selection of homologues (made easy in the ConSurf web server), or other means of reconstructing the alignment or phylogeny.


[1]    O. Goldenberg, E. Erez, G. Nimrod, N. Ben-Tal, Nucleic acids research 2009, 37, D323-327.

[2]    G. Wang, R. L. Dunbrack, Jr., Bioinformatics 2003, 19, 1589-1591.

[3]    The UniProt Consortium, Nucleic acids research 2012, 40, D71-D75.

[4]    B. E. Suzek, H. Huang, P. McGarvey, R. Mazumder, C. H. Wu, Bioinformatics 2007, 23, 1282-1288.

[5]    C. Angermuller, A. Biegert, J. Soding, Bioinformatics 2012.

[6]    Y. Huang, B. Niu, Y. Gao, L. Fu, W. Li, Bioinformatics 2010, 26, 680-682.

[7]    K. Katoh, H. Toh, Bioinformatics 2010, 26, 1899-1900.

[8]    N. Saitou, M. Nei, Molecular biology and evolution 1987, 4, 406-425.

[9]    T. Pupko, R. E. Bell, I. Mayrose, F. Glaser, N. Ben-Tal, Bioinformatics 2002, 18 Suppl 1, S71-77.

[10]    B. Western, Sociol Method Res 2003, 32, 288-291.

[11]    D. T. Jones, W. R. Taylor, J. M. Thornton, Computer applications in the biosciences : CABIOS 1992, 8, 275-282.

[12]    E. Hodis, J. Prilusky, E. Martz, I. Silman, J. Moult, J. L. Sussman, Genome biology 2008, 9, R121.

[13]    R. A. Laskowski, Nucleic acids research 2009, 37, D355-359.

[14]    M. Fischer, Q. C. Zhang, F. Dey, B. Y. Chen, B. Honig, D. Petrey, Nucleic acids research 2011, 39, W357-361.