ConSurf-DB is a database of pre-calculated ConSurf conservation profiles covering, in essence, all protein structures in the PDB. We present here a new release of the database, which now covers 80,479 protein structures. Table 1 provides ConSurf-DB statistics, and a flowchart of the updated ConSurf-DB methodology is shown in Figure 1. ConSurf-DB was constructed in four steps:

1. PDB scanning: the PDB repository was scanned to generate a list of protein sequences indexed by PDB entry and chain ID. Non-redundant structures were extracted from this list using the PISCES webserver.
2. MSA construction: a dedicated procedure was used to build an MSA for each protein, balancing the need for sequence diversity against the risk of including non-homologues. To this end, we relied as much as possible on the SWISSPROT database, a small curated database of annotated proteins, and turned to the larger and noisier UniRef90 database only when necessary. First, a CS-BLAST search against SWISSPROT was run with the goal of detecting at least 50 unique hits. If this threshold was not met, UniRef90 was searched using CS-BLAST and CSI-BLAST (3 iterations). The collected homologues were then filtered by coverage (minimum 80%) and sequence identity (60-95%), and the remaining homologues were further filtered for redundancy using CD-HIT with a 90% sequence identity clustering threshold. Whether to continue the search for homologues or to stop and move on to the next step was decided by the number of sequences remaining after filtration. Finally, an MSA of the homologues was constructed using MAFFT.
3. Conservation calculation: the MSA was used to build a phylogenetic tree with the neighbor-joining algorithm, as implemented in the Rate4Site program. Position-specific conservation scores were then computed using the Bayesian algorithm and the JTT evolutionary substitution model.
4. Results formatting: the continuous conservation scores were divided into a discrete scale of nine grades for visualization, ranging from the most variable positions (grade 1, colored turquoise), through intermediately conserved positions (grade 5, colored white), to the most conserved positions (grade 9, colored maroon). Finally, the conservation scores were projected onto the protein structure and the MSA for visualization.
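The homologue-collection logic of step 2 can be illustrated with a short Python sketch. It is not ConSurf-DB's code, and several details are assumptions: the `Hit` record, the `search` callable and the database labels are hypothetical stand-ins for CS-BLAST/CSI-BLAST output; the 50-hit check is assumed to apply after filtering; and UniRef90 hits are assumed to replace, rather than extend, the SWISSPROT hits. CD-HIT clustering and MAFFT alignment would run as separate external tools and are not reproduced here.

```python
from dataclasses import dataclass
from typing import Callable, List

# Thresholds from the text; how they are combined below is an assumption.
MIN_HITS = 50        # target number of unique homologues
MIN_COVERAGE = 80.0  # minimum % of the query covered by a hit
MIN_IDENTITY = 60.0  # lower bound of the % identity window
MAX_IDENTITY = 95.0  # upper bound of the % identity window


@dataclass(frozen=True)
class Hit:
    """Hypothetical search hit: sequence ID plus alignment statistics."""
    seq_id: str
    coverage: float  # percent of the query length covered
    identity: float  # percent sequence identity to the query


def filter_hits(hits: List[Hit]) -> List[Hit]:
    """Keep unique hits that pass the coverage and identity windows."""
    seen = set()
    kept = []
    for h in hits:
        if h.seq_id in seen:
            continue  # count each homologue only once (first occurrence wins)
        seen.add(h.seq_id)
        if h.coverage >= MIN_COVERAGE and MIN_IDENTITY <= h.identity <= MAX_IDENTITY:
            kept.append(h)
    return kept


def collect_homologues(search: Callable[[str], List[Hit]]) -> List[Hit]:
    """Try the curated database first; fall back to the larger one if too few hits survive.

    `search` is a hypothetical callable taking a database label and
    returning hits; in practice this would wrap CS-BLAST / CSI-BLAST runs.
    """
    hits = filter_hits(search("swissprot"))
    if len(hits) >= MIN_HITS:
        return hits
    # Assumption: UniRef90 results replace the SWISSPROT results.
    return filter_hits(search("uniref90"))
```

The surviving hits would then be passed to CD-HIT for redundancy removal and to MAFFT for alignment.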
Figure 1. A flowchart of the process used to construct ConSurf-DB. A four-step procedure was used: scanning the PDB, building the MSA, calculating the conservation scores and formatting the results.
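The nine-grade discretization used for results formatting can be illustrated with a minimal Python sketch. It assumes equal-width bins over the observed score range and assumes that lower scores indicate stronger conservation (as for evolutionary rates); ConSurf's actual bin boundaries are not specified here and may differ. The names `to_grades` and `colour_of` are hypothetical.

```python
from typing import List, Optional

# Colours named in the text; intermediate grades are not named there.
GRADE_COLOURS = {1: "turquoise", 5: "white", 9: "maroon"}


def to_grades(scores: List[float], n_grades: int = 9) -> List[int]:
    """Map continuous scores to grades 1..n_grades using equal-width bins.

    Assumption: lower scores mean more conserved, so the lowest score
    receives the top grade (9) and the highest score receives grade 1.
    """
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [(n_grades + 1) // 2] * len(scores)  # no spread: middle grade
    width = (hi - lo) / n_grades
    grades = []
    for s in scores:
        b = min(int((s - lo) / width), n_grades - 1)  # bin 0 = most conserved
        grades.append(n_grades - b)
    return grades


def colour_of(grade: int) -> Optional[str]:
    """Return the colour named in the text for a grade, or None if unnamed."""
    return GRADE_COLOURS.get(grade)
```

For example, scores of 0.0, 4.5 and 9.0 map to grades 9, 5 and 1, which correspond to maroon, white and turquoise.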
ConSurf-DB provides biologists with pre-calculated conservation profiles of proteins of interest, allowing an immediate initial evaluation of the results. An advanced homologue selection process was used, designed to improve on ordinary ConSurf runs with default parameters. This makes ConSurf-DB a preferred tool for the initial investigation of proteins. Additionally, ConSurf-DB is linked to other databases and interactive tools. One example is Proteopedia, where the ConSurf-DB colored structure can be visualized interactively in Jmol on the same page as the title and abstract of the structure's publication, the identification of ligands and non-standard residues, and other information. Other examples are PDBsum and MarkUs, a server for navigating sequence-structure-function space. Please note: convenient as ConSurf-DB is, it should be possible to further improve the results for a particular protein of interest by using tailor-made procedures for homologue detection, manual selection of homologues (made easy in the ConSurf web server), or other means of reconstructing the alignment or phylogeny.