Review Article Volume 3 Issue 6
1Department of Biophysics, Tarbiat Modares University (TMU), Iran
2Bioinformatics and Computational Omics Lab (BioCOOL), Tarbiat Modares University (TMU), Iran
Correspondence: Seyyed Shahriar Arab, Department of Biophysics, School of Biological Sciences, Tarbiat Modares University (TMU), Tehran, Iran
Received: March 12, 2016 | Published: July 5, 2016
Citation: Mahmoudi M, Arab AA, Zahiri J, Parandian Y (2016) An Overview of the Protein Thermostability Prediction: Databases and Tools. J Nanomed Res 3(6): 00072. DOI: 10.15406/jnmr.2016.03.00072
Thermophilic proteins are characterized by high thermal stability, whereas mesophilic proteins are stable at lower temperatures. Such proteins have numerous applications in protein engineering, drug design and industrial processes. Studies have shown that thermal stability in thermophilic proteins is strongly related to structural and sequence properties, and several computational studies have been carried out to identify these properties in heat-resistant proteins. This paper reviews studies of protein thermostability prediction and introduces the tools and databases related to thermal stability.
Keywords: Protein thermostability, Thermophilic proteins, Mesophilic proteins, Databases, Computational methods, Bioinformatics
Environmental temperature plays an important role in the life of the cell.1 Organisms fall into four classes according to their optimal growth temperature: hyperthermophiles (>80 °C), thermophiles (45-80 °C), mesophiles (20-45 °C) and psychrophiles (<20 °C).2 Thermal stability is defined as the ability of a material to resist changes in its physical structure or irreversible chemical change; for proteins, it is the stability of the spatial structure of the polypeptide chain at high temperatures.3 Studies have shown that the thermal stability of thermophilic proteins is related to a series of sequence and structural properties of the protein,4 a few of which are introduced in this paper. Differences in amino acid composition between mesophilic and thermophilic proteins have also been studied.3,5-7 For instance, the work of Zhang and Gromiha shows that Lys, Arg, Glu and Pro are more frequent, and Ser, Met, Asp and Thr less frequent, in thermophilic proteins than in mesophilic proteins (Figure 1).6,8 The stability of secondary structure elements such as the alpha-helix is considered a necessary factor for thermal stability.6 Studies suggest that thermal stability is increased by certain characteristics of proteins: an increased number of hydrogen bonds,7 salt bridges and ion pairs,9 aromatic clusters,8 sidechain-sidechain interactions and electrostatic interactions of charged residues,9 and hydrophobic interactions.5
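The four growth-temperature classes above can be written as a small lookup. The following is a minimal sketch rather than a definition taken from the cited work: the function name `growth_class` is hypothetical, and because the quoted ranges share the boundary values 20, 45 and 80 °C, the assignment of exactly 45 °C to the thermophile class is an assumption.

```python
def growth_class(optimal_temp_c):
    """Map an optimal growth temperature (in degrees Celsius) to a class.

    Thresholds follow the ranges quoted in the text. Assumption: a value
    of exactly 45 is treated as thermophile; 80 and 20 follow from the
    strict inequalities (>80, <20) given for the outer classes.
    """
    if optimal_temp_c > 80:
        return "hyperthermophile"
    if optimal_temp_c >= 45:
        return "thermophile"
    if optimal_temp_c >= 20:
        return "mesophile"
    return "psychrophile"


if __name__ == "__main__":
    for temp in (95, 70, 37, 4):
        print(temp, growth_class(temp))
```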
Protein’s Thermal Stability Prediction Methods
Protein thermal stability can be predicted from sequence or from structure. Both approaches, with their advantages and limitations, are discussed in more detail below. Table 1 gives an overview of thermal stability prediction methods.
Sequence/Structure Feature | Algorithm | Reference
Amino acid sequence | Support vector machine | |
Primary structure | LogitBoost | |
Amino acid sequence and residues and dipeptide composition | Neural network | |
Primary, secondary and tertiary structure information | Decision tree | |
Amino acid distribution and dipeptide composition | Support vector machine | |
Amino acid composition-based similarity distance | KNN-ID | |
Dipeptide composition | Statistical methods | |
Amino acid sequence | Genetic algorithm | |
Thermodynamic parameters | Statistical potentials | |
Table 1 An overview of protein thermostability prediction studies.
Sequence based prediction
This approach uses protein sequence information, for instance amino acid distribution and dipeptide composition, to discriminate between thermophilic and mesophilic proteins. Studies have revealed differences in amino acid and dipeptide composition between the two groups. For example, the frequency of Lys, Arg, Glu and Pro is higher in thermophilic than in mesophilic proteins.8,10 These studies also show that the dipeptides EE, KK, RR, PP, KI, VV, VE, KE and VK occur more often in thermophilic proteins, whereas QQ, AA, EQ, LL, NN and QT occur less often.6 In addition, the frequency of charged, hydrophobic and aromatic amino acids is higher in thermophilic proteins than in mesophilic ones.3 Moreover, a correlation between the amino acid composition of a protein and its biological function has been reported.1 Protein sequence analysis therefore provides valuable information for predicting thermostability, particularly when structural information is not available.
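The two sequence features named above, amino acid composition and dipeptide composition, can be computed directly from a one-letter sequence. The sketch below is illustrative only: the function names and the short example sequence are hypothetical, and residues outside the 20 standard amino acids are simply ignored, which is an assumption rather than a convention taken from the reviewed studies.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(seq):
    """Return the fraction of each standard amino acid in seq."""
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        return {a: 0.0 for a in AMINO_ACIDS}
    return {a: counts[a] / total for a in AMINO_ACIDS}


def dipeptide_composition(seq):
    """Return the fraction of each of the 400 overlapping dipeptides in seq."""
    seq = seq.upper()
    valid = set(AMINO_ACIDS)
    # Overlapping pairs; pairs containing a non-standard residue are skipped.
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)
             if seq[i] in valid and seq[i + 1] in valid]
    counts = Counter(pairs)
    keys = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    if not pairs:
        return {k: 0.0 for k in keys}
    return {k: counts[k] / len(pairs) for k in keys}


if __name__ == "__main__":
    example = "KKEEVK"  # made-up sequence, for illustration only
    aac = amino_acid_composition(example)
    dpc = dipeptide_composition(example)
    print({a: round(f, 3) for a, f in aac.items() if f > 0})
    print({d: round(f, 3) for d, f in dpc.items() if f > 0})
```

The resulting 20 and 400 values per protein can then be used as a fixed-length feature vector, regardless of sequence length.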
Structure based prediction
Structure-based studies of protein thermostability prediction use secondary and tertiary structure information to discriminate between thermophilic and mesophilic proteins. Important features considered in these studies include the amount of secondary structure, ion pairs, hydrogen bonds, disulfide bonds and accessible surface area.11 Thermal stability is directly related to the stability of the protein structure.11 Because both structural and sequence features affect thermal stability, using both kinds of feature together leads to more accurate prediction. However, structural information is not always available, which limits structure-based thermostability prediction.
Prediction algorithms based on machine learning methods
The following section introduces a few machine learning algorithms; the selected algorithm is used to distinguish thermophilic from mesophilic proteins.
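As an illustration of how a learning algorithm can separate the two classes, the sketch below classifies a sequence by majority vote among its nearest neighbours in amino acid composition space. It is a generic k-nearest-neighbour example with Euclidean distance, not a reimplementation of any method listed in Table 1. All function names are hypothetical, and the training sequences are invented toy strings, not real proteins.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_vector(seq):
    """Amino acid composition of seq as a 20-dimensional list of fractions."""
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS) or 1  # avoid division by zero
    return [counts[a] / total for a in AMINO_ACIDS]


def knn_predict(train, query_seq, k=3):
    """Predict a label for query_seq by majority vote of its k nearest
    training sequences, using Euclidean distance between compositions.

    train is a list of (sequence, label) pairs.
    """
    query = composition_vector(query_seq)
    ranked = sorted(
        (math.dist(query, composition_vector(seq)), label)
        for seq, label in train
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Invented toy sequences (NOT real proteins), loosely enriched in the
# residues the text reports as over- or under-represented.
TOY_TRAIN = [
    ("KKEERRKPEVKE", "thermophilic"),
    ("EEKRKVPEKKRE", "thermophilic"),
    ("RKEEPKVKERKE", "thermophilic"),
    ("QQSTNAAQSTLL", "mesophilic"),
    ("SSTQNAQTLLAA", "mesophilic"),
    ("TQSNAALLQTSN", "mesophilic"),
]

if __name__ == "__main__":
    print(knn_predict(TOY_TRAIN, "KEKRPEEKVK"))
    print(knn_predict(TOY_TRAIN, "QSTNAQTLAS"))
```

A real study would train on a curated dataset and evaluate on held-out proteins; the toy data here only shows the mechanics of the vote.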
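Assessing a prediction tool is a critical task. Table 2 describes commonly used measures for assessing prediction performance: accuracy, sensitivity, specificity, strength, MCC, precision, F-measure and area under the ROC curve (AUC). These measures are based on the following four basic parameters: the numbers of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN).
Expression | A brief description
Accuracy = (TP + TN) / (TP + TN + FP + FN) | Percent of correct predictions
Sensitivity = TP / (TP + FN) | Percent of correctly predicted positives
Specificity = TN / (TN + FP) | Percent of correctly predicted negatives
Precision = TP / (TP + FP) | Positive predictive value
F-measure = 2 × Precision × Sensitivity / (Precision + Sensitivity) | The harmonic mean of sensitivity and precision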
Table 2 Commonly used measures for performance assessment in protein thermostability prediction.
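The measures in Table 2 can be computed from the four counts of a binary confusion matrix: true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN). The sketch below uses the standard textbook definitions, which are assumed to match the expressions intended in Table 2. The function name and the example counts are hypothetical, and returning 0.0 when a denominator is zero is a convenience choice, not part of the definitions.

```python
import math


def performance_measures(tp, tn, fp, fn):
    """Compute common binary-classification measures from confusion counts."""

    def ratio(num, den):
        # Assumption: return 0.0 instead of raising when a denominator is zero.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)   # fraction of positives recovered
    specificity = ratio(tn, tn + fp)   # fraction of negatives recovered
    precision = ratio(tp, tp + fp)     # positive predictive value
    accuracy = ratio(tp + tn, tp + tn + fp + fn)
    f_measure = ratio(2 * precision * sensitivity, precision + sensitivity)
    # Matthews correlation coefficient (MCC)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = ratio(tp * tn - fp * fn, mcc_den)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_measure": f_measure,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical counts, e.g. thermophilic = positive class.
    for name, value in performance_measures(tp=40, tn=45, fp=5, fn=10).items():
        print(f"{name}: {value:.3f}")
```

Unlike accuracy, MCC uses all four counts in both numerator and denominator, which makes it more informative when the thermophilic and mesophilic classes are of unequal size.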
To build a model capable of predicting protein thermal stability, a dataset is first created from the relevant databases. This dataset contains structure and sequence information for thermophilic and mesophilic proteins. Table 3 describes a few databases that have been used in studies of protein thermal stability prediction. According to Table 3, the PGT and ProTherm databases are used specifically for thermal stability prediction, whereas PDB is used to extract structural information and UniProt provides sequence information for thermophilic and mesophilic proteins.
Databases | Note | Ref. Num
General Databases | |
UniProt | The Universal Protein Resource (UniProt) provides a stable, comprehensive, freely accessible, central resource on protein sequences and functional annotation. This database is used to extract sequence information for thermophilic and mesophilic proteins. |
PDB | The Protein Data Bank contains information on the 3D structures of large biological molecules, including proteins and nucleic acids. This database is used to extract structural information for thermophilic and mesophilic proteins. |
Specific Databases | |
ProTherm | ProTherm is a thermodynamic database that contains experimentally determined thermodynamic parameters of protein stability. This database is used specifically for thermal stability prediction. |
PGT | PGT is the Prokaryotic Growth Temperature database (PGTdb). This database is used specifically for thermal stability prediction. |
Table 3: List of databases in protein thermostability prediction.
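The dataset-building step described before Table 3 can be sketched as joining sequence records with growth-temperature annotations. The example below is only an illustration under stated assumptions: sequences are taken to be available as FASTA-formatted text, growth temperatures as a plain dictionary keyed by record identifier, and a single 45 °C cut-off separates the two labels. The function names, record identifiers, sequences and temperatures are all hypothetical, and no database API is called.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def build_dataset(fasta_text, growth_temp_by_id, threshold_c=45.0):
    """Label each sequence by the growth temperature of its source organism.

    Assumption: records at or above threshold_c are labelled thermophilic,
    all others mesophilic; records without a temperature are skipped.
    """
    rows = []
    for header, seq in parse_fasta(fasta_text):
        record_id = header.split()[0]  # first token of the header line
        temp = growth_temp_by_id.get(record_id)
        if temp is None:
            continue
        label = "thermophilic" if temp >= threshold_c else "mesophilic"
        rows.append({"id": record_id, "sequence": seq, "label": label})
    return rows


# Hypothetical records and temperatures, for illustration only.
EXAMPLE_FASTA = """>protA hypothetical example record
KKEERRKP
EVKE
>protB hypothetical example record
QQSTNAAQ
>protC no temperature available
MSTLLA
"""
EXAMPLE_TEMPS = {"protA": 75.0, "protB": 37.0}

if __name__ == "__main__":
    for row in build_dataset(EXAMPLE_FASTA, EXAMPLE_TEMPS):
        print(row)
```

Skipping records without a temperature annotation keeps unlabeled proteins out of the training data rather than guessing a class for them.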
Owing to the recent widespread use of thermostable proteins and enzymes in industry, protein engineering and other theoretical and experimental studies play a significant role in identifying protein thermal stability. Given the high cost of laboratory procedures, theoretical methods that predict thermal stability with high accuracy would be very helpful. So far, most computational studies on identifying thermophilic and mesophilic proteins have been based solely on protein sequence. Because both structural and sequence features affect thermal stability, using both kinds of feature together leads to more accurate prediction.
©2016 Mahmoudi, et al. This is an open access article distributed under the terms of the, which permits unrestricted use, distribution, and build upon your work non-commercially.