Research Article Volume 7 Issue 1
1Institute of Bioinformatics and Biotechnology, S. P. Pune University, India
2Open Vision, India
Correspondence: Sohan P Modak, Open Vision, 759/75, Deccan Gymkhana, Pune 411004, India
Received: February 13, 2018 | Published: February 23, 2018
Citation: Kelkar N, Modak SP. The internal oligopeptide sequences missing in crystals are disordered domains. MOJ Proteomics Bioinform. 2018;7(1):00215. DOI: 10.15406/mojpb.2018.07.00215
Polypeptide sequences in pdb format are almost always shorter than those in FASTA format. The missing residues are mostly internal oligopeptide strings, along with a few C- and N-terminal residues. We have compared the panorama of secondary structure domains generated from both formats by folding in silico and find that the missing oligopeptides come mostly from intrinsically disordered domains.
Keywords: protein crystals, FASTA format, pdb format, protein secondary structure, disordered domain, α helix, β sheet, internal missing oligopeptides
Prior to their maturation as a biological structure or function, nascent polypeptides fold to form three-dimensional structures composed of α helices, β sheets and disordered regions. The amino acid sequence of the processed polypeptide is stored in FASTA format (www.rcsb.org) and is almost always longer than the sequence in the crystal structure, which is stored in pdb format and can be retrieved with PyMol; residues are absent at the C terminus, the N terminus and at intra-polypeptide locations of crystals. Indeed, a large number of protein crystals in the database exhibit internal missing strings.1 Crystallographers generally consider that the missing residues are due to low electron density that is undetectable in low-resolution crystallography. While some of the gaps at the N and C termini can be attributed to post-translational processing, missing internal oligopeptides may lead to misinterpretation of the secondary structure domains in the immediate vicinity of the gaps as well as in the flanking segments. While studying the phylogeny of proteins2 we considered the possibility that the extent of evolutionary conservation of residues defining individual secondary structure domains may be one of the determinants. Having come across cases of missing internal residues, we analyze here their structure and significance.
Amino acid sequences of 9 proteins were downloaded from RCSB PDB (www.rcsb.org) in FASTA and crystal format.3 These are: (1) SAICAR synthase from Saccharomyces cerevisiae, strain ATCC 204508/S288c (PDB Id: 1A48),4 (2) SAICAR synthase complexed with ADP, AICAR and succinate from the same strain (www.rcsb.org), (3) Lipoate-protein ligase A from Streptococcus agalactiae (PDB Id: 2P0L) (www.rcsb.org), (4) P450 pyr hydroxylase from Sphingopyxis macrogoltabida (PDB Id: 3RWL),5 (5) Hydroxymethylbilane synthase from Escherichia coli (strain K12) (PDB Id: 2YPN),6 (6) UDP-N-acetylmuramoyl-L-alanine:D-glutamate ligase from Escherichia coli (K12) (PDB Id: 1UAG),7 (7) Glycinamide ribonucleotide synthetase from Escherichia coli (strain K12) (PDB Id: 1GSO),8 (8) Folylpolyglutamate synthetase from Lactobacillus casei (PDB Id: 1FGS)9 and (9) mitochondrial helicase suv3 from Homo sapiens (PDB Id: 3RC3).10
The amino acid sequences in the two formats were aligned, and residues missing at the N-terminal, C-terminal and internal regions were detected. Sequences of the 9 proteins were folded with JPred4 (http://www.compbio.dundee.ac.uk/jpred4) and PSSpred.11–15 From the output we designated residues forming secondary structure domains in different shades, namely light gray (α helix), dark gray (β sheet/loop) and medium gray (disordered domain). The sequences derived from the crystals (pdb format) were similarly shaded.
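The alignment step described above can be sketched in a few lines of Python. This is only an illustration, not the procedure used in the study: it uses `difflib.SequenceMatcher` from the standard library in place of a dedicated alignment tool, it assumes that the crystal-derived sequence is the FASTA sequence with residues deleted (no substitutions or insertions), and the function name `find_missing_segments` and the toy sequences are hypothetical.

```python
from difflib import SequenceMatcher


def find_missing_segments(fasta_seq, crystal_seq):
    """Return residues present in fasta_seq but absent from crystal_seq.

    Each result is (location, start, end, peptide), with 1-based inclusive
    positions in fasta_seq. Assumes crystal_seq is fasta_seq with residues
    deleted; substitutions and insertions are ignored.
    """
    matcher = SequenceMatcher(None, fasta_seq, crystal_seq, autojunk=False)
    segments = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag != "delete":
            continue  # only stretches missing from the crystal sequence
        if i1 == 0:
            location = "N-terminal"
        elif i2 == len(fasta_seq):
            location = "C-terminal"
        else:
            location = "internal"
        segments.append((location, i1 + 1, i2, fasta_seq[i1:i2]))
    return segments


# Toy sequences (hypothetical, not one of the nine proteins).
fasta = "MTAKLVDEQGHWSRNPFYCI"
crystal = "TAKLVDHWSRNPFY"
for segment in find_missing_segments(fasta, crystal):
    print(segment)
```

When a missing residue is identical to its neighbour, the exact gap position is ambiguous, so the reported boundaries of a missing string may shift by a residue or two; for real entries the residue numbering recorded in the pdb file is a more reliable guide.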
The sequences of the nine proteins in both formats are shown in Figure 1. We noticed that, in contrast to the sequence derived from the FASTA file, some amino acids were missing at the termini as well as at internal locations of the polypeptide in the crystal-derived sequences. Upon folding these in silico with JPred4, we find (Figure 1) that each polypeptide gave rise to a panorama of α-helix, β-sheet and disordered domains/random coil (see methods). Since the folding pattern obtained with PSSpred was nearly the same with respect to the number and positions of the different structural domains, we have restricted this presentation to JPred4 for proteins no. 1-9.
Figure 1 Panorama of secondary structures from mature protein sequence (FASTA) and crystal structure sequence folded with JPred4: The different secondary structure domains are coloured in gray scale. The crystal-derived sequence is also colour coded according to secondary structure in gray scale, and the arrows indicate the missing regions. The oligopeptide regions missing in the crystal-derived sequence are highlighted in the FASTA sequence.
Table 1 shows the number of amino acid residues missing in crystal-derived sequences. 2P0L, 3RWL and 3RC3 also exhibit long missing oligopeptides at the termini. Indeed, all crystal-derived sequences contain one or more internal missing oligopeptides, 3-33 residues long. Table 2 describes the distribution of missing residues in crystals based on their physicochemical properties and number. The missing residues are highlighted in the sequences from mature protein (FASTA file) in Figure 1. In 10 cases, most of the residues missing from the internal oligopeptide are hydrophilic. In the remaining 6, the ratio of hydrophobic residues to the total number of missing residues is more than 0.5.
| Protein | PDB Id | Missing at N terminal | Missing at C terminal | Missing internal strings |
|---|---|---|---|---|
| Saicar synthase | 1A48 | 1 | 0 | 7 |
| Lipoate-protein ligase A | 2P0L | 3 | 16 | 3 |
| P450 pyr hydroxylase | 3RWL | 15 | 0 | 7 |
| Hydroxymethylbilane synthase | 2YPN | 2 | 0 | 17 |
| Saicar synthase | 2CNQ | 1 | 0 | 3 |
| UDP-N-acetylmuramoyl-L-alanine:D-glutamate ligase | 1UAG | 0 | 0 | 5, 4 |
| Glycinamide ribonucleotide synthetase | 1GSO | 0 | 0 | 6, 3 |
| Folylpolyglutamate synthetase | 1FGS | 0 | 0 | 32, 5, 7, 6, 12 |
| Mitochondrial helicase suv3 | 3RC3 | 12 | 0 | 14, 11, 33 |

Table 1 Missing oligopeptides in the crystal structure derived sequence (number of residues missing)
| Protein | PDB Id | Missing oligopeptide | Proline residues | Glycine residues | Charged residues | Polar uncharged | Hydrophobic | Total amino acids |
|---|---|---|---|---|---|---|---|---|
| Saicar synthase | 1A48 | KAEQGEH | 0 | 1 | 4 | 1 | 2 | 7 |
| Saicar synthase | 2CNQ | EQG | 0 | 1 | 1 | 1 | 1 | 3 |
| Lipoate-protein ligase A | 2P0L | ERK | 0 | 0 | 3 | 0 | 0 | 3 |
| P450 pyr hydroxylase | 3RWL | QKGGDGG | 0 | 4 | 2 | 1 | 4 | 7 |
| Hydroxymethylbilane synthase | 2YPN | TRGDVILDTPLAKVGGK | 1 | 3 | 5 | 2 | 10 | 17 |
| UDP-N-acetylmuramoyl-L-alanine:D-glutamate ligase | 1UAG | GADER | 0 | 1 | 3 | 0 | 2 | 5 |
| " | " | HQQG | 0 | 1 | 1 | 2 | 1 | 4 |
| Glycinamide ribonucleotide synthetase | 1GSO | DGLAAG | 0 | 2 | 1 | 0 | 5 | 6 |
| Folylpolyglutamate synthetase | 1FGS | KT | 0 | 0 | 1 | 1 | 0 | 2 |
| " | " | IGGDT | 0 | 2 | 1 | 1 | 3 | 5 |
| " | " | HQKLLGH | 0 | 1 | 3 | 1 | 3 | 7 |
| " | " | ILADKD | 0 | 0 | 3 | 0 | 3 | 7 |
| " | " | ALPEAGYEALHE | 1 | 1 | 4 | 0 | 7 | 12 |
| Mitochondrial helicase suv3 | 3RC3 | GPSADGDVGAELTR | 0 | 3 | 4 | 2 | 8 | 14 |
| " | " | PSINEKGEREL | 1 | 1 | 5 | 5 | 4 | 11 |

Table 2 Distribution of missing residues in crystal structure
Note: only one aromatic residue (tyrosine) was seen in the internal missing oligopeptide strings, in 1FGS.
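The kind of tally reported in Table 2 can be illustrated with a minimal Python sketch. The residue classes below are an assumption, since the article does not list them explicitly: histidine is counted as charged, and glycine and proline are counted within the hydrophobic class as well as in their own columns. The function name `composition` is hypothetical. Under this assumed grouping the first row of Table 2 (KAEQGEH) is reproduced; other rows may differ if the authors grouped residues differently.

```python
# Assumed residue classes; the article does not state them explicitly.
CHARGED = set("DEKRH")          # histidine counted as charged (assumption)
POLAR_UNCHARGED = set("STNQC")
HYDROPHOBIC = set("AVLIMFWYGP")  # includes glycine and proline (assumption)


def composition(peptide):
    """Count residue classes in a one-letter peptide string."""
    if not peptide:
        raise ValueError("peptide must not be empty")
    counts = {
        "proline": peptide.count("P"),
        "glycine": peptide.count("G"),
        "charged": sum(residue in CHARGED for residue in peptide),
        "polar_uncharged": sum(residue in POLAR_UNCHARGED for residue in peptide),
        "hydrophobic": sum(residue in HYDROPHOBIC for residue in peptide),
        "total": len(peptide),
    }
    # Ratio of hydrophobic residues to all missing residues in the string.
    counts["hydrophobic_ratio"] = counts["hydrophobic"] / counts["total"]
    return counts


print(composition("KAEQGEH"))
```

A hydrophobic ratio below 0.5 corresponds to the cases described in the text as having more hydrophilic than hydrophobic residues missing.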
The secondary structures predicted for the internal missing oligopeptides and their flanking tripeptides, from both FASTA and PyMol (crystal) formats, are shown in Table 3. Surprisingly, 10 out of 16 internal missing oligopeptides form disordered domains (DD). Among the remaining 6, two disordered domains adjoin a terminal residue from a β sheet, one adjoins an α helix and 3 are from a putative helix. In the tripeptides flanking the internal missing strings, we find that, on the N-terminal side, 10 out of 16 form DD, 3 form β sheets, 2 are from an α helix and 1 forms a junction between a β sheet and random coil. Of the C-terminal tripeptides, 7 are from disordered domains, 3 form a junction between a disordered domain and an α helix, 2 form β sheet-DD junctions, and 2 each are from β sheets and α helices. Thus, all missing strings lie wholly or partly within the original disordered domains.
| Protein name | PDB Id | Missing oligopeptide | Secondary structure of missing oligopeptide | Secondary structure of N-terminal flanking tripeptide | Secondary structure of C-terminal flanking tripeptide |
|---|---|---|---|---|---|
| SAICAR synthase | 1A48 | KAEQGEH | random coil | random coil | random coil |
| SAICAR synthase | 2CNQ | EQG | random coil | random coil | random coil |
| Lipoate-protein ligase A | 2P0L | ERK | random coil | random coil | random coil and α helix |
| P450 pyr hydroxylase | 3RWL | QKGGDGG | random coil | random coil | random coil |
| Hydroxymethylbilane synthase | 2YPN | TRGDVILDTPLAKVGGK | β sheet and random coil | β sheet | random coil and α helix |
| UDP-n-acetylmuramoyl | 1UAG | GADER | β sheet and random coil | α helix | β sheet |
| " | " | HQQG | random coil | β sheet | β sheet |
| Glycinamide ribonucleotide synthetase | 1GSO | DGLAAG | random coil | β sheet and random coil | random coil |
| " | " | DDE | random coil | random coil | random coil and β sheet |
| Folylpolyglutamate synthetase | 1FGS | KT | random coil | random coil | random coil and α helix |
| " | " | IGGDT | α helix and random coil | α helix | random coil |
| " | " | HQKLLGH | α helix and random coil | random coil | α helix |
| " | " | ILADKD | random coil | β sheet | α helix |
| " | " | ALPEAGYEALHE | α helix and random coil | random coil | random coil |
| Mitochondrial helicase suv3 | 3RC3 | GPSADGDVGAELTR | random coil | random coil | random coil |
| " | " | PSINEKGEREL | β sheet and random coil | random coil | β sheet and random coil |

Table 3 Predicted secondary structure of the internal missing oligopeptide and the flanking tripeptides, after folding the FASTA sequence in JPred4
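The assignments in Table 3 amount to reading a per-residue prediction over the missing string and over the three residues on either side. A minimal Python sketch of that lookup follows. It assumes the prediction is available as a string with one character per residue, using H for helix, E for strand and - for coil; this encoding, the function names and the example prediction are assumptions made for illustration and are not taken from the study.

```python
# Assumed one-letter state codes for a per-residue prediction string.
LABELS = {"H": "α helix", "E": "β sheet", "-": "random coil"}


def describe(states):
    """Name the distinct states in a stretch, in order of first appearance."""
    seen = []
    for state in states:
        name = LABELS[state]
        if name not in seen:
            seen.append(name)
    return " and ".join(seen)


def segment_and_flanks(prediction, start, end, flank=3):
    """Summarise a missing segment (1-based, inclusive) and its flanks."""
    n_flank = prediction[max(0, start - 1 - flank):start - 1]
    missing = prediction[start - 1:end]
    c_flank = prediction[end:end + flank]
    return {
        "missing": describe(missing),
        "n_terminal_flank": describe(n_flank),
        "c_terminal_flank": describe(c_flank),
    }


# Hypothetical prediction for a 16-residue stretch.
prediction = "EEEE-------HHHHH"
print(segment_and_flanks(prediction, start=6, end=9))
```

A stretch that spans two states is reported as a junction, for example "β sheet and random coil", which mirrors the wording used in the table.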
Comparing the panoramas of secondary structure derived from the crystal structure with those computed by folding sequences from both mature proteins and crystals with JPred4, we find (Table 4) that crystals underestimate both the number of disordered domains and the number of residues therein. Indeed, a combined analysis of the 9 proteins reveals that the ratio (number of secondary structure domains : number of amino acid residues) is comparable for α helices and β sheets, but substantially reduced for disordered domains in crystals compared with the in silico folded mature protein. Similar results were obtained by folding with PSSpred (not shown).
| Protein name | PDB ID | Source | α helix: mature protein (JPred4) | α helix: crystal structure (derived) | α helix: crystal structure (JPred4) | β sheet: mature protein (JPred4) | β sheet: crystal structure (derived) | β sheet: crystal structure (JPred4) | Random coil: mature protein (JPred4) | Random coil: crystal structure (derived) | Random coil: crystal structure (JPred4) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Saicar synthase | 1A48 | Saccharomyces cerevisiae ATCC 204508 | 6 (82) | 7 (116) | 7 (85) | 10 (56) | 15 (106) | 8 (53) | 17 (168) | 20 (84) | 16 (160) |
| Saicar synthase | 2CNQ | Saccharomyces cerevisiae ATCC 204508 | 6 (82) | 6 (127) | 5 (82) | 10 (56) | 15 (69) | 10 (57) | 17 (168) | 17 (106) | 16 (163) |
| Lipoate-protein ligase A | 2P0L | Streptococcus agalactiae | 10 (105) | 9 (118) | 8 (93) | 10 (57) | 11 (55) | 10 (59) | 21 (126) | 20 (93) | 9 (114) |
| P450 pyr hydroxylase | 3RWL | Sphingopyxis macrogoltabida | 14 (76) | 14 (234) | 14 (177) | 6 (36) | 12 (40) | 6 (34) | 9 (214) | 24 (130) | 20 (193) |
| Hydroxymethylbilane synthase | 2YPN | Escherichia coli K12 | 8 (112) | 11 (112) | 8 (113) | 11 (69) | 13 (76) | 11 (62) | 20 (132) | 21 (106) | 20 (119) |
| UDP-n-acetylmuramoyl | 1UAG | Escherichia coli K12 | 15 (161) | 20 (161) | 20 (152) | 17 (83) | 20 (89) | 20 (88) | 32 (193) | 38 (178) | 33 (188) |
| Glycinamide ribonucleotide synthetase | 1GSO | Escherichia coli K12 | 11 (127) | 16 (128) | 12 (130) | 20 (97) | 16 (99) | 12 (99) | 12 (99) | 32 (207) | 33 (192) |
| Folylpolyglutamate synthetase | 1FGS | Lactobacillus casei | 16 (192) | 15 (172) | 13 (166) | 14 (67) | 16 (62) | 13 (70) | 31 (169) | 29 (159) | 27 (157) |
| Mitochondrial helicase suv3 | 3RC3 | Homo sapiens | 31 (441) | 26 (444) | 26 (366) | 13 (60) | 16 (69) | 14 (65) | 43 (176) | 42 (164) | 37 (246) |

Table 4 Secondary structure from the mature protein sequence and crystal structure derived sequence folded using JPred4 and the original crystal sequence. Entries are no. of motifs (no. of amino acids).
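The comparison made with Table 4 can be sketched as a small calculation. The Python below parses entries written as "motifs (residues)" and reports the mean number of residues per motif, which is the inverse of the domains-to-residues ratio used in the text. Only the random-coil entries for 1A48 are used, as listed in the table; the function names are hypothetical and the sketch is not the authors' combined analysis of all 9 proteins.

```python
import re


def parse_entry(entry):
    """Parse 'motifs (residues)', e.g. '17 (168)' or '17(168)'."""
    match = re.fullmatch(r"\s*(\d+)\s*\(\s*(\d+)\s*\)\s*", entry)
    if match is None:
        raise ValueError(f"unrecognised entry: {entry!r}")
    return int(match.group(1)), int(match.group(2))


def residues_per_motif(entry):
    """Mean number of residues per secondary structure motif."""
    motifs, residues = parse_entry(entry)
    return residues / motifs


# Random-coil entries for 1A48 as listed in Table 4.
random_coil_1a48 = {
    "mature protein (JPred4)": "17(168)",
    "crystal structure (derived)": "20 (84)",
    "crystal structure (JPred4)": "16(160)",
}
for source, entry in random_coil_1a48.items():
    print(f"{source}: {residues_per_motif(entry):.1f} residues per motif")
```

For this one protein the crystal-derived random coils are much shorter on average than those predicted from either sequence, which is the pattern the text describes for disordered domains.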
We find that only 2 crystals reveal histidine-rich oligopeptides at N or C terminal and none in the internal missing oligopeptide strings (data not shown).
Table 2 lists the relative distribution of Proline, Glycine, charged and hydrophobic residues in the internal missing strings. Thus, there is a high concentration of flexible (Glycine, 20/107) and charged residues (43/107) in these strings, while rigid Proline is of rare occurrence (3/107). Similarly, there is only 1 aromatic residue in the missing strings (not shown).
According to Djinovic-Carrugo & Carrugo,1 most crystallographic data reveal the incidence of internal missing strings of oligopeptides. Here we describe in detail such strings in 9 proteins and analyse their position in the overall panorama of secondary structure domains of a polypeptide sequence. To study this aspect we adopted the strategy of folding in silico, for the same protein, the sequence representing the post-translationally processed polypeptide and the sequence derived from the crystal. The issue here is that when the crystal structure is obtained at low resolution, a number of residues fail to be detected due to low electron density. Therefore, by comparing the two amino acid sequences of the same protein, we identify the residues missing in the crystal-derived sequence.
Comparing the panoramas of secondary structure domains obtained after folding the sequences in silico allowed us to detect the secondary structure domains to which the missing residues belong, and we conclude that most are disordered domains. This is further supported by the fact that these are rich in the flexible amino acid glycine and poor in rigid proline. We find that, out of the 16 cases, the proportion of hydrophobic residues is less than 0.5 in 10 cases and less than 0.6 in the remaining ones. These disordered oligopeptides contain a high concentration of charged residues and nearly 20% glycines. We conclude that the apparent loss, or lack of detectability, in crystals of large internal oligopeptide strings involves highly disordered domains, which probably accounts for the difficulty crystallographers face in designating a signature domain to the missing internal string. In fact, since these strings are not actually absent in polypeptides, the inability to detect them leads to an incomplete crystal structure. In most cases the problem can be addressed by comparing the in silico folded amino acid sequences of mature proteins to those derived from crystals.
Finally, one must consider the structural and functional relevance of the apparently missing segments. To that effect we are now assessing the propensity of various triads involved in defining functionally important sites for enzyme-substrate interactions as well as other protein-protein binding. Another possible approach is to examine the missing oligopeptides, alone and with flanking regions, in Ramachandran plots. The question, therefore, remains as to how one should solve the crystal structure beyond the offerings of crystallography. In any case, it is unlikely that proteins exist in crystalline form in vivo; they probably exhibit a metastable state with variable mobility of flexible regions depending on the intracellular environment.
Indeed, polymeric structures, namely, micelles, membranes and globular proteins exhibit a hydrophobic core and hydrophilic exterior that are differentially sensitive to perturbations by osmotic pressure, ionic strength and temperature and exhibit differential movements such that the kinetic energy between the two domains is conserved.
The authors declare that there is no conflict of interest regarding the publication of this article.
©2018 Kelkar, et al. This is an open access article distributed under the terms of the, which permits unrestricted use, distribution, and build upon your work non-commercially.