Research Article Volume 5 Issue 3
Department of Biological Sciences, University of Arkansas, USA
Correspondence: Ravi D. Barabote, Department of Biological Sciences, University of Arkansas, Fayetteville, AR 72701, USA, Tel (479) 5752475, Fax (479) 5754010
Received: January 30, 2017 | Published: March 16, 2017
Citation: Teegardin KA, James S, Barabote RD. Bioinformatic analysis of glycoside hydrolases in the proteomes of mesophilic and thermophilic Actinobacteria. MOJ Proteomics Bioinform. 2017;5(3):75-81. DOI: 10.15406/mojpb.2017.05.00158
Petroleum reserves are rapidly depleting and alternative renewable sources of energy need to be developed to meet the energy demands of the planet. Lignocellulose has been recognized as a highly promising and renewable resource for the development of clean energy. Thermophilic microbes and thermostable enzymes are being sought for biological conversion of lignocellulose into biofuels. The phylum Actinobacteria includes several efficient cellulose-degrading microorganisms. Genomes of several Actinobacteria have been completely sequenced and deposited in public databases, which are a great resource for uncovering new enzymes and targets for biotechnology. We searched the predicted proteomes of 69 Actinobacteria for the homologs of 20 glycoside hydrolase families relevant to lignocellulose degradation and identified 589 glycoside hydrolase homologs. We analyzed (1) the distribution of the glycoside hydrolase homologs across mesophilic and thermophilic Actinobacteria, (2) the domain architecture of cellulases (from GH5 and GH6 families) and xylanases (from GH10 and GH11 families) from mesophilic and thermophilic Actinobacteria, and (3) asymmetric amino acid substitutions between mesophilic and thermophilic glycoside hydrolases. Overall, our data provide new insights into the distribution of different glycoside hydrolases in Actinobacteria as well as into the thermostability features of cellulases and xylanases from Actinobacteria. Our findings provide a basis for genetic engineering of glycoside hydrolases as well as new targets for biotechnology.
Keywords: thermophiles, enzymes, cellulases, xylanases, biofuel, lignocellulose, genome, proteome
GH, glycoside hydrolases; CBM, carbohydrate binding module; OGT, optimal growth temperature
Petroleum fuels are finite and non-renewable, and they pose a significant concern for global climate, sustainability, and international security.1 Alternative renewable sources of energy are urgently needed to meet the current global challenges. Plants are the most abundant source of renewable carbon on Earth. Plant cell wall (lignocellulose) can be used for the production of renewable, sustainable, and environmentally clean biofuels.2 Lignocellulose is mainly composed of polymers of sugars (cellulose and hemicellulose) and phenolic units (lignin). While complex lignocellulose can be converted into liquid fuels thermo-chemically, biological transformation of lignocellulosic polysaccharides using microorganisms and microbial enzymes is an economical and environmentally benign process for sustainable production of biofuels.3,4 Several microorganisms produce glycoside hydrolase enzymes such as cellulases and xylanases that break down cellulose and xylan (hemicellulose), respectively.5 Efficient lignocellulose-degrading microorganisms and catalytically superior cellulases and xylanases are of very high value in the bioconversion of lignocellulose into biofuels.6,7
Actinobacteria are a phylum of Gram-positive bacteria that are found abundantly in soil.8 They include some of the most prolific lignocellulose-degrading bacteria.9 Actinobacteria include both mesophilic and thermophilic members. Many new Actinobacteria continue to be isolated and sequenced in bioprospecting studies aimed at identifying new biotechnological targets.10 A growing number of completely sequenced genomes is being steadily deposited in public databases, which provide an expanding resource for discovering novel targets for biotechnology. Systematic bioinformatic mining of the genomes and predicted proteomes of sequenced Actinobacteria has the potential to reveal novel insights into lignocellulose-degrading enzymes for bioenergy applications.11
Thermophilic microbes and thermostable enzymes are most useful for the development of cost-effective, industrial-scale technologies.12 Thermostability of enzymes increases their shelf life, reduces reaction times, improves industrial productivity, and lowers manufacturing costs.12 Thus, enzyme thermostability is a highly desirable property for industrial enzymatic deconstruction of lignocellulose. Valuable insights can be gleaned about factors that contribute to thermostability by performing comparative analysis of amino acid sequences of proteins from mesophilic and thermophilic organisms.13 Such insights can be exploited for designing and genetically engineering enhanced enzymes for industrial applications.
In this study, we systematically analyzed the predicted proteomes of 69 Actinobacteria for homologs of glycoside hydrolase enzymes that are relevant to lignocellulose degradation. We analyzed the distribution of the homologs across the phylum. We identified homologs from mesophilic and thermophilic Actinobacteria and analyzed their domain architecture to decipher thermophilic patterns. Finally, we analyzed the amino acid sequences of cellulases and xylanases from mesophilic and thermophilic Actinobacteria and identified asymmetric amino acid substitution patterns in the thermophilic enzymes.
Predicted proteomes of known lignocellulose-degrading Actinobacteria were obtained from NCBI (ftp://ftp.ncbi.nlm.nih.gov/). Optimal growth temperature (OGT) information was obtained from the literature. Organisms were classified as mesophilic (OGT<40°C) or thermophilic (OGT≥40°C). Glycoside hydrolase families that contain lignocellulose degradation enzymes were identified from the CAZy database.14 Representative Actinobacterial sequences from the CAZy families were used to identify homologs in the proteomes of the Actinobacteria using BLAST.15 Domains in the glycoside hydrolase proteins were identified using the NCBI’s CDD-search tool.16 Amino acid substitutions between homologs of mesophilic and thermophilic Actinobacteria were identified using multiple alignments as described previously.17 Briefly, for each GH family, orthologs from mesophilic and thermophilic Actinobacteria were aligned using CLUSTAL.18,19 Each substitution was counted only once per position in the alignment. For each amino acid substitution pair (e.g., A_M B_T and A_T B_M, where A and B represent amino acids and the subscripts M and T represent mesophilic and thermophilic organisms, respectively), the total number of substitutions over the entire alignment was summed and the percentage of each substitution within the pair was calculated. Statistical significance (p-value) of asymmetric amino acid substitutions between the two groups of organisms was calculated using a binomial function. The asymmetry (i.e., bias) in A_M B_T and A_T B_M substitutions was considered significant if the p-value was below the significance threshold (p < 0.1).
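The binomial test of asymmetry described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names, the two-sided form of the test, the null hypothesis that both directions of a substitution pair are equally likely, and the example counts are all assumptions.

```python
from math import comb


def binomial_asymmetry_p(n_forward: int, n_reverse: int) -> float:
    """Two-sided binomial p-value under H0: both directions equally likely.

    n_forward: positions with residue A in mesophiles and B in thermophiles.
    n_reverse: positions with residue B in mesophiles and A in thermophiles.
    (Assumed formulation; the source only states that a binomial function was used.)
    """
    n = n_forward + n_reverse
    if n == 0:
        return 1.0
    k = max(n_forward, n_reverse)
    # Probability of a split at least as lopsided as the observed one (one tail).
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    # Double the tail for a two-sided test, capped at 1.
    return min(1.0, 2 * tail)


def percent_forward(n_forward: int, n_reverse: int) -> float:
    """Percentage of the pair's substitutions that go in the forward direction."""
    total = n_forward + n_reverse
    return 100.0 * n_forward / total if total else 0.0


# Hypothetical counts for one substitution pair (not taken from the study).
p_value = binomial_asymmetry_p(9, 1)
print(f"{percent_forward(9, 1):.0f}% forward, p = {p_value:.4f}")
```

A pair would then be reported as asymmetric when the returned p-value falls below the chosen cutoff.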
Distribution of glycoside hydrolases in Actinobacteria.
We identified a total of 1133 Actinobacteria in the NCBI database. Of these, genomes of only 236 (21%) Actinobacteria have been completely sequenced. Within the 236 sequenced Actinobacteria, we identified 69 (29%) organisms that have been described in the literature to have cellulolytic activity. We analyzed the predicted proteomes of the 69 Actinobacteria for the presence of glycoside hydrolases relevant to lignocellulose degradation. In addition, we collected information on the optimal growth temperature of each organism from the literature. Using the CAZy database, we identified 20 glycoside hydrolase families that contain enzymes known to hydrolyze various plant cell wall polysaccharides.14 A total of 589 glycoside hydrolase homologs were identified in the proteomes of the 69 Actinobacteria (Table 1). Of the 69 Actinobacteria, 61 organisms are mesophilic and only 8 are thermophilic. This highlights the need to sequence more thermophilic Actinobacteria.
Organism Name | OGT (°C) | # of GH families | Total GH homologs
Acidothermus cellulolyticus 11B ATCC 43068 | 55 | 7 | 12
Actinosynnema mirum DSM 43827 | 28 | 9 | 30
Amycolatopsis mediterranei S699 | 26 | 13 | 37
Amycolatopsis mediterranei U32 | 26 | 12 | 33
Bifidobacterium adolescentis 15703 | 50 | 3 | 4
Bifidobacterium animalis AD011 | 37 | 1 | 1
Bifidobacterium animalis ATCC 25527 | 37 | 1 | 1
Bifidobacterium animalis B420 | 37 | 1 | 1
Bifidobacterium animalis Bb12 | 37 | 2 | 2
Bifidobacterium animalis Bi-04 | 37 | 1 | 1
Bifidobacterium animalis Bi-07 | 37 | 1 | 2
Bifidobacterium animalis BLC1 | 37 | 1 | 1
Bifidobacterium animalis CNCM I-2494 | 37 | 2 | 3
Bifidobacterium animalis DSM 10140 | 37 | 1 | 1
Bifidobacterium animalis V9 | 37 | 1 | 1
Bifidobacterium bifidum PRL2010 | 37 | 1 | 2
Bifidobacterium bifidum S17 | 37 | 1 | 2
Bifidobacterium breve ACS-071-V-Sch8b | 37 | 2 | 2
Bifidobacterium breve UCC2003 | 37 | 0 | 0
Bifidobacterium dentium Bd1 | 29 | 2 | 7
Bifidobacterium longum 157F | 34 | 2 | 6
Bifidobacterium longum BBMN68 | 34 | 1 | 3
Bifidobacterium longum DJO10A | 34 | 1 | 2
Bifidobacterium longum F8 | 34 | 1 | 1
Bifidobacterium longum JCM 1217 | 34 | 2 | 5
Bifidobacterium longum JCM 1222, ATCC 15697 | 34 | 2 | 3
Bifidobacterium longum JDM301 | 34 | 2 | 5
Bifidobacterium longum KACC 91563 | 34 | 2 | 4
Bifidobacterium longum NCC2705 | 34 | 2 | 5
Cellulomonas fimi ATCC 484 | 40 | 11 | 34
Cellulomonas flavigena DSM 20109 | 30 | 9 | 37
Cellvibrio gilvus ATCC 13127 | 25 | 9 | 20
Clavibacter michiganensis NCPPB 382 | 37 | 3 | 7
Clavibacter michiganensis sepedonicus | 37 | 1 | 2
Jonesia denitrificans DSM 20603 | 37 | 9 | 13
Micrococcus luteus | 37 | 0 | 0
Micromonospora aurantiaca ATCC 27029 | 27 | 10 | 22
Modestobacter marinus BC501 | 28 | 0 | 0
Mycobacterium abscessus | 30 | 2 | 2
Mycobacterium avium 104 | 37 | 3 | 3
Mycobacterium avium K-10 | 37 | 3 | 3
Mycobacterium bovis AF2122/97 | 35 | 4 | 5
Mycobacterium bovis BCG str. Mexico | 35 | 3 | 3
Mycobacterium bovis Pasteur 1173P2 | 35 | 3 | 3
Mycobacterium bovis Tokyo 172 | 35 | 3 | 3
Mycobacterium gilvum PYR-GCK | 30 | 3 | 3
Mycobacterium marinum | 37 | 2 | 2
Mycobacterium smegmatis MC2 155 | 30 | 3 | 4
Rhodococcus erythropolis PR4 (= NBRC 100887) | 25 | 2 | 2
Rhodococcus opacus B4 | 27 | 2 | 2
Saccharomonospora glauca | 45 | 0 | 0
Saccharomonospora viridis DSM 43017 | 55 | 2 | 2
Streptomyces avermitilis MA-4680 | 32 | 9 | 19
Streptomyces bingchenggensis BCW-1 | 28 | 12 | 47
Streptomyces cattleya DSM 46488,8057 | 34 | 4 | 12
Streptomyces clavuligerus | 28 | 0 | 0
Streptomyces coelicolor A3(2) | 28 | 11 | 20
Streptomyces flavogriseus ATCC 33331 | 28 | 11 | 19
Streptomyces hygroscopicus jinggangensis 5008 | 35 | 9 | 13
Streptomyces pristinaespiralis | 28 | 0 | 0
Streptomyces scabiei 87.22 | 27 | 10 | 27
Streptomyces sirex AA3 | 28 | 8 | 9
Streptomyces sviceus | 28 | 1 | 1
Streptomyces violaceusniger Tu 4113 | 28 | 10 | 22
Streptosporangium roseum DSM 43021 | 28 | 9 | 14
Thermobifida fusca YX | 55 | 8 | 12
Thermobispora bispora 43833 | 55 | 8 | 10
Thermomonospora curvata 43183 | 55 | 2 | 3
Xylanimonas cellulosilytica DSM 15894 | 30 | 7 | 12
Table 1 Summary of the analysis of Actinobacteria used in this study
OGT (°C): optimal growth temperature (degrees Celsius); GH: glycoside hydrolase
We analyzed the relationship between optimal growth temperature and glycoside hydrolases encoded in the proteomes of the Actinobacteria (Figure 1). In general, there was very poor correlation (R² < 0.1) between optimal growth temperature and glycoside hydrolase content of the proteomes. However, this may be partly due to the overrepresentation of mesophilic Actinobacteria in the dataset. The 61 mesophilic Actinobacteria encoded between 0 and 13 glycoside hydrolase families with an average of 4.0±3.9, while they encoded between 0 and 47 homologs of glycoside hydrolases with an average of 8.4±10.9. The 8 thermophilic Actinobacteria encoded between 0 and 11 glycoside hydrolase families with an average of 5.1±3.9, while they encoded between 0 and 34 homologs of glycoside hydrolases with an average of 9.6±10.9. There were no statistically significant differences in the distribution of glycoside hydrolases between mesophilic and thermophilic Actinobacteria. However, substantially greater numbers of thermophilic Actinobacteria need to be sequenced before deciphering any underlying biases between the two groups of Actinobacteria.
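The group averages quoted above can be recomputed from Table 1. The Python sketch below does this for the eight Table 1 entries with an OGT of 40°C or higher, which we assume are the organisms counted as thermophiles. Reading the "±" values as sample standard deviations is also an assumption, as are the variable and function names.

```python
from statistics import mean, stdev

# (OGT in °C, number of GH families, total GH homologs), copied from Table 1
# for the entries with OGT >= 40 °C (assumed to be the eight thermophiles).
thermophiles = {
    "Acidothermus cellulolyticus 11B ATCC 43068": (55, 7, 12),
    "Bifidobacterium adolescentis 15703": (50, 3, 4),
    "Cellulomonas fimi ATCC 484": (40, 11, 34),
    "Saccharomonospora glauca": (45, 0, 0),
    "Saccharomonospora viridis DSM 43017": (55, 2, 2),
    "Thermobifida fusca YX": (55, 8, 12),
    "Thermobispora bispora 43833": (55, 8, 10),
    "Thermomonospora curvata 43183": (55, 2, 3),
}


def summarize(values):
    """Return (mean, sample standard deviation), each rounded to one decimal."""
    return round(mean(values), 1), round(stdev(values), 1)


family_summary = summarize([row[1] for row in thermophiles.values()])
homolog_summary = summarize([row[2] for row in thermophiles.values()])

print("GH families per thermophile: %.1f ± %.1f" % family_summary)   # 5.1 ± 3.9
print("GH homologs per thermophile: %.1f ± %.1f" % homolog_summary)  # 9.6 ± 10.9
```

Under these assumptions the output matches the thermophile averages reported in the text. The same function could be applied to the mesophilic rows of Table 1.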
Figure 1 Relationship between optimal growth temperature and glycoside hydrolases in Actinobacteria.
(A) Scatter plot of number of glycoside hydrolase (GH) families versus optimal growth temperature.
(B) Scatter plot of number of GH homologs versus optimal growth temperature. Best-fit line with R-squared value is shown.
We analyzed the relative abundances of the 20 glycoside hydrolase families across Actinobacteria (Figure 2). The data show that GH5 was the most highly represented family in the Actinobacteria. It was the only family that was found in the majority (70%) of the organisms analyzed. The GH6 and GH43 families were the next most represented families and were found in 48% of the Actinobacteria. The GH5 family is known to contain cellulose- and hemicellulose-degrading enzymes, while the GH6 family contains cellulases and the GH43 family contains hemicellulases.14 The GH45, GH51, and GH128 families were not represented in any of the Actinobacteria in our dataset. Other GH families showed intermediate representation.
Figure 2 Relative abundance of glycoside hydrolase (GH) families in Actinobacteria. Percentage of Actinobacteria containing homologs of the different GH families is plotted.
Domain architecture of glycoside hydrolases in Actinobacteria.
To minimize over-representation of mesophilic Actinobacteria in the dataset, we selected one representative species per genus and also retained saprophytic free-living bacteria while removing animal and human pathogens. This yielded a more balanced set of Actinobacteria (6 thermophiles and 8 mesophiles). We focused our analysis on four GH families: cellulases from GH5 and GH6 families and xylanases from GH10 and GH11 families. There were 113 glycoside hydrolases from the four families across the 14 Actinobacteria (Table 2). There were 77 homologs in the 8 mesophilic bacteria, and 36 homologs in the 6 thermophilic bacteria. Six organisms contained representatives from all four families, while six organisms contained representatives from only three families and two organisms contained homologs from just one family. We analyzed the domain architecture of the 113 glycoside hydrolases using the NCBI’s CDD-search tool.16 At least five different types of carbohydrate binding modules (CBMs: CBM-2, CBM-3, CBM-X2, CBM-9, and CBM-4-9) were found fused to the catalytic domains of glycoside hydrolases (Table 3). Further analysis revealed a bias in the presence and location of certain CBMs. For example, CBM-2 was found fused on the C-terminal side of the catalytic domain in all four glycoside hydrolase families, while it was found on the N-terminal side of the catalytic domain in GH5 and GH6 cellulases. CBM-3 was only found in homologs from thermophilic Actinobacteria, and it always occurred C-terminal to the catalytic hydrolase domain. CBM-9 and CBM-4-9 were found attached to only GH10 xylanases. CBM-9 occurred C-terminal to the catalytic domain, while CBM-4-9 was found on the N-terminal side of the hydrolase domain. CBM-X2 was found only in GH5 hydrolases from mesophilic Actinobacteria and was found C-terminal to the hydrolase domain. These data suggest that there are positional constraints for CBM domains in glycoside hydrolases. Certain domains may be required for the functioning and stability of the enzymes, while others may be specific to the substrates hydrolyzed by the associated catalytic domains.
Organism name | OGT | GH5 | GH6 | GH10 | GH11
(A) Mesophilic Actinobacteria
Actinosynnema mirum DSM 43827 | 28 | 5 | 3 | 4 | 1
Amycolatopsis mediterranei S699 | 26 | 4 | 2 | 7 | 1
Cellvibrio gilvus ATCC 13127 | 25 | 2 | 4 | 6 | 0
Jonesia denitrificans DSM 20603 | 37 | 0 | 2 | 4 | 1
Micromonospora aurantiaca ATCC 27029 | 27 | 4 | 2 | 4 | 1
Streptomyces coelicolor A3(2) | 28 | 1 | 3 | 2 | 2
Streptosporangium roseum DSM 43021 | 28 | 1 | 3 | 1 | 0
Xylanimonas cellulosilytica DSM 15894 | 30 | 0 | 2 | 4 | 1
(B) Thermophilic Actinobacteria
Acidothermus cellulolyticus 11B | 55 | 2 | 2 | 2 | 0
Cellulomonas fimi ATCC 484 | 40 | 0 | 6 | 8 | 1
Saccharomonospora viridis DSM 43017 | 55 | 0 | 0 | 1 | 0
Thermobifida fusca YX | 55 | 2 | 2 | 2 | 1
Thermobispora bispora DSM 43833 | 55 | 1 | 2 | 2 | 1
Thermomonospora curvata DSM 43183 | 55 | 0 | 1 | 0 | 0
Table 2 Distribution of glycoside hydrolases in mesophilic and thermophilic Actinobacteria
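The homolog totals for the four families can be tallied directly from Table 2. The short Python sketch below copies the counts from the table. The variable and function names are illustrative assumptions.

```python
from collections import Counter

# Homolog counts per organism in the order (GH5, GH6, GH10, GH11), from Table 2.
mesophiles = {
    "Actinosynnema mirum DSM 43827": (5, 3, 4, 1),
    "Amycolatopsis mediterranei S699": (4, 2, 7, 1),
    "Cellvibrio gilvus ATCC 13127": (2, 4, 6, 0),
    "Jonesia denitrificans DSM 20603": (0, 2, 4, 1),
    "Micromonospora aurantiaca ATCC 27029": (4, 2, 4, 1),
    "Streptomyces coelicolor A3(2)": (1, 3, 2, 2),
    "Streptosporangium roseum DSM 43021": (1, 3, 1, 0),
    "Xylanimonas cellulosilytica DSM 15894": (0, 2, 4, 1),
}
thermophiles = {
    "Acidothermus cellulolyticus 11B": (2, 2, 2, 0),
    "Cellulomonas fimi ATCC 484": (0, 6, 8, 1),
    "Saccharomonospora viridis DSM 43017": (0, 0, 1, 0),
    "Thermobifida fusca YX": (2, 2, 2, 1),
    "Thermobispora bispora DSM 43833": (1, 2, 2, 1),
    "Thermomonospora curvata DSM 43183": (0, 1, 0, 0),
}


def total_homologs(group):
    """Sum the homolog counts over all organisms and all four families."""
    return sum(sum(counts) for counts in group.values())


def families_present(counts):
    """Number of the four GH families with at least one homolog."""
    return sum(1 for c in counts if c > 0)


meso_total = total_homologs(mesophiles)      # 77
thermo_total = total_homologs(thermophiles)  # 36
grand_total = meso_total + thermo_total      # 113

# How many organisms carry homologs from 1, 2, 3, or 4 of the families.
coverage = Counter(
    families_present(counts)
    for counts in {**mesophiles, **thermophiles}.values()
)
print(meso_total, thermo_total, grand_total, dict(coverage))
```

Keeping the counts in the same column order as the table makes it straightforward to add per-family totals if needed.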
|
C-Terminal domain |
||||||||||
(A) Mesophiles |
GH5 |
GH6 |
GH10 |
GH11 |
CBM_2 |
CBM_3 |
CBM_X2 |
CBM_9 |
CBM_4_9 |
No CBM |
|
N-terminal domain |
GH5 |
8 |
2 |
2 |
|||||||
GH6 |
8 |
6 |
|||||||||
GH10 |
16 |
2 |
7 |
||||||||
GH11 |
4 |
1 |
|||||||||
CBM_2 |
7 |
8 |
2 |
||||||||
CBM_3 |
|||||||||||
CBM_X2 |
|||||||||||
CBM_9 |
|||||||||||
CBM_4_9 |
4 |
1 |
2 |
||||||||
|
|
C-Terminal Domain |
|||||||||
(B) Thermophiles |
GH5 |
GH6 |
GH10 |
GH11 |
CBM_2 |
CBM_3 |
CBM_X2 |
CBM_9 |
CBM_4_9 |
No CBM |
|
N-terminal domain |
GH5 |
3 |
2 |
||||||||
GH6 |
5 |
1 |
3 |
||||||||
GH10 |
8 |
1 |
2 |
3 |
|||||||
GH11 |
3 |
||||||||||
CBM_2 |
1 |
1 |
|||||||||
CBM_3 |
|||||||||||
CBM_X2 |
|||||||||||
CBM_9 |
|||||||||||
CBM_4_9 |
2 |
2 |
Table 3 Domain architecture of glycoside hydrolases in Actinobacteria
Asymmetric amino acid substitutions in glycoside hydrolases
We wanted to understand amino acid substitution patterns between glycoside hydrolases from thermophilic and mesophilic Actinobacteria. This would help identify amino acid substitutions that may contribute to thermostability of glycoside hydrolases. For each glycoside hydrolase family, we aligned only the hydrolase domains of orthologs from mesophilic and thermophilic organisms identified earlier (Table 2). We calculated the frequencies of all amino acid substitutions between mesophilic and thermophilic homologs at every position and identified the statistically significant asymmetric amino acid substitutions (Table 4). The data revealed 41 pairs of amino acid substitutions that are asymmetric between the homologs from thermophilic and mesophilic Actinobacteria. Certain amino acid preferences in the thermophiles were specific to the glycoside hydrolase family, while other amino acid preferences were independent of the glycoside hydrolase family. For example, thermophilic enzymes from GH6, GH10, and GH11 families showed preferences for alanine over glycine. Similarly, thermophilic proteins showed a preference for aspartate over thermolabile serine and threonine residues. There was also a biased preference for isoleucine over valine in thermostable homologs. Overall, the data provide several new targets for genetically engineering higher thermostability in glycoside hydrolases20 (Table 4).
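One way to tally directional substitutions from an alignment, counting each substitution at most once per position, is sketched below in Python. This is an assumed reading of the counting procedure, not the authors' code. The function names are hypothetical, and the four short sequences are toy data invented for illustration.

```python
from collections import Counter
from itertools import product

GAP = "-"


def count_substitutions(meso_seqs, thermo_seqs):
    """Count (mesophile residue, thermophile residue) substitutions.

    All sequences must come from one multiple alignment (equal length).
    Each distinct substitution is counted at most once per alignment column.
    """
    sequences = list(meso_seqs) + list(thermo_seqs)
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all aligned sequences must have the same length")
    counts = Counter()
    for col in range(length):
        # Residues seen in each group at this column, ignoring gaps.
        meso = {s[col] for s in meso_seqs} - {GAP}
        thermo = {s[col] for s in thermo_seqs} - {GAP}
        for a, b in product(meso, thermo):
            if a != b:
                counts[(a, b)] += 1
    return counts


def pair_percentages(counts, a, b):
    """Percentages of the two directions within one substitution pair."""
    forward = counts[(a, b)]  # a in mesophiles, b in thermophiles
    reverse = counts[(b, a)]  # b in mesophiles, a in thermophiles
    total = forward + reverse
    if total == 0:
        return None
    return 100.0 * forward / total, 100.0 * reverse / total


# Toy aligned fragments (invented, not real glycoside hydrolase sequences).
counts = count_substitutions(["GSVG", "GTVG"], ["ADIG", "ADIA"])
print(counts[("G", "A")], pair_percentages(counts, "G", "A"))
```

The two counts of a pair can then be passed to a binomial test to decide whether the bias toward one direction is significant.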
Amino acid in the thermophilic homologs |
||||||||||||||||
A |
D |
E |
F |
H |
I |
K |
L |
N |
P |
Q |
S |
T |
V |
Y |
||
Amino acid in the mesophilic homologs |
A |
C74 |
C62 |
A100 |
B100 |
C77 |
B100 |
C61, B68 |
B100 |
|||||||
D |
B92 |
B93 |
||||||||||||||
E |
A62 |
|||||||||||||||
F |
C65 |
|||||||||||||||
G |
B88, C67, D86, |
C100 |
B91 |
|||||||||||||
K |
A86 |
A100 |
||||||||||||||
L |
A62 |
|||||||||||||||
N |
B100 |
D91 |
D92 |
D84 |
D100 |
C100 |
||||||||||
R |
C67 |
B90 |
C59 |
|||||||||||||
S |
D86 |
B77, C67 |
C100 |
C64 |
D85 |
|||||||||||
T |
A80, C75 |
B100, C78 |
B100 |
B88 |
D100 |
|||||||||||
V |
B80 |
A60, C56 |
D77 |
|||||||||||||
Y |
A100 |
Table 4 Asymmetric amino acid substitution patterns in glycoside hydrolases
Standard single-letter amino acid code is used to represent amino acids. Data are represented with a letter followed by a number, where A represents GH5, B represents GH6, C represents GH10, D represents GH11, and numbers represent the percentage of occurrence of the particular substitution. Only statistically significant (p < 0.1) asymmetric substitutions are shown.
We analyzed the predicted proteomes of 69 sequenced Actinobacteria and identified homologs of 20 glycoside hydrolase families associated with plant cell wall degradation. Some glycoside hydrolase families were well represented across the phylum, while a few families were not represented in any of the Actinobacteria we analyzed. The glycoside hydrolases appear to have a constrained domain architecture that likely determines their stability, functioning, and interaction with substrates. Certain carbohydrate binding modules found fused to the glycoside hydrolases were only associated with thermophilic Actinobacteria. Finally, glycoside hydrolases from thermophilic Actinobacteria showed preferences for certain amino acid substitutions over their mesophilic counterparts. Overall, our data provide new insights into glycoside hydrolases in Actinobacteria and provide a basis for genetically enhancing the stability of glycoside hydrolases towards industrial applications.
This research was supported by startup funds provided to RDB by the University of Arkansas. KAT acknowledges support from the National Science Foundation Research Experience for Undergraduates program through the University of Arkansas REU Site (DBI-1063067).
The authors declare no conflict of interest.
©2017 Teegardin, et al. This is an open access article distributed under the terms of the, which permits unrestricted use, distribution, and build upon your work non-commercially.