MOJ
eISSN: 2374-6920

Proteomics & Bioinformatics

Research Article Volume 5 Issue 3

Bioinformatic analysis of glycoside hydrolases in the proteomes of mesophilic and thermophilic Actinobacteria

Kip A Teegardin, Steven James, Ravi D Barabote

Department of Biological Sciences, University of Arkansas, USA

Correspondence: Ravi D. Barabote, Department of Biological Sciences, University of Arkansas, Fayetteville, AR 72701, USA, Tel (479) 5752475, Fax (479) 5754010

Received: January 30, 2017 | Published: March 16, 2017

Citation: Teegardin KA, James S, Barabote RD. Bioinformatic analysis of glycoside hydrolases in the proteomes of mesophilic and thermophilic Actinobacteria. MOJ Proteomics Bioinform. 2017;5(3):75-81. DOI: 10.15406/mojpb.2017.05.00158


Abstract

Petroleum reserves are rapidly depleting and alternative renewable sources of energy need to be developed to meet the energy demands of the planet. Lignocellulose has been recognized as a highly promising and renewable resource for the development of clean energy. Thermophilic microbes and thermostable enzymes are being sought for biological conversion of lignocellulose into biofuels. The phylum Actinobacteria includes several efficient cellulose-degrading microorganisms. Genomes of several Actinobacteria have been completely sequenced and deposited in public databases, which are a great resource for uncovering new enzymes and targets for biotechnology. We searched the predicted proteomes of 69 Actinobacteria for the homologs of 20 glycoside hydrolase families relevant to lignocellulose degradation and identified 589 glycoside hydrolase homologs. We analyzed (1) the distribution of the glycoside hydrolase homologs across mesophilic and thermophilic Actinobacteria, (2) the domain architecture of cellulases (from GH5 and GH6 families) and xylanases (from GH10 and GH11 families) from mesophilic and thermophilic Actinobacteria, and (3) asymmetric amino acid substitutions between mesophilic and thermophilic glycoside hydrolases. Overall, our data provide new insights into the distribution of different glycoside hydrolases in Actinobacteria as well as into the thermostability features of cellulases and xylanases from Actinobacteria. Our findings provide a basis for genetic engineering of glycoside hydrolases as well as new targets for biotechnology.

Keywords: thermophiles, enzymes, cellulases, xylanases, biofuel, lignocellulose, genome, proteome

Abbreviations

GH, glycoside hydrolases; CBM, carbohydrate binding module; OGT, optimal growth temperature

Introduction

Petroleum fuels are finite and non-renewable and they pose a significant concern for global climate, sustainability, and international security.1 Alternative renewable sources of energy are urgently needed to meet the current global challenges. Plants are the most abundant source of renewable carbon on Earth. The plant cell wall (lignocellulose) can be used for the production of renewable, sustainable, and environmentally clean biofuels.2 Lignocellulose is mainly composed of polymers of sugars (cellulose and hemicellulose) and phenolic units (lignin). While complex lignocellulose can be converted into liquid fuels thermo-chemically, biological transformation of lignocellulosic polysaccharides using microorganisms and microbial enzymes is an economical and environmentally benign process for sustainable production of biofuels.3,4 Several microorganisms produce glycoside hydrolase enzymes such as cellulases and xylanases that break down cellulose and xylan (hemicellulose), respectively.5 Efficient lignocellulose-degrading microorganisms and catalytically superior cellulases and xylanases are of very high value in the bioconversion of lignocellulose into biofuels.6,7

Actinobacteria are a phylum of Gram-positive bacteria that are found abundantly in soil.8 They include some of the most prolific lignocellulose-degrading bacteria.9 Actinobacteria include both mesophilic and thermophilic members. Many new Actinobacteria continue to be isolated and sequenced in bioprospecting studies aimed at identifying new biotechnological targets.10 A growing number of completely sequenced genomes is being deposited in public databases, which provide an expanding resource for discovering novel targets for biotechnology. Systematic bioinformatic mining of the genomes and predicted proteomes of sequenced Actinobacteria has the potential to reveal novel insights into lignocellulose-degrading enzymes for bioenergy applications.11

Thermophilic microbes and thermostable enzymes are most useful for the development of cost-effective, industrial-scale technologies.12 Thermostability of enzymes increases their shelf life, reduces reaction times, improves industrial productivity, and lowers manufacturing costs.12 Thus, enzyme thermostability is a highly desirable property for industrial enzymatic deconstruction of lignocellulose. Valuable insights can be gleaned about factors that contribute to thermostability by performing comparative analysis of amino acid sequences of proteins from mesophilic and thermophilic organisms.13 Such insights can be exploited for designing and genetically engineering enhanced enzymes for industrial applications.

In this study, we systematically analyzed the predicted proteomes of 69 Actinobacteria for homologs of glycoside hydrolase enzymes that are relevant to lignocellulose degradation. We analyzed the distribution of the homologs across the phylum. We identified homologs from mesophilic and thermophilic Actinobacteria and analyzed their domain architecture to decipher thermophilic patterns. Finally, we analyzed the amino acid sequences of cellulases and xylanases from mesophilic and thermophilic Actinobacteria and identified asymmetric amino acid substitution patterns in the thermophilic enzymes.

Methodology

Predicted proteomes of known lignocellulose-degrading Actinobacteria were obtained from NCBI (ftp://ftp.ncbi.nlm.nih.gov/). Optimal growth temperature (OGT) information was obtained from the literature. Organisms were classified as mesophilic (OGT < 40°C) or thermophilic (OGT ≥ 40°C). Glycoside hydrolase families that contain lignocellulose degradation enzymes were identified from the CAZy database.14 Representative Actinobacterial sequences from the CAZy families were used to identify homologs in the proteomes of the Actinobacteria using BLAST.15 Domains in the glycoside hydrolase proteins were identified using NCBI's CDD-search tool.16 Amino acid substitutions between homologs of mesophilic and thermophilic Actinobacteria were identified using multiple alignments as described previously.17 Briefly, for each GH family, orthologs from mesophilic and thermophilic Actinobacteria were aligned using CLUSTAL.18,19 Each substitution was counted only once per position in the alignment. For each amino acid substitution pair (e.g., $A_M \to B_T$ and $A_T \to B_M$, where A and B represent amino acids and the subscripts M and T represent mesophilic and thermophilic organisms, respectively), the total number of substitutions over the entire alignment was summed and the percentage of each substitution within the pair was calculated. Statistical significance (p-value) of asymmetric amino acid substitutions between the two groups of organisms was calculated using a binomial function. The asymmetry (i.e., bias) between the $A_M \to B_T$ and $A_T \to B_M$ substitutions was considered significant if the p-value was below 0.1, the threshold applied in Table 4.
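The substitution counting and significance test described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the toy alignment are hypothetical, the counting rule is one plausible reading of "counted only once per position", and the "binomial function" is assumed to be an exact two-sided binomial test against an even (0.5) split between the two directions of a substitution pair. The original procedure (reference 17) may differ in detail.

```python
from collections import Counter
from itertools import product
from math import comb

def count_substitutions(meso_seqs, thermo_seqs):
    """Count (mesophilic residue, thermophilic residue) pairs, once per column."""
    counts = Counter()
    for column in range(len(meso_seqs[0])):
        meso = {seq[column] for seq in meso_seqs} - {"-"}
        thermo = {seq[column] for seq in thermo_seqs} - {"-"}
        # Sets make each distinct substitution count only once per position.
        for m, t in product(meso, thermo):
            if m != t:
                counts[(m, t)] += 1
    return counts

def binomial_two_sided(k, n, p=0.5):
    """Exact two-sided binomial p-value for k events out of n (assumed test)."""
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = probs[k] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in probs if q <= cutoff))

def asymmetry(counts, a, b):
    """Percentage of a(meso)->b(thermo) within the a/b pair, plus its p-value."""
    forward = counts[(a, b)]  # a in mesophiles, b in thermophiles
    reverse = counts[(b, a)]  # b in mesophiles, a in thermophiles
    total = forward + reverse
    if total == 0:
        return None
    return 100 * forward / total, binomial_two_sided(forward, total)

# Hypothetical toy alignment (equal-length, gapped sequences), not study data.
meso = ["GSV-", "GTVA"]
thermo = ["ADIA", "ADIG"]
counts = count_substitutions(meso, thermo)
print(asymmetry(counts, "G", "A"))
```

With real alignments, a pair would be reported only when the returned p-value falls below the chosen threshold.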

Results and discussion

Distribution of glycoside hydrolases in Actinobacteria.

We identified a total of 1133 Actinobacteria in the NCBI database. Of these, the genomes of only 236 (21%) Actinobacteria have been completely sequenced. Within the 236 sequenced Actinobacteria, we identified 69 (29%) organisms that have been described in the literature to have cellulolytic activity. We analyzed the predicted proteomes of the 69 Actinobacteria for the presence of glycoside hydrolases relevant to lignocellulose degradation. In addition, we collected information on the optimal growth temperature of each organism from the literature. Using the CAZy database, we identified 20 glycoside hydrolase families that contain enzymes known to hydrolyze various plant cell wall polysaccharides.14 A total of 589 glycoside hydrolase homologs were identified in the proteomes of the 69 Actinobacteria (Table 1). Of the 69 Actinobacteria, 61 organisms are mesophilic and only 8 are thermophilic. This highlights the need to sequence more thermophilic Actinobacteria.

| Organism name | OGT (°C) | # of GH families | Total GH homologs |
|---|---|---|---|
| Acidothermus cellulolyticus 11B ATCC 43068 | 55 | 7 | 12 |
| Actinosynnema mirum DSM 43827 | 28 | 9 | 30 |
| Amycolatopsis mediterranei S699 | 26 | 13 | 37 |
| Amycolatopsis mediterranei U32 | 26 | 12 | 33 |
| Bifidobacterium adolescentis 15703 | 50 | 3 | 4 |
| Bifidobacterium animalis AD011 | 37 | 1 | 1 |
| Bifidobacterium animalis ATCC 25527 | 37 | 1 | 1 |
| Bifidobacterium animalis B420 | 37 | 1 | 1 |
| Bifidobacterium animalis Bb12 | 37 | 2 | 2 |
| Bifidobacterium animalis Bi-04 | 37 | 1 | 1 |
| Bifidobacterium animalis Bi-07 | 37 | 1 | 2 |
| Bifidobacterium animalis BLC1 | 37 | 1 | 1 |
| Bifidobacterium animalis CNCM I-2494 | 37 | 2 | 3 |
| Bifidobacterium animalis DSM 10140 | 37 | 1 | 1 |
| Bifidobacterium animalis V9 | 37 | 1 | 1 |
| Bifidobacterium bifidum PRL2010 | 37 | 1 | 2 |
| Bifidobacterium bifidum S17 | 37 | 1 | 2 |
| Bifidobacterium breve ACS-071-V-Sch8b | 37 | 2 | 2 |
| Bifidobacterium breve UCC2003 | 37 | 0 | 0 |
| Bifidobacterium dentium Bd1 | 29 | 2 | 7 |
| Bifidobacterium longum 157F | 34 | 2 | 6 |
| Bifidobacterium longum BBMN68 | 34 | 1 | 3 |
| Bifidobacterium longum DJO10A | 34 | 1 | 2 |
| Bifidobacterium longum F8 | 34 | 1 | 1 |
| Bifidobacterium longum JCM 1217 | 34 | 2 | 5 |
| Bifidobacterium longum JCM 1222, ATCC 15697 | 34 | 2 | 3 |
| Bifidobacterium longum JDM301 | 34 | 2 | 5 |
| Bifidobacterium longum KACC 91563 | 34 | 2 | 4 |
| Bifidobacterium longum NCC2705 | 34 | 2 | 5 |
| Cellulomonas fimi ATCC 484 | 40 | 11 | 34 |
| Cellulomonas flavigena DSM 20109 | 30 | 9 | 37 |
| Cellvibrio gilvus ATCC 13127 | 25 | 9 | 20 |
| Clavibacter michiganensis NCPPB 382 | 37 | 3 | 7 |
| Clavibacter michiganensis sepedonicus | 37 | 1 | 2 |
| Jonesia denitrificans DSM 20603 | 37 | 9 | 13 |
| Micrococcus luteus | 37 | 0 | 0 |
| Micromonospora aurantiaca ATCC 27029 | 27 | 10 | 22 |
| Modestobacter marinus BC501 | 28 | 0 | 0 |
| Mycobacterium abscessus | 30 | 2 | 2 |
| Mycobacterium avium 104 | 37 | 3 | 3 |
| Mycobacterium avium K-10 | 37 | 3 | 3 |
| Mycobacterium bovis AF2122/97 | 35 | 4 | 5 |
| Mycobacterium bovis BCG str. Mexico | 35 | 3 | 3 |
| Mycobacterium bovis Pasteur 1173P2 | 35 | 3 | 3 |
| Mycobacterium bovis Tokyo 172 | 35 | 3 | 3 |
| Mycobacterium gilvum PYR-GCK | 30 | 3 | 3 |
| Mycobacterium marinum | 37 | 2 | 2 |
| Mycobacterium smegmatis MC2 155 | 30 | 3 | 4 |
| Rhodococcus erythropolis PR4 (= NBRC 100887) | 25 | 2 | 2 |
| Rhodococcus opacus B4 | 27 | 2 | 2 |
| Saccharomonospora glauca | 45 | 0 | 0 |
| Saccharomonospora viridis DSM 43017 | 55 | 2 | 2 |
| Streptomyces avermitilis MA-4680 | 32 | 9 | 19 |
| Streptomyces bingchenggensis BCW-1 | 28 | 12 | 47 |
| Streptomyces cattleya DSM 46488,8057 | 34 | 4 | 12 |
| Streptomyces clavuligerus | 28 | 0 | 0 |
| Streptomyces coelicolor A3(2) | 28 | 11 | 20 |
| Streptomyces flavogriseus ATCC 33331 | 28 | 11 | 19 |
| Streptomyces hygroscopicus jinggangensis 5008 | 35 | 9 | 13 |
| Streptomyces pristinaespiralis | 28 | 0 | 0 |
| Streptomyces scabiei 87.22 | 27 | 10 | 27 |
| Streptomyces sirex AA3 | 28 | 8 | 9 |
| Streptomyces sviceus | 28 | 1 | 1 |
| Streptomyces violaceusniger Tu 4113 | 28 | 10 | 22 |
| Streptosporangium roseum DSM 43021 | 28 | 9 | 14 |
| Thermobifida fusca YX | 55 | 8 | 12 |
| Thermobispora bispora 43833 | 55 | 8 | 10 |
| Thermomonospora curvata 43183 | 55 | 2 | 3 |
| Xylanimonas cellulosilytica DSM 15894 | 30 | 7 | 12 |

Table 1 Summary of the analysis of Actinobacteria used in this study

OGT (°C): optimal growth temperature (degrees Celsius); GH: glycoside hydrolase

We analyzed the relationship between optimal growth temperature and glycoside hydrolases encoded in the proteomes of the Actinobacteria (Figure 1). In general, there was very poor correlation (R² < 0.1) between optimal growth temperature and glycoside hydrolase content of the proteomes. However, this may be partly due to the overrepresentation of mesophilic Actinobacteria in the dataset. The 61 mesophilic Actinobacteria encoded between 0 and 13 glycoside hydrolase families with an average of 4.0±3.9, while they encoded between 0 and 47 homologs of glycoside hydrolases with an average of 8.4±10.9. The 8 thermophilic Actinobacteria encoded between 0 and 11 glycoside hydrolase families with an average of 5.1±3.9, while they encoded between 0 and 34 homologs of glycoside hydrolases with an average of 9.6±10.9. There were no statistically significant differences in the distribution of glycoside hydrolases between mesophilic and thermophilic Actinobacteria. However, substantially greater numbers of thermophilic Actinobacteria need to be sequenced before deciphering any underlying biases between the two groups of Actinobacteria.
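As a worked check of the thermophile averages quoted above, the short Python sketch below recomputes them from the eight Table 1 rows with an OGT of 40°C or higher. Two assumptions are made here and are not stated in the text: the value after ± is taken to be the sample standard deviation (n − 1 denominator), and an organism at exactly 40°C is counted as thermophilic, as Table 2 does for Cellulomonas fimi.

```python
from statistics import mean, stdev

# (organism, OGT in °C, number of GH families, total GH homologs), from Table 1.
THERMOPHILES = [
    ("Acidothermus cellulolyticus 11B", 55, 7, 12),
    ("Bifidobacterium adolescentis 15703", 50, 3, 4),
    ("Cellulomonas fimi ATCC 484", 40, 11, 34),
    ("Saccharomonospora glauca", 45, 0, 0),
    ("Saccharomonospora viridis DSM 43017", 55, 2, 2),
    ("Thermobifida fusca YX", 55, 8, 12),
    ("Thermobispora bispora 43833", 55, 8, 10),
    ("Thermomonospora curvata 43183", 55, 2, 3),
]

def summarize(values):
    """Return (mean, sample standard deviation), each rounded to one decimal."""
    return round(mean(values), 1), round(stdev(values), 1)

family_summary = summarize([row[2] for row in THERMOPHILES])
homolog_summary = summarize([row[3] for row in THERMOPHILES])

print("GH families per thermophile: %.1f ± %.1f" % family_summary)
print("GH homologs per thermophile: %.1f ± %.1f" % homolog_summary)
```

Under these assumptions the sketch gives 5.1 ± 3.9 families and 9.6 ± 10.9 homologs, matching the values reported in the text.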

Figure 1 Relationship between optimal growth temperature and glycoside hydrolases in Actinobacteria.
(A) Scatter plot of number of glycoside hydrolases (GH) families versus optimal growth temperature.
(B) Scatter plot of number of GH homologs versus optimal growth temperature. Best-fit line with R-squared value is shown.

We analyzed the relative abundances of the 20 glycoside hydrolase families across Actinobacteria (Figure 2). The data show that GH5 was the most highly represented family in the Actinobacteria. It was the only family that was found in a majority (70%) of the organisms analyzed. The GH6 and GH43 families were the next most represented families and were found in 48% of the Actinobacteria. The GH5 family is known to contain cellulose- and hemicellulose-degrading enzymes, while the GH6 family contains cellulases and the GH43 family contains hemicellulases.14 The GH45, GH51, and GH128 families were not represented in any of the Actinobacteria in our dataset. Other GH families showed intermediate representation.

Figure 2 Relative abundance of glycoside hydrolase (GH) families in Actinobacteria. The percentage of Actinobacteria containing homologs of the different GH families is plotted.

Domain architecture of glycoside hydrolases in Actinobacteria.

To minimize over-representation of mesophilic Actinobacteria in the dataset, we selected one representative species per genus and also retained saprophytic free-living bacteria while removing animal and human pathogens. This yielded a more balanced set of Actinobacteria (6 thermophiles and 8 mesophiles). We focused our analysis on four GH families: cellulases from the GH5 and GH6 families and xylanases from the GH10 and GH11 families. There were 113 glycoside hydrolases from the four families across the 14 Actinobacteria (Table 2). There were 77 homologs in the 8 mesophilic bacteria, and 36 homologs in the 6 thermophilic bacteria. Six organisms contained representatives from all four families, while six organisms contained representatives from only three families and two organisms contained homologs from just one family. We analyzed the domain architecture of the 113 glycoside hydrolases using NCBI's CDD-search tool.16 At least five different types of carbohydrate binding modules (CBMs: CBM-2, CBM-3, CBM-X2, CBM-9, and CBM-4-9) were found fused to the catalytic domains of glycoside hydrolases (Table 3). Further analysis revealed a bias in the presence and location of certain CBMs. For example, CBM-2 was found fused on the C-terminal side of the catalytic domain in all four glycoside hydrolase families, while it was found on the N-terminal side of the catalytic domain in GH5 and GH6 cellulases. CBM-3 was only found in homologs from thermophilic Actinobacteria, and it always occurred C-terminal to the catalytic hydrolase domain. CBM-9 and CBM-4-9 were found attached only to GH10 xylanases. CBM-9 occurred C-terminal to the catalytic domain, while CBM-4-9 was found on the N-terminal side of the hydrolase domain. CBM-X2 was found only in GH5 hydrolases from mesophilic Actinobacteria and was found C-terminal to the hydrolase domain. These data suggest that there are positional constraints for CBM domains in glycoside hydrolases. Certain domains may be required for the functioning and stability of the enzymes, while others may be specific to the substrates hydrolyzed by the associated catalytic domains.
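The positional tally described above (whether a CBM lies N-terminal or C-terminal to the catalytic domain) could be computed along the following lines once domain hits are available. This is a hypothetical sketch: the `cbm_sides` function, the `(name, start, end)` input format, and the three example proteins are illustrative assumptions and are not the output format of the CDD-search tool or data from this study.

```python
from collections import Counter

def cbm_sides(domains):
    """Return (GH family, CBM name, side) for every CBM/catalytic-domain pair.

    `domains` is a list of (name, start, end) hits for a single protein.
    """
    catalytic = [d for d in domains if d[0].startswith("GH")]
    cbms = [d for d in domains if d[0].startswith("CBM")]
    pairs = []
    for gh_name, gh_start, gh_end in catalytic:
        gh_mid = (gh_start + gh_end) / 2
        for cbm_name, cbm_start, cbm_end in cbms:
            cbm_mid = (cbm_start + cbm_end) / 2
            # Compare domain midpoints along the protein sequence.
            side = "N-terminal" if cbm_mid < gh_mid else "C-terminal"
            pairs.append((gh_name, cbm_name, side))
    return pairs

# Hypothetical domain hits (not taken from the study's data).
proteins = {
    "protein_1": [("GH6", 40, 330), ("CBM_2", 360, 460)],
    "protein_2": [("CBM_4_9", 30, 160), ("GH10", 200, 520)],
    "protein_3": [("CBM_2", 35, 135), ("GH5", 170, 450)],
}

tally = Counter(pair for hits in proteins.values() for pair in cbm_sides(hits))
for (gh, cbm, side), n in sorted(tally.items()):
    print(f"{gh}: {cbm} {side} of the catalytic domain ({n} protein(s))")
```

Summing such tallies separately for mesophilic and thermophilic proteins would give a matrix of the kind shown in Table 3.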

(A) Mesophilic Actinobacteria

| Organism name | OGT (°C) | GH5 | GH6 | GH10 | GH11 |
|---|---|---|---|---|---|
| Actinosynnema mirum DSM 43827 | 28 | 5 | 3 | 4 | 1 |
| Amycolatopsis mediterranei S699 | 26 | 4 | 2 | 7 | 1 |
| Cellvibrio gilvus ATCC 13127 | 25 | 2 | 4 | 6 | 0 |
| Jonesia denitrificans DSM 20603 | 37 | 0 | 2 | 4 | 1 |
| Micromonospora aurantiaca ATCC 27029 | 27 | 4 | 2 | 4 | 1 |
| Streptomyces coelicolor A3(2) | 28 | 1 | 3 | 2 | 2 |
| Streptosporangium roseum DSM 43021 | 28 | 1 | 3 | 1 | 0 |
| Xylanimonas cellulosilytica DSM 15894 | 30 | 0 | 2 | 4 | 1 |

(B) Thermophilic Actinobacteria

| Organism name | OGT (°C) | GH5 | GH6 | GH10 | GH11 |
|---|---|---|---|---|---|
| Acidothermus cellulolyticus 11B | 55 | 2 | 2 | 2 | 0 |
| Cellulomonas fimi ATCC 484 | 40 | 0 | 6 | 8 | 1 |
| Saccharomonospora viridis DSM 43017 | 55 | 0 | 0 | 1 | 0 |
| Thermobifida fusca YX | 55 | 2 | 2 | 2 | 1 |
| Thermobispora bispora DSM 43833 | 55 | 1 | 2 | 2 | 1 |
| Thermomonospora curvata DSM 43183 | 55 | 0 | 1 | 0 | 0 |

Table 2 Distribution of glycoside hydrolases in mesophilic and thermophilic Actinobacteria
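The totals quoted for Table 2 can be recomputed directly from its rows. The Python sketch below copies the counts from the table and tallies homologs per group and the number of organisms encoding one, three, or all four of the GH5, GH6, GH10, and GH11 families; the variable names are illustrative only.

```python
from collections import Counter

# (organism, group, GH5, GH6, GH10, GH11), copied from Table 2.
TABLE2 = [
    ("Actinosynnema mirum DSM 43827", "meso", 5, 3, 4, 1),
    ("Amycolatopsis mediterranei S699", "meso", 4, 2, 7, 1),
    ("Cellvibrio gilvus ATCC 13127", "meso", 2, 4, 6, 0),
    ("Jonesia denitrificans DSM 20603", "meso", 0, 2, 4, 1),
    ("Micromonospora aurantiaca ATCC 27029", "meso", 4, 2, 4, 1),
    ("Streptomyces coelicolor A3(2)", "meso", 1, 3, 2, 2),
    ("Streptosporangium roseum DSM 43021", "meso", 1, 3, 1, 0),
    ("Xylanimonas cellulosilytica DSM 15894", "meso", 0, 2, 4, 1),
    ("Acidothermus cellulolyticus 11B", "thermo", 2, 2, 2, 0),
    ("Cellulomonas fimi ATCC 484", "thermo", 0, 6, 8, 1),
    ("Saccharomonospora viridis DSM 43017", "thermo", 0, 0, 1, 0),
    ("Thermobifida fusca YX", "thermo", 2, 2, 2, 1),
    ("Thermobispora bispora DSM 43833", "thermo", 1, 2, 2, 1),
    ("Thermomonospora curvata DSM 43183", "thermo", 0, 1, 0, 0),
]

# Total homologs per group across the four families.
group_totals = Counter()
for _, group, *family_counts in TABLE2:
    group_totals[group] += sum(family_counts)

# Number of organisms having a given number of families represented.
families_present = Counter(
    sum(1 for n in family_counts if n > 0) for _, _, *family_counts in TABLE2
)

print(dict(group_totals))
print(dict(families_present))
```

The rows sum to 77 mesophilic and 36 thermophilic homologs (113 in total), with six organisms carrying all four families, six carrying three, and two carrying one.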

 

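Rows give the N-terminal domain and columns give the C-terminal domain. Column order: GH5, GH6, GH10, GH11, CBM_2, CBM_3, CBM_X2, CBM_9, CBM_4_9, No CBM. For each row, only the non-empty cells are listed, in column order; the column position of each value is not preserved.

(A) Mesophiles

| N-terminal domain | Non-empty cells |
|---|---|
| GH5 | 8, 2, 2 |
| GH6 | 8, 6 |
| GH10 | 16, 2, 7 |
| GH11 | 4, 1 |
| CBM_2 | 7, 8, 2 |
| CBM_3 | |
| CBM_X2 | |
| CBM_9 | |
| CBM_4_9 | 4, 1, 2 |

(B) Thermophiles

| N-terminal domain | Non-empty cells |
|---|---|
| GH5 | 3, 2 |
| GH6 | 5, 1, 3 |
| GH10 | 8, 1, 2, 3 |
| GH11 | 3 |
| CBM_2 | 1, 1 |
| CBM_3 | |
| CBM_X2 | |
| CBM_9 | |
| CBM_4_9 | 2, 2 |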

Table 3 Domain architecture of glycoside hydrolases in Actinobacteria

Asymmetric amino acid substitutions in glycoside hydrolases

We wanted to understand amino acid substitution patterns between glycoside hydrolases from thermophilic and mesophilic Actinobacteria. This would help identify amino acid substitutions that may contribute to the thermostability of glycoside hydrolases. For each glycoside hydrolase family, we aligned only the hydrolase domains of orthologs from mesophilic and thermophilic organisms identified earlier (Table 2). We calculated the frequencies of all amino acid substitutions between mesophilic and thermophilic homologs at every position and identified the statistically significant asymmetric amino acid substitutions (Table 4). The data revealed 41 pairs of amino acid substitutions that are asymmetric between the homologs from thermophilic and mesophilic Actinobacteria. Certain amino acid preferences in the thermophiles were specific to the glycoside hydrolase family, while other amino acid preferences were independent of the glycoside hydrolase family. For example, thermophilic enzymes from the GH6, GH10, and GH11 families showed a preference for alanine over glycine. Similarly, thermophilic proteins showed a preference for aspartate over the thermolabile serine and threonine residues. There was also a biased preference for isoleucine over valine in thermostable homologs. Overall, the data provide several new targets for genetically engineering higher thermostability in glycoside hydrolases (Table 4).20

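Rows give the amino acid in the mesophilic homologs and columns give the amino acid in the thermophilic homologs. Column order: A, D, E, F, H, I, K, L, N, P, Q, S, T, V, Y. For each row, only the non-empty cells are listed, in column order and separated by semicolons; the column position of each cell is not preserved.

| Amino acid in the mesophilic homologs | Non-empty cells |
|---|---|
| A | C74; C62; A100; B100; C77; B100; C61, B68; B100 |
| D | B92; B93 |
| E | A62 |
| F | C65 |
| G | B88, C67, D86; C100; B91 |
| K | A86; A100 |
| L | A62 |
| N | B100; D91; D92; D84; D100; C100 |
| R | C67; B90; C59 |
| S | D86; B77, C67; C100; C64; D85 |
| T | A80, C75; B100, C78; B100; B88; D100 |
| V | B80; A60, C56; D77 |
| Y | A100 |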

Table 4 Asymmetric amino acid substitution patterns in glycoside hydrolases

Standard single-letter amino acid codes are used to represent amino acids. Data are represented with a letter followed by a number, where A represents GH5, B represents GH6, C represents GH10, D represents GH11, and numbers represent the percentage of occurrence of the particular substitution. Only statistically significant (p < 0.1) asymmetric substitutions are shown.

Conclusion

We analyzed the predicted proteomes of 69 sequenced Actinobacteria and identified homologs of 20 glycoside hydrolase families associated with plant cell wall degradation. Some glycoside hydrolase families were well represented across the phylum, while a few families were not represented in any of the Actinobacteria we analyzed. The glycoside hydrolases appear to have a constrained domain architecture that likely determines their stability, functioning, and interaction with substrates. Certain carbohydrate binding modules found fused to the glycoside hydrolases were only associated with thermophilic Actinobacteria. Finally, glycoside hydrolases from thermophilic Actinobacteria showed preferences for certain amino acid substitutions over their mesophilic counterparts. Overall, our data provide new insights into glycoside hydrolases in Actinobacteria and provide a basis for genetically enhancing the stability of glycoside hydrolases towards industrial applications.

Acknowledgements

This research was supported by startup funds provided to RDB by the University of Arkansas. KAT acknowledges support from the National Science Foundation Research Experience for Undergraduates program through the University of Arkansas REU Site (DBI-1063067).

Conflict of interest

The authors declare no conflict of interest.

References

  1. Solomon BD. Biofuels and sustainability. Ann N Y Acad Sci. 2010;1185:119–134.
  2. Pothiraj C, Kanmani P, Balaji P. Bioconversion of lignocellulose materials. Mycobiology. 2006;34(4):159–165.
  3. Peralta-Yahya PP, Keasling JD. Advanced biofuel production in microbes. Biotechnol J. 2010;5(2):147–162.
  4. Clark JH, Luque R, Matharu AS. Green chemistry, biofuels, and biorefinery. Annu Rev Chem Biomol Eng. 2012;3:183–207.
  5. Cragg SM, Beckham GT, Bruce NC, et al. Lignocellulose degradation mechanisms across the Tree of Life. Curr Opin Chem Biol. 2015;29:108–119.
  6. Blumer-Schuette SE, Brown SD, Sander KB, et al. Thermophilic lignocellulose deconstruction. FEMS Microbiol Rev. 2014;38(3):393–448.
  7. Bhalla A, Bansal N, Kumar S, et al. Improved lignocellulose conversion to biofuels with thermophilic bacteria and thermostable enzymes. Bioresour Technol. 2013;128:751–759.
  8. Ventura M, Canchaya C, Tauch A, et al. Genomics of Actinobacteria: tracing the evolutionary history of an ancient phylum. Microbiol Mol Biol Rev. 2007;71(3):495–548.
  9. McCarthy AJ, Williams ST. Actinomycetes as agents of biodegradation in the environment: a review. Gene. 1992;115(1–2):189–192.
  10. Ward AC, Bora N. Diversity and biogeography of marine Actinobacteria. Curr Opin Microbiol. 2006;9(3):279–286.
  11. Ohm RA, Riley R, Salamov A, et al. Genomics of wood-degrading fungi. Fungal Genet Biol. 2014;72:82–90.
  12. Bruins ME, Janssen AE, Boom RM. Thermozymes and their applications: a review of recent literature and patents. Appl Biochem Biotechnol. 2001;90(2):155–186.
  13. Kumar S, Nussinov R. How do thermophilic proteins deal with heat? Cell Mol Life Sci. 2001;58(9):1216–1233.
  14. Cantarel BL, Coutinho PM, Rancurel C, et al. The Carbohydrate-Active EnZymes database (CAZy): an expert resource for Glycogenomics. Nucleic Acids Res. 2009;37(Database issue):D233–D238.
  15. Altschul SF, Gish W, Miller W, et al. Basic local alignment search tool. J Mol Biol. 1990;215(3):403–410.
  16. Marchler-Bauer A, Derbyshire MK, Gonzales NR, et al. CDD: NCBI's conserved domain database. Nucleic Acids Res. 2015;43(Database issue):D222–D226.
  17. Takami H, Takaki Y, Chee GJ, et al. Thermoadaptation trait revealed by the genome sequence of thermophilic Geobacillus kaustophilus. Nucleic Acids Res. 2004;32(21):6292–6303.
  18. Higgins DG, Sharp PM. CLUSTAL: a package for performing multiple sequence alignment on a microcomputer. Gene. 1988;73(1):237–244.
  19. Larkin MA, Blackshields G, Brown NP, et al. Clustal W and Clustal X version 2.0. Bioinformatics. 2007;23(21):2947–2948.
  20. Gao SJ, Wang JQ, Wu MC, et al. Engineering hyperthermostability into a mesophilic family 11 xylanase from Aspergillus oryzae by in silico design of N-terminus substitution. Biotechnol Bioeng. 2013;110(4):1028–1038.
©2017 Teegardin, et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and build upon your work non-commercially.