Immunoinformatic analysis of proteins from DNA replication, repair, recombination, and restriction/modification pathway of Mycobacterium tuberculosis revealed the diagnostic potential of Rv0054 and Rv3644c

doi:10.15406/jabb.2022.09.00309

Journal of Applied Biotechnology & Bioengineering

eISSN: 2572-8466

Research Article Volume 9 Issue 5

Vikas Jha,¹

Sathi Maiti,² Dattatray Sawant,¹ Darpan Kaur,¹ Sankalp Kasbe,¹ Abhishek Kumar,¹ Badal Saiya,¹ Shloka Shukla,¹ Simeen Rumani,¹ Mrunmayi Markam¹

¹National Facility for Biopharmaceuticals, G. N. Khalsa College, India
²Department of Five Years Integrated Course in Bioanalytical Sciences, GNIRD, G.N. Khalsa College, India

Correspondence: Vikas Jha, National Facility for Biopharmaceuticals, G. N. Khalsa College, Matunga, Mumbai, Maharashtra, India

Received: October 15, 2022 | Published: October 27, 2022

Citation: Vikas J, Sathi M, Dattatray S, et al. Immunoinformatic analysis of proteins from DNA replication, repair, recombination, and restriction/modification pathway of Mycobacterium tuberculosis revealed the diagnostic potential of Rv0054 and Rv3644c. J Appl Biotechnol Bioeng. 2022;9(5):190-201. DOI: 10.15406/jabb.2022.09.00309

Abstract

Mycobacterium tuberculosis (MTB), the causative agent of tuberculosis, is a powerful pathogen that has evolved to survive within the host. The pathophysiology of MTB indicates that certain metabolic pathways play a vital role in host-pathogen interaction, pathogenicity, and virulence. These pathways involve many proteins that are vital for MTB survival in the host. One such pathway is the DNA replication, repair, recombination, and restriction/modification pathway. The study of DNA repair mechanisms in MTB has progressed more slowly than in other bacteria because of the technical challenges of working with a slow-growing pathogen. In this study, the proteins involved in this pathway were evaluated using immunoinformatic analysis and homology modelling, an approach that can lead to the discovery of potential drug targets, vaccine candidates, and diagnostic markers.

Keywords: in silico, Mycobacterium tuberculosis, homology modelling, diagnostic markers, vaccine candidates

Abbreviations

ACC, auto cross-covariance; CXC, C-X-C motif chemokine; MTB, Mycobacterium tuberculosis; MDR-TB, multidrug-resistant tuberculosis; INH, isoniazid; RFP, rifampicin; RNI, reactive nitrogen intermediates; GRAVY, grand average of hydropathicity; HMM, hidden Markov model; IFN, interferon; IP-10, IFN-gamma-inducible protein 10; Th1, type 1 T helper

Introduction

Tuberculosis (TB) is a highly infectious disease caused by Mycobacterium tuberculosis (MTB) that has posed a constant threat throughout human history because of its severe consequences. The genus Mycobacterium is believed to have originated more than 150 million years ago (Bazin). MTB is the causative agent in most cases of tuberculosis occurring worldwide. MTB has a very ancient origin, has survived for over 70,000 years, and currently infects nearly 5.8 million people around the globe; with new cases of TB arising each year, nearly one-third of the world's population carry the TB bacillus and are at risk of developing active infection (WHO). Because of its infectious nature, the vigorous immunological response it provokes, its slow progression, and the need for long-term care, combating and preventing tuberculosis has been a persistent problem in epidemic history, compounded by the prevalence of multidrug-resistant forms and their major social repercussions. Multidrug-resistant tuberculosis (MDR-TB) is caused by MTB isolates that are resistant to at least two of the most effective anti-TB medications, isoniazid (INH) and rifampicin (RFP), and it still complicates TB eradication (Lin and Flynn).

With the continued growth of multidrug- and extensively drug-resistant MTB strains undermining the management of this major health crisis, new MTB treatments are urgently needed, and metabolic pathways present attractive and potentially powerful targets. The pathophysiology of MTB indicates that certain metabolic pathways play an essential role in host-pathogen interaction, pathogenicity, and virulence. These pathways involve a large number of proteins that are vital for MTB survival in the host.¹ Further exploration of proteins from these pathways may aid in the development of vaccine candidates and the identification of newer drug targets and diagnostic markers. One such pathway is the DNA replication, repair, recombination, and restriction/modification pathway.² MTB encodes a complex set of proteins that ensures chromosomal DNA replication and repair. Generally, in bacterial systems, chromosomal replication is carried out by a large multi-protein replisome that synthesizes the leading and lagging DNA strands with high efficiency and precision.³ The polymerase core complex, the helicase-primase complex, and the clamp loader complex are the three catalytic centers that work together to accomplish this.
The helicase-primase complex consists of the DnaB helicase, which unwinds the two DNA strands, and the DnaG primase.⁴ On the lagging strand, the primase synthesizes short RNA primers, which the replicative DNA polymerase, Pol IIIα, uses to start replication.⁵ Most of the constituent proteins in the replisome perform specialized tasks such as DNA unwinding, RNA primer synthesis, clamp loading, and DNA synthesis.⁶ MTB, being an intracellular pathogen, is exposed to several highly DNA-damaging attacks in vivo, mainly from antimicrobial reactive oxygen intermediates and reactive nitrogen intermediates (RNI) produced by the host.⁷ As a result, DNA damage repair and reversal pathways that can effectively counter this damage are essential for bacterial survival. The study of DNA repair mechanisms in MTB has progressed more slowly than in other bacteria because of the technical challenges of working with such a slow-growing pathogen. Hence, most conclusions regarding MTB DNA repair still rest on in silico methodologies rather than laboratory experiments.⁸ Studying the biochemistry of this pathway can strengthen our knowledge of the disease and help establish the DNA replication and repair machinery as a source of new targets for anti-TB drug development.

With over 3924 open reading frames, MTB has the second-largest bacterial genome sequence.⁹ Moreover, the DNA replication, repair, recombination, and restriction/modification pathway has been reported structurally, but not functionally, in many studies. In this study, various computational tools were used to generate biochemical, structural, and functional information about all the proteins in this pathway and to test the applicability of immunoinformatic analysis and homology modelling to the MTB proteins involved in it.

Methodology

Retrieval of the protein sequence

For the present study, 69 amino acid sequences (Table 1) of Mycobacterium tuberculosis proteins involved in the DNA replication, repair, recombination, and restriction/modification pathway were retrieved in FASTA format from the Mycobrowser database (https://mycobrowser.epfl.ch/).¹⁰ Mycobrowser is a comprehensive genomic and proteomic information repository for pathogenic mycobacteria. It provides manually curated annotations and tools for genomic and proteomic investigation of these organisms.
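
As an illustration of working with sequences retrieved in FASTA format, the sketch below parses FASTA text into a mapping from record identifier to sequence. It is a minimal sketch, not the procedure used in this study: the function name `parse_fasta`, the assumption that the identifier is the first whitespace-delimited token of the header, and the example records (placeholder residues, not real MTB sequences) are all hypothetical.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into {identifier: sequence}.

    Assumption: the identifier is the first whitespace-delimited
    token of each header line (the line starting with '>').
    """
    records = {}
    current_id = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            fields = line[1:].split()
            current_id = fields[0] if fields else ""
            records[current_id] = []
        elif current_id is not None:
            # sequence lines may be wrapped; collect and join later
            records[current_id].append(line)
    return {rec_id: "".join(parts) for rec_id, parts in records.items()}


# Hypothetical two-record input with placeholder residues.
example = """>protein_A example header
MKTAY
IAKQR
>protein_B
MSE
"""
sequences = parse_fasta(example)
```

Each parsed sequence can then be passed to the downstream servers or to local calculations one record at a time.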

Rv ID	Gene name	Description	Antigenicity score
Rv1317c	alkA	DNA-3-methyladenine glycosidase II	0.4047
Rv2836c	dinF	DNA-damage-inducible protein F	0.503
Rv1329c	dinG	probable ATP-dependent helicase	0.4408
Rv3056	dinP	DNA-damage-inducible protein	0.4442
Rv1537	dinX	probable DNA-damage-inducible protein	0.4677
Rv0001	dnaA	chromosomal replication initiator protein	0.3864
Rv0058	dnaB	DNA helicase (contains intein)	0.4691
Rv1547	dnaE1	DNA polymerase III, α subunit	0.4199
Rv3370c	dnaE2	DNA polymerase III α chain	0.4387
Rv2343c	dnaG	DNA primase	0.5114
Rv0002	dnaN	DNA polymerase III, β subunit	0.6
Rv3711c	dnaQ	DNA polymerase III ε chain	0.4202
Rv3721c	dnaZX	DNA polymerase III, γ (dnaZ) and τ (dnaX)	0.5522
Rv2924c	fpg	formamidopyrimidine-DNA glycosylase	0.5759
Rv0006	gyrA	DNA gyrase subunit A	0.5057
Rv0005	gyrB	DNA gyrase subunit B	0.6662
Rv2092c	helY	probable helicase, Ski2 subfamily	0.4458
Rv2101	helZ	probable helicase, Snf2/Rad54 family	0.4457
Rv2756c	hsdM	type I restriction/modification system DNA methylase	0.4363
Rv2755c	hsdS	type I restriction/modification system specificity determinant	0.4672
Rv3296	lhr	ATP-dependent helicase	0.41
Rv3014c	ligA	DNA ligase	0.5559
Rv3062	ligB	DNA ligase	0.5447
Rv3731	ligC	probable DNA ligase	0.5334
Rv1020	mfd	transcription-repair coupling factor	0.4799
Rv2528c	mrr	restriction system protein	0.4972
Rv2985	mutT1	MutT homologue	0.5417
Rv1160	mutT2	MutT homologue	0.5312
Rv0413	mutT3	MutT homologue	0.4964
Rv3589	mutY	probable DNA glycosylase	0.4257
Rv3297	nei	probable endonuclease VIII	0.4816
Rv3674c	nth	probable endonuclease III	0.3086
Rv1316c	ogt	methylated-DNA-protein-cysteine methyltransferase	0.4402
Rv1629	polA	DNA polymerase I	0.5243
Rv1402	priA	putative primosomal protein n' (replication factor Y)	0.4462
Rv3585	radA	probable DNA repair RadA homologue	0.519
Rv2737c	recA	recombinase (contains intein)	0.5066
Rv0630c	recB	exodeoxyribonuclease V	0.5005
Rv0631c	recC	exodeoxyribonuclease V	0.4653
Rv0629c	recD	exodeoxyribonuclease V	0.4881
Rv0003	recF	DNA replication and SOS induction	0.5034
Rv2973c	recG	ATP-dependent DNA helicase	0.5281
Rv1696	recN	recombination and DNA repair	0.516
Rv3715c	recR	RecBC-Independent process of DNA repair	0.4549
Rv2736c	recX	regulatory protein for RecA	0.647
Rv2593c	ruvA	Holliday junction binding protein DNA helicase	0.6785
Rv2592c	ruvB	Holliday junction binding protein	0.4449
Rv2594c	ruvC	Holliday junction resolvase, endodeoxyribonuclease	0.5832
Rv0054	ssb	single strand binding protein	0.7372
Rv1210	tagA	DNA-3-methyladenine glycosidase I	0.4588
Rv3646c	topA	DNA topoisomerase	0.6152
Rv2976c	ung	uracil-DNA glycosylase	0.2138
Rv1638	uvrA	excinuclease ABC subunit A	0.5413
Rv1633	uvrB	excinuclease ABC subunit B	0.4409
Rv1420	uvrC	excinuclease ABC subunit C	0.4965
Rv0949	uvrD	DNA-dependent ATPase I and helicase II	0.3221
Rv3198c	uvrD2	putative UvrD	0.4615
Rv0427c	xthA	exodeoxyribonuclease III	0.5531
Rv0071	-	group II intron maturase	0.4251
Rv0861c	-	probable DNA helicase	0.3907
Rv0944	-	possible formamidopyrimidine-DNA glycosylase	0.4262
Rv1688	-	probable 3-methylpurine DNA glycosylase	0.6476
Rv2090	-	partially similar to DNA polymerase I	0.4737
Rv2191	-	similar to both PolC and UvrC proteins	0.4231
Rv2464c	-	probable DNA glycosylase, endonuclease VIII	0.2796
Rv3201c	-	probable ATP-dependent DNA helicase	0.4516
Rv3202c	-	similar to UvrD proteins	0.4949
Rv3263	-	probable DNA methylase	0.4812
Rv3644c	-	similar in N-t	0.7400

Table 1 Genes involved in the DNA replication, repair, recombination, and restriction/modification pathway and their antigenicity scores predicted using the VaxiJen server

Prediction of antigenicity of the protein sequences: The antigenic properties of the proteins were predicted using the VaxiJen server (http://www.ddg-pharmfac.net/vaxijen/VaxiJen/VaxiJen.html). VaxiJen applies an auto cross-covariance (ACC) transformation that converts protein sequences into uniform vectors of principal amino acid properties, which were used to evaluate the antigenicity of each sequence. The algorithm is alignment-independent: it classifies proteins as antigenic or non-antigenic from their physicochemical properties rather than from sequence alignment.
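
The idea of an ACC transformation can be sketched as follows: each residue is replaced by a vector of property descriptors, and the products of descriptors at positions separated by a given lag are averaged along the sequence, so that sequences of different lengths yield vectors of the same length. This is a minimal sketch under stated assumptions. The values in `TOY_DESCRIPTORS`, the function name `acc_transform`, and the choice of `max_lag` are illustrative only; they are not the descriptors or settings used by VaxiJen.

```python
# Hypothetical two-property descriptors for a two-letter toy alphabet.
TOY_DESCRIPTORS = {
    "A": (1.0, 0.0),
    "G": (0.0, 1.0),
}


def acc_transform(sequence, descriptors, max_lag):
    """Return auto/cross-covariance terms for lags 1..max_lag.

    For descriptors j and k and a given lag, the term is the mean of
    z[j][i] * z[k][i + lag] over all positions i where both exist.
    Output length is max_lag * (number of descriptors) ** 2,
    independent of the sequence length.
    """
    n = len(sequence)
    if n <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    rows = [descriptors[aa] for aa in sequence]
    n_desc = len(rows[0])
    features = []
    for lag in range(1, max_lag + 1):
        for j in range(n_desc):
            for k in range(n_desc):
                total = sum(rows[i][j] * rows[i + lag][k] for i in range(n - lag))
                features.append(total / (n - lag))
    return features


short_vector = acc_transform("AGA", TOY_DESCRIPTORS, 1)
```

Terms with j equal to k are the auto-covariances and the remaining terms are cross-covariances; a classifier would then be trained on these fixed-length vectors.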

Physicochemical characterization and solubility prediction: The physicochemical properties of the proteins, such as the number of amino acids, pI value, molecular weight, molecular formula, number of atoms, extinction coefficients, estimated half-life, instability index, total number of positively and negatively charged residues, aliphatic index, and grand average of hydropathicity (GRAVY), were analyzed using Expasy’s ProtParam server (https://web.expasy.org/protparam/).¹¹ The total length and solubility of the proteins were predicted using the SOSUI server (https://harrier.nagahama-i-bio.ac.jp/sosui/). The CYS_REC server (http://www.softberry.com/berry.phtml?topic=cys_rec) was used to analyze the presence of cysteine residues in the proteins and their bonding patterns.¹²

Secondary structure prediction: Secondary structures were predicted from the primary sequence of each protein using the SOPMA (Self-Optimized Prediction Method with Alignment) server (https://npsa-prabi.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_sopma.html). This server reports the content of α-helix, 3₁₀ helix, π helix, β bridge, extended strand, β-turn, bend region, and random coil in the protein. When an unknown protein is submitted, the server searches its database for proteins with similar properties and evolutionary relationships.¹³

Tertiary structure prediction & homology modelling: Homology modelling of the proteins was performed using the Phyre2 tool (http://www.sbg.bio.ic.ac.uk/phyre2/html/page.cgi?id=index).¹⁴ This tool generates automated 3D structures by comparative methods. The protein sequences were submitted, and the most suitable 3D models were selected on the basis of the highest percentage of residues in the Ramachandran favoured region. Ramachandran plot analysis was done using the Structure Assessment tool of the SWISS-MODEL server (https://swissmodel.expasy.org/interactive). It reports the number of amino acid residues found in the favoured, allowed, and disallowed regions.¹⁵ The accuracy and stereochemical quality of the models were analyzed using ERRAT, Verify3D, PROCHECK, and PROVE from PROCHECK’s server (https://servicesn.mbi.ucla.edu/PROCHECK/). The LGscore and MaxSub score were predicted using the ProQ server, and the Z-score using the ProSA server. ProSA (Protein Structure Analysis) is a popular web tool used to check 3D models of proteins for possible errors. Its output is an overall quality score, displayed in a plot alongside the scores of all experimentally determined protein chains in the Protein Data Bank.¹⁶ ProQ (Protein Quality Predictor) is a neural-network-based tool that predicts the quality of a protein model from structural features such as atom-atom contacts, residue-residue contacts, and solvent-accessible surface area. It differs from other model predictors in that it is optimized to find correct models rather than native structures. MaxSub and LGscore are the two quality measures reported by this tool: MaxSub ranges from 0 (insignificant) to 1 (significant), and LGscore is the negative logarithm of a P-value that indicates structural similarity.

B-cell epitope prediction and scanning proteins for IFN-γ epitopes: To predict linear B-cell epitopes from antigen protein sequence features, we used a range of methods, including amino acid scales and HMMs (http://tools.iedb.org/bcell/result/). With a default threshold value of 0.350, the IEDB’s BepiPred method was used to predict linear B-cell epitopes from the conserved region of the proteins.¹⁷⁻¹⁹ The Emini surface accessibility prediction tool of the Immune Epitope Database (IEDB) was used to predict surface epitopes from the conserved region using the default threshold value of 1.0.²⁰ Antigenic sites were detected using the Kolaskar and Tongaonkar antigenicity method, with a default threshold value of 1.025.²¹ The BepiPred-2.0 web server was used for predicting B-cell epitopes from antigen sequences.²² Parker hydrophilicity prediction²³ was used to determine which regions of a protein are likely to lie on the surface and hence contribute to its antigenicity. Chou and Fasman beta-turn prediction was performed to locate beta-turn regions in the query protein, as beta turns play an important role in antigenicity.²⁴ IFNepitope (https://webs.iiitd.edu.in/raghava/ifnepitope/scan.php) was used to predict interferon (IFN) gamma-inducing regions in the proteins. The IFNepitope server generates overlapping peptides from the query sequence and predicts IFN-γ epitopes using a machine learning algorithm, the support vector machine. Epitopes with positive outcomes for the IFN-γ response were chosen for in silico immunization development.²⁵
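
Two mechanics recur in the methods above: scale-based predictors average a per-residue propensity over a sliding window and compare the result with a threshold, and epitope scanning cuts the protein into overlapping peptides. The sketch below illustrates both in a hedged way. The function names, the window width of 7, the peptide length of 15, and the toy scale are assumptions made for illustration; they are not the published Parker, Emini, or Kolaskar-Tongaonkar scales, nor the internals of the IEDB or IFNepitope servers.

```python
def window_scores(sequence, scale, window=7):
    """Average a per-residue scale over a centred sliding window.

    Returns {1-based centre position: mean score}. Positions too
    close to either end to hold a full window are skipped.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so that it has a centre residue")
    half = window // 2
    scores = {}
    for centre in range(half, len(sequence) - half):
        segment = sequence[centre - half:centre + half + 1]
        scores[centre + 1] = sum(scale[aa] for aa in segment) / window
    return scores


def positions_above(scores, threshold):
    """Return the positions whose window score exceeds the threshold."""
    return [pos for pos, score in sorted(scores.items()) if score > threshold]


def overlapping_peptides(sequence, length=15, step=1):
    """Cut a sequence into overlapping peptides as (1-based start, peptide)."""
    last_start = len(sequence) - length
    return [(start + 1, sequence[start:start + length])
            for start in range(0, last_start + 1, step)]


# Toy scale and sequences, purely illustrative.
toy_scale = {"A": 0.0, "K": 1.0}
toy_hits = positions_above(window_scores("AAKKKAA", toy_scale, window=3), 0.5)
toy_peptides = overlapping_peptides("ABCDEF", length=4, step=1)
```

Runs of consecutive positions above the threshold would then be reported as candidate epitope regions, and each overlapping peptide would be scored separately by the classifier.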

Molecular docking analysis

The idea behind docking is to model the interaction between an antigen and an antigen receptor that underlies an effective immune response. HDOCK is a web server for protein-protein docking. In this study, both the IFN-γ epitopes and the B-cell epitopes of Rv0054 and Rv3644c were docked with the IP-10 protein (mouse crystal structure) using the HDOCK tool (http://hdock.phys.hust.edu.cn/). The workflow of the HDOCK web server is divided into four stages: (1) data input, where either a PDB structure or a sequence in FASTA format is accepted; (2) sequence similarity search against the PDB database for the receptor and ligand molecules, using the HH-suite package for protein input; (3) structure modelling with MODELLER, in which the selected templates undergo sequence alignment using ClustalW; and (4) FFT-based global docking with HDOCKlite, in which priority is given to user-input structures and putative binding modes are evaluated with an improved shape-based pairwise scoring function (Yan). IP-10 (interferon-gamma-inducible protein 10) is a chemokine that belongs to the C-X-C (CXC) family and is involved in the pathophysiology of a variety of immunological and inflammatory responses. It also has antifibrotic properties and is a powerful angiostatic factor. IP-10's biological effects are mediated via interactions with the G-protein-coupled receptor CXCR3, which is found on Th1 lymphocytes. IP-10 is thus a potential target for structure-based rational design of anti-inflammatory molecules (Jabeen).

Results and discussion

Tuberculosis has long been characterized by a high mortality rate, and it is estimated to account for 1.4 million deaths, second only to the Human Immunodeficiency Virus (HIV) among infectious diseases. Computational studies have provided multiple key molecular-level insights into the repair of damaged DNA by monofunctional DNA glycosylases, the first enzymes to act in the base excision repair pathway, which targets non-bulky nucleobase modifications.²⁷ Despite advancements in medical technology and extensive research, MTB remains a serious public health concern, as there are currently no effective drugs or vaccines established for its treatment. The in silico technique is the initial step in developing a vaccine, and it is critical that the target protein be chosen correctly. To improve our understanding of host-microbe interactions and their mechanisms, we used in silico analysis to identify a potential new vaccine candidate. Immunoinformatics is a relatively new field that has the potential to speed up immunology research. Computational models already play an important role, not only in steering the selection of key experiments but also in generating novel testable hypotheses through extensive analysis of complicated immunology data that could not be accomplished using traditional methods alone. The proteins involved in the selected pathway were assessed using this approach, which might lead to the discovery of prospective therapeutic targets, vaccine candidates, and diagnostic indicators.

Prediction of antigenicity of the protein sequences

The sequences were analyzed for antigenicity using the VaxiJen server with the default threshold; 62 of the amino acid sequences were found to be antigenic, as marked on the basis of the antigenicity score (Table 1). The study revealed several MTB proteins that are potential candidates for the investigation of both cellular and humoral immune responses in an infected host. The antigenicity scores ranged from a highest value of 0.7400 to a lowest value of 0.2138.

Physicochemical characterization and Solubility prediction

The physicochemical characteristics of all the proteins involved in the DNA replication, repair, recombination, and restriction/modification pathway were computed using Expasy’s ProtParam, and the resulting values for the number of amino acid residues, theoretical isoelectric point (pI), instability index, aliphatic index, and grand average of hydropathicity (GRAVY) of each protein were evaluated (Table 2). The theoretical pI values for these proteins range from 4.51 to 10.66. The isoelectric point is the pH at which a molecule carries no net electrical charge; a protein carries a net negative charge above its isoelectric point and a net positive charge below it. We found that 60% of the protein sequences were acidic (pI<7) and 40% were basic (pI>7). The acidic nature is helpful, as the pathogen is able to tolerate and survive the acidity of phagolysosomes during chronic infection inside the host.²⁸ These data can also be useful for developing buffer systems for protein purification by isoelectric focusing.¹³ Further, an instability index of <40 indicates a stable protein and >40 indicates an unstable protein.²⁹,³⁰ In our study, 38 sequences (Table 2) had an instability index of less than 40, implying that those protein sequences were stable. The values of the instability index for the proteins range from 20.7 to 93.23. The aliphatic index of a protein is the relative volume occupied by the aliphatic side chains of alanine (Ala), valine (Val), isoleucine (Ile), and leucine (Leu). Proteins with high aliphatic index values are more thermally stable, and for globular proteins a high aliphatic index can be regarded as a positive factor for increased thermostability.³¹,³² The aliphatic index of the proteins ranges from 64.33 to 133.69 (Table 2), which suggests stability across a wide range of temperatures.
The extinction coefficient indicates how much light a protein absorbs at a given wavelength, and the molar extinction coefficient can be estimated if the amino acid composition of the protein is known (Conn). ProtParam reports two extinction coefficient values: one assuming all pairs of Cys residues form cystines and the other assuming all Cys residues are reduced. The two values for these proteins fall in a similar range, and for some proteins they are identical; the values range from 4470 to 57885. The grand average of hydropathicity (GRAVY) value of a protein is calculated as the sum of the hydropathy values of all its amino acids divided by the number of residues in the sequence. The soluble nature of the protein helps in DNA packaging in the cell, where it forms the protein moiety of the nucleoprotein. GRAVY values for these proteins lie within -0.474 to 1.023 (Table 2). Of all the selected sequences, 63 had a GRAVY score below 0, implying that these proteins are hydrophilic and water-soluble and could be a good choice for drug design (Table 2).³³,³⁴ This information could clarify whether a protein is globular (hydrophilic) or membranous (hydrophobic) and might give insights into protein localization.
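
The GRAVY and aliphatic index definitions given above can be written out directly. The sketch below uses the Kyte-Doolittle hydropathy scale for GRAVY and Ikai's formula for the aliphatic index, AI = X(Ala) + 2.9·X(Val) + 3.9·(X(Ile) + X(Leu)), where X is the mole percentage of each residue; these are the definitions we understand ProtParam to follow. The function names and the four-residue test peptide are illustrative assumptions, and the code is not a substitute for the ProtParam output reported in Table 2.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Sum of residue hydropathy values divided by sequence length."""
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)


def aliphatic_index(sequence):
    """Ikai's aliphatic index from mole percentages of Ala, Val, Ile, Leu."""
    n = len(sequence)

    def mole_percent(aa):
        return 100.0 * sequence.count(aa) / n

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


# Hypothetical four-residue peptide, used only to check the arithmetic.
toy_gravy = gravy("AVIL")            # (1.8 + 4.2 + 4.5 + 3.8) / 4
toy_aliphatic = aliphatic_index("AVIL")  # 25 + 2.9*25 + 3.9*50
```

A negative GRAVY value marks a sequence as hydrophilic on average, which is the criterion applied to the proteins in Table 2.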

Rv ID	Molecular weight	Theoretical pI	Instability index	Aliphatic index	GRAVY
Rv1317c	53710.49	9.56	36.59	89.23	-0.097
Rv2836c	44737.86	10.41	22.31	133.69	1.023
Rv1329c	70135.47	6.1	34.9	95.5	0.038
Rv3056	37562.17	8.31	43.92	96.18	-0.037
Rv1537	49075.62	5.84	38.21	94.43	-0.062
Rv0001	56548.64	5.45	41.82	86.67	-0.381
Rv0058	96916.74	8.71	42.57	89.11	-0.308
Rv1547	129322.96	5.5	32.68	89.7	-0.199
Rv3370c	116483.67	7.09	39	87.83	-0.122
Rv2343c	69562.95	6.38	40.78	82.28	-0.213
Rv0002	42113.09	4.76	33.1	102.89	0.16
Rv3711c	35749.93	5.71	39.06	101.06	0.035
Rv3721c	61891.31	5.61	44.12	94.12	-0.119
Rv2924c	31950.7	9.98	41.1	90.42	-0.243
Rv0006	92274.31	5.41	38.09	95.68	-0.303
Rv0005	78439.74	6.18	31.31	85.36	-0.379
Rv2092c	99573.95	6.99	46.88	91.13	-0.26
Rv2101	111630.45	5.61	47.79	95.33	-0.2
Rv2756c	60084.11	5.31	41.39	78.44	-0.415
Rv2755c	39211.94	9.61	45.25	95.44	-0.034
Rv3296	161347.7	6.21	41.68	97.51	-0.011
Rv3014c	75257.1	5.42	34.66	91.94	-0.235
Rv3062	53704.57	9.18	30.8	100	0.07
Rv3731	40159.7	6.57	42.6	80.95	-0.337
Rv1020	132908.42	5.55	36.68	96.47	-0.101
Rv2528c	33648.17	5.53	30.75	91.57	-0.305
Rv2985	34748.34	9.28	42.31	82.46	-0.427
Rv1160	15160.28	5.79	21.3	105.25	-0.028
Rv0413	23481.16	5.04	48.69	80.97	-0.353
Rv3589	33684.45	8.85	46.08	86.68	-0.227
Rv3297	28525.6	9.09	33.23	93.69	-0.224
Rv3674c	26998.28	9.83	46.35	96	-0.06
Rv1316c	17858.23	5.92	25.77	91.15	-0.17
Rv1629	98439.98	5.01	33.73	93.23	-0.22
Rv1402	69839.07	9.8	49.68	99.15	-0.006
Rv3585	49881.01	6.74	37.27	100.6	0.132
Rv2737c	85389.06	6.01	28.47	91.89	-0.185
Rv0630c	118722.38	5.98	39.82	93.49	-0.166
Rv0631c	119501.23	6.15	39.71	96.34	-0.17
Rv0629c	61714.72	6.73	32.18	105.58	0.008
Rv0003	42180.2	6.75	40.51	108.7	-0.047
Rv2973c	80328.99	6.16	33.15	97.73	-0.114
Rv1696	62196.88	5.12	27.03	97.39	-0.132
Rv3715c	22119.35	4.99	20.7	105.62	-0.058
Rv2736c	19145.81	9.49	52.22	94.31	-0.399
Rv2593c	20189.23	6.42	26.54	109.59	0.308
Rv2592c	36626.94	5.35	38.65	100.41	0.048
Rv2594c	19753.76	9.22	22.43	99.31	0.122
Rv0054	17321.05	5.12	42.3	64.33	-0.474
Rv1210	22973.04	7.89	53.01	74.22	-0.449
Rv3646c	102317.51	8.23	37.66	82.91	-0.472
Rv2976c	24449.05	9.18	42.03	92.07	0.019
Rv1638	106099.45	6.45	33.9	91.67	-0.247
Rv1633	78038.32	5.05	44.26	93.48	-0.33
Rv1420	71582.11	8.5	39.92	86.53	-0.339
Rv0949	85049.88	5.36	41.51	92.62	-0.273
Rv3198c	75603.69	6.5	35.79	97.41	-0.104
Rv0427c	32108.26	5.15	39.67	80.82	-0.279
Rv0071	26891.83	9.55	40.8	92.04	-0.375
Rv0861c	59772.18	5.73	37.1	98.67	-0.14
Rv0944	16462.95	9.27	29.91	88.99	-0.047
Rv1688	21340.08	9.88	36.44	80.44	-0.231
Rv2090	41938.99	5.58	44.81	89.19	-0.204
Rv2191	69148.12	9.62	49.07	88.79	-0.088
Rv2464c	29681.92	9.88	41.72	82.54	-0.327
Rv3201c	116688.87	5.86	37.89	94.41	-0.02
Rv3202c	110729.42	8.72	44.14	96.31	-0.036
Rv3263	60673.5	8.35	35.3	92.31	-0.103
Rv3644c	41784.51	8.11	42.03	90.02	-0.057

Table 2 Physicochemical characteristics of all the proteins obtained from Expasy’s ProtParam

Functional analysis of the proteins included prediction of disulphide bonds and transmembrane regions. The SOSUI server was used to distinguish membrane from soluble proteins on the basis of their amino acid sequences and to predict the transmembrane helices of the membrane proteins. Although cysteine residues were present in some of the protein sequences, no evidence was found for the presence of disulphide bonds. Among all the selected proteins, Rv2836c (dinF) showed 12 transmembrane regions, which is an important factor to consider for drug efficacy. Disulphide bridges, in turn, play an important role in determining the thermostability of a protein molecule.³⁵

Secondary structure prediction

On analyzing the proteins using the SOPMA tool (Table 3), the alpha helix was found to be the dominant element in the structures, followed by random coil, extended strand, and beta turn (Mukesh, Prathap, and Sabitha 2013). The default parameters were used for the secondary structure prediction: window width 17, similarity threshold 8, and division factor 4.⁹ In an alpha helix, a hydrogen bond forms between the oxygen atom of the backbone carbonyl group of one amino acid and the hydrogen atom of the backbone amino group of the amino acid four residues farther along the chain, which holds the stretch of amino acids in a right-handed coil. Every helical turn contains 3.6 amino acid residues. The side chains (R groups) protrude outward from the α-helix and are not involved in the hydrogen bonds that maintain the helical structure. Models for the properties of individual residues and short segments of a polypeptide chain in a random coil provide a framework for interpreting experimental NMR data for non-native protein conformations.³⁶ Proteins typically have compact, globular shapes assembled from combinations of beta sheets. To obtain these compact shapes, the polypeptide chain must reverse direction. The reverse turn, also known as the hairpin bend or beta turn, is a common structure that satisfies this requirement. Loops are another, more complicated type of structure responsible for chain reversals. Although loops lack periodic structure like that of beta sheets and alpha helices, they are usually well defined and rigid.³⁷
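
The percentages in Table 3 summarize a per-residue assignment of secondary structure states. As a rough illustration of that summary step, the sketch below converts a string of per-residue state codes into percentages. The one-letter codes (h, e, t, c), the function name, and the ten-residue example string are assumptions made for illustration and may not match the exact output format of the SOPMA server.

```python
from collections import Counter

# Assumed one-letter state codes for a four-state prediction.
STATE_NAMES = {
    "h": "alpha helix",
    "e": "extended strand",
    "t": "beta turn",
    "c": "random coil",
}


def state_percentages(states):
    """Return the percentage of residues assigned to each state."""
    counts = Counter(states.lower())
    total = len(states)
    return {
        name: round(100.0 * counts.get(code, 0) / total, 2)
        for code, name in STATE_NAMES.items()
    }


# Hypothetical per-residue assignment for a ten-residue peptide.
toy_composition = state_percentages("hhhheettcc")
```

For a complete four-state assignment the percentages sum to 100% up to rounding, which is a quick consistency check for each row of such a table.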

Rv ID	Alpha helix	Extended strand	Beta turn	Random coil
Rv1317c	49.40%	7.26%	5.85%	37.50%
Rv2836c	61.50%	12.30%	5.92%	20.27%
Rv1329c	48.64%	12.95%	4.82%	33.58%
Rv3056	40.46%	17.34%	5.49%	36.71%
Rv1537	39.74%	11.66%	5.18%	43.41%
Rv0001	49.31%	10.65%	3.94%	36.09%
Rv0058	46.57%	13.50%	6.18%	33.75%
Rv1547	47.21%	14.36%	6.76%	31.67%
Rv3370c	45.51%	11.68%	6.86%	35.96%
Rv2343c	51.33%	9.23%	7.67%	31.77%
Rv0002	28.86%	23.88%	5.72%	41.54%
Rv3711c	38.30%	12.46%	7.90%	41.34%
Rv3721c	48.44%	7.96%	5.71%	37.89%
Rv2924c	32.87%	17.99%	6.57%	42.56%
Rv0006	36.87%	21.24%	10.26%	31.62%
Rv0005	41.32%	16.81%	7.00%	34.87%
Rv2092c	52.43%	11.37%	6.95%	29.25%
Rv2101	46.40%	12.73%	4.74%	36.13%
Rv2756c	47.59%	9.26%	4.63%	38.52%
Rv2755c	35.99%	16.76%	5.77%	41.48%
Rv3296	44.42%	12.89%	6.94%	35.76%
Rv3014c	42.40%	13.17%	7.38%	37.05%
Rv3062	52.27%	14.00%	6.90%	26.82%
Rv2592c	50.29%	12.79%	5.52%	31.40%
Rv2594c	52.13%	18.09%	6.91%	22.87%
Rv0054	18.90%	18.29%	7.93%	54.88%
Rv1210	48.04%	3.92%	6.86%	41.18%
Rv3646c	43.25%	12.10%	6.21%	38.44%
Rv2976c	41.41%	15.42%	7.05%	36.12%
Rv1633	52.29%	14.90%	8.60%	24.21%
Rv1420	44.58%	15.48%	4.95%	34.98%
Rv0427c	30.58%	15.81%	9.28%	44.33%
Rv3198c	48.43%	11.43%	34.86%	0.00%
Rv0071	39.15%	14.89%	7.23%	38.72%
Rv0861c	43.17%	18.63%	5.17%	33.03%
Rv0949	48.38%	13.62%	5.97%	32.04%
Rv0944	51.27%	9.49%	3.80%	35.44%
Rv1688	20.20%	25.12%	6.90%	47.78%
Rv2090	33.59%	8.65%	5.34%	52.42%
Rv2191	43.88%	10.54%	3.88%	41.71%
Rv2464c	32.84%	17.91%	5.97%	43.28%
Rv3201c	47.14%	10.72%	3.00%	39.15%
Rv3202c	49.86%	8.06%	4.45%	37.63%
Rv3263	45.75%	16.64%	4.70%	32.91%
Rv3644c	63.34%	9.73%	3.49%	23.44%
Rv1638	36.73%	19.44%	8.54%	35.29%
Rv3731	31.01%	16.48%	6.42%	46.09%
Rv1020	43.03%	13.94%	6.00%	37.03%
Rv2528c	44.44%	12.75%	6.86%	35.95%
Rv2985	31.55%	15.14%	4.73%	48.58%
Rv1160	34.04%	17.02%	6.38%	42.55%
Rv0413	21.66%	21.20%	7.37%	49.77%
Rv3589	50.66%	4.93%	6.25%	38.16%
Rv3297	34.12%	18.04%	5.49%	42.35%
Rv3674c	49.39%	9.39%	6.12%	35.10%
Rv1316c	27.88%	22.42%	8.48%	41.21%
Rv1629	55.53%	9.85%	4.98%	29.65%
Rv1402	35.27%	15.73%	4.73%	44.27%
Rv3585	36.67%	15.42%	8.96%	38.96%
Rv2737c	37.22%	20.25%	8.99%	33.54%
Rv0630c	47.71%	10.69%	4.39%	37.20%
Rv0631c	45.49%	10.03%	3.19%	41.29%
Rv0629c	52.00%	13.22%	5.04%	29.74%
Rv0003	49.09%	16.88%	4.42%	29.61%
Rv2973c	46.68%	13.98%	6.11%	33.24%
Rv1696	58.94%	10.05%	6.13%	24.87%
Rv3715c	40.89%	14.78%	7.39%	36.95%
Rv2736c	71.84%	0.57%	4.02%	23.56%
Rv2593c	47.45%	14.29%	8.67%	29.59%

Table 3 Percentages of alpha helix, extended strand, beta turn, and random coil in the proteins, as predicted using the SOPMA tool
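Because each residue is assigned to one conformational state, the four percentages in a row of Table 3 should add up to roughly 100%, assuming the four reported states cover all residues. The following is a minimal sketch of such a consistency check on a small subset of rows copied from Table 3; the function name `rows_not_summing_to_100` and the 0.5 percentage-point tolerance are assumptions made for illustration.

```python
# Illustrative subset of Table 3:
# (alpha helix, extended strand, beta turn, random coil), all in percent.
rows = {
    "Rv1317c": (49.40, 7.26, 5.85, 37.50),
    "Rv0054": (18.90, 18.29, 7.93, 54.88),
    "Rv3198c": (48.43, 11.43, 34.86, 0.00),
    "Rv3644c": (63.34, 9.73, 3.49, 23.44),
}

def rows_not_summing_to_100(table, tolerance=0.5):
    """Return {gene: total} for rows whose four states do not add up to ~100%."""
    flagged = {}
    for gene, fractions in table.items():
        total = round(sum(fractions), 2)
        if abs(total - 100.0) > tolerance:
            flagged[gene] = total
    return flagged

flagged = rows_not_summing_to_100(rows)
print(flagged)
```

In this subset only Rv3198c is flagged, which suggests that its beta turn and random coil entries are worth re-checking against the original SOPMA output.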

Tertiary structure prediction & homology modelling

Homology modelling of the amino acid sequences involved in the DNA replication, repair, recombination, and restriction/modification pathway of Mycobacterium tuberculosis was carried out to predict their 3D structures. The models for this study were generated using the Phyre2 tool and validated by Ramachandran plot analysis. The Ramachandran map is an efficient way to visualize the favored regions of the backbone dihedral angles ψ (psi) against ϕ (phi) of amino acid residues. The method involves plotting the ϕ and ψ values on the X-axis and Y-axis respectively, with angles ranging from −180° to +180°, which indicates the secondary structure and possible conformations of the molecule (Abdullahi et al. 2021). In Ramachandran plot analysis, a good-quality homology model is expected to have over 90% of its residues in the most favored regions. In this study, 7 protein structures had 100% of their residues in the most favored regions and a further 58 protein structures had at least 90% of their residues in the most favored regions (Table 4).
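To make the two plotted quantities concrete, the sketch below computes a signed dihedral angle from four atomic positions using only the standard library. For residue i, ϕ is the dihedral of C(i−1), N(i), Cα(i), C(i) and ψ is the dihedral of N(i), Cα(i), C(i), N(i+1). The helper names and the toy coordinates are assumptions for illustration and are not taken from the models in this study.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees (-180 to +180) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Keep only the components of b0 and b2 perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))

# Toy coordinates: a quarter turn about the z-axis.
angle = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(round(angle, 1))
```

Applying `dihedral` to every residue of a model yields the (ϕ, ψ) pairs that a Ramachandran plot places on its two axes.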

Rv ID	Ramachandran favoured	Ramachandran outliers	Rotamer outliers
Rv1317c	95.70%	3.23%	0.00%
Rv2836c	100.00%	0.00%	0.00%
Rv1329c	89.76%	3.25%	0.63%
Rv3056	97.97%	0.00%	1.08%
Rv1537	95.69%	1.72%	1.13%
Rv0001	100.00%	0.00%	0.00%
Rv0058	97.22%	0.46%	0.53%
Rv1547	93.21%	0.22%	7.19%
Rv3370c	93.41%	2.69%	1.09%
Rv2343c	97.37%	0.00%	0.00%
Rv0002	96.67%	1.67%	1.01%
Rv3711c	91.67%	5.36%	0.00%
Rv3721c	93.72%	1.26%	3.61%
Rv2924c	100.00%	0.00%	0.00%
Rv0006	98.48%	0.00%	1.04%
Rv0005	95.06%	1.23%	0.00%
Rv2092c	93.83%	2.30%	5.56%
Rv2101	93.67%	1.78%	1.22%
Rv2756c	92.38%	2.42%	2.28%
Rv2755c	100.00%	0.00%	0.00%
Rv3296	96.56%	0.73%	2.70%
Rv3014c	96.55%	0.00%	6.45%
Rv3062	94.06%	1.78%	2.63%
Rv3731	88.65%	5.95%	1.29%
Rv1020	95.05%	0.81%	2.15%
Rv2528c	98.57%	0.00%	0.00%
Rv2985	95.56%	1.27%	2.72%
Rv1160	97.66%	0.00%	0.00%
Rv0413	96.00%	2.40%	0.00%
Rv3589	94.55%	1.98%	4.71%
Rv3297	93.68%	1.19%	1.92%
Rv3674c	95.65%	0.48%	0.00%
Rv1316c	95.29%	2.35%	0.00%
Rv1629	95.92%	0.35%	0.00%
Rv1402	90.17%	6.46%	2.33%
Rv3585	96.62%	0.68%	1.83%
Rv2737c	99.28%	0.00%	0.87%
Rv0630c	92.91%	3.54%	2.55%
Rv0631c	90.41%	3.93%	3.24%
Rv0629c	92.74%	3.07%	0.72%
Rv0003	93.97%	3.45%	0.70%
Rv2973c	91.62%	5.20%	2.15%
Rv1696	96.45%	0.76%	0.34%
Rv3715c	100.00%	0.00%	11.76%
Rv2736c	100.00%	0.00%	4.35%
Rv2593c	95.88%	1.55%	2.80%
Rv2592c	96.78%	1.17%	0.38%
Rv2594c	96.43%	0.00%	0.00%
Rv0054	89.83%	5.08%	4.62%
Rv1210	94.57%	3.26%	0.00%
Rv3646c	93.92%	1.34%	1.34%
Rv2976c	99.11%	0.00%	0.00%
Rv1638	95.49%	0.87%	1.46%
Rv1633	96.07%	1.28%	3.38%
Rv1420	97.67%	0.00%	3.90%
Rv0949	97.50%	0.00%	0.00%
Rv3198c	98.55%	0.00%	0.00%
Rv0427c	93.78%	2.90%	1.00%
Rv0071	94.48%	3.07%	0.71%
Rv0861c	94.07%	0.89%	2.46%
Rv0944	95.00%	1.43%	0.00%
Rv1688	100.00%	0.00%	12.50%
Rv2090	97.77%	0.32%	0.80%
Rv2191	91.25%	4.08%	8.82%
Rv2464c	97.67%	0.00%	0.00%
Rv3201c	90.95%	1.89%	0.83%
Rv3202c	91.59%	1.62%	1.70%
Rv3263	87.48%	3.33%	1.93%
Rv3644c	96.00%	1.09%	0.00%

Table 4 Ramachandran plot values obtained for the selected proteins
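The counts quoted in the text can be reproduced from the "Ramachandran favoured" column of Table 4. The sketch below copies that column in table order and bins the 69 models against the 90% criterion; the variable and function names are assumptions for illustration.

```python
# "Ramachandran favoured" values (%) copied from Table 4, in table order.
favoured = [
    95.70, 100, 89.76, 97.97, 95.69, 100, 97.22, 93.21, 93.41, 97.37,
    96.67, 91.67, 93.72, 100, 98.48, 95.06, 93.83, 93.67, 92.38, 100,
    96.56, 96.55, 94.06, 88.65, 95.05, 98.57, 95.56, 97.66, 96.00, 94.55,
    93.68, 95.65, 95.29, 95.92, 90.17, 96.62, 99.28, 92.91, 90.41, 92.74,
    93.97, 91.62, 96.45, 100, 100, 95.88, 96.78, 96.43, 89.83, 94.57,
    93.92, 99.11, 95.49, 96.07, 97.67, 97.50, 98.55, 93.78, 94.48, 94.07,
    95.00, 100, 97.77, 91.25, 97.67, 90.95, 91.59, 87.48, 96.00,
]

def bin_models(values, threshold=90.0):
    """Count models at 100%, at threshold..<100%, and below the threshold."""
    at_100 = sum(1 for v in values if v == 100)
    at_least = sum(1 for v in values if threshold <= v < 100)
    below = sum(1 for v in values if v < threshold)
    return at_100, at_least, below

counts = bin_models(favoured)
print(counts)
```

On these values the tally gives 7 models at 100%, 58 models between 90% and 100%, and 4 models below 90%.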

Verify3D analysis indicated that 53 protein structures had a score greater than 0, which suggests that the predicted models were valid. Eight protein structures had an ERRAT value above 95%. ERRAT is a verification algorithm for protein structures that is used to evaluate the quality of crystallographic model building and refinement. Generally, a score above 95% is taken to indicate a good high-resolution structure, so these 8 protein structures were considered credible and acceptable.³⁸ The ProSA-web server was used to calculate the Z-score of each protein model and to determine whether the model falls within the range of scores typical of high-quality experimental structures.^16,39 ProSA-web requires only Cα atoms, so it can evaluate approximate models obtained during structure determination as well as low-resolution structures, and compare them against high-resolution structures. The z-score indicates overall model consistency: it measures the deviation of the total energy of the structure from an energy distribution derived from random conformations.^40,41 In an earlier report, a z-score of -6.07 predicted by the ProSA-web server was taken to represent a good quality model (Prajapat, Bhattachar, and Kumar 2016). By that benchmark, the two proteins Rv3297 and Rv2593c, with scores of -6.18 and -6.19 respectively (Table 5), were considered to be good quality models.
The ProQ online server was used to assess the quality of the protein models. ProQ is a neural network-based predictor that estimates the quality of a protein model from a number of structural features, and it can be used to identify correct local structure and to revise models. Quality is reported as two measures, LGscore and MaxSub. An LGscore > 1.5 indicates a fairly good model, > 2.5 a very good model and > 4 an extremely good model; a MaxSub score > 0.1 indicates a fairly good model, > 0.5 a very good model and > 0.8 an extremely good model. All sequences in the study returned the same LGscore and MaxSub score, -0.835 and -0.113 respectively (Table 5).^42–44 Both values lie below the lowest cutoffs listed above, so the ProQ results do not by themselves establish good model quality and should be read alongside the other validation measures. This method can help improve the quality of both global and local structures.
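The ProQ cutoffs quoted above can be expressed as a small lookup. This is a minimal sketch: the tier labels ("fairly good", "very good", "extremely good") and the function name `classify` are this sketch's own wording, while the numeric thresholds are the ones stated in the text.

```python
# Ascending (threshold, label) pairs; a score must exceed a threshold to earn its label.
LG_CUTOFFS = [(1.5, "fairly good"), (2.5, "very good"), (4.0, "extremely good")]
MAXSUB_CUTOFFS = [(0.1, "fairly good"), (0.5, "very good"), (0.8, "extremely good")]

def classify(score, cutoffs):
    """Return the label of the highest cutoff that the score exceeds."""
    label = "below lowest cutoff"
    for threshold, name in cutoffs:
        if score > threshold:
            label = name
    return label

# Values reported for every model in Table 5.
lg_label = classify(-0.835, LG_CUTOFFS)
maxsub_label = classify(-0.113, MAXSUB_CUTOFFS)
print(lg_label, "|", maxsub_label)
```

A lookup of this kind makes it easy to see on which side of each threshold a reported score falls.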

Rv ID	ProQ predicted LGscore	ProQ predicted MaxSub	ProSA Z-score
Rv1317c	-0.835	-0.113	-4.57
Rv2836c	-0.835	-0.113	-3.94
Rv1329c	-0.835	-0.113	-4.32
Rv3056	-0.835	-0.113	-4.32
Rv1537	-0.835	-0.113	-4.32
Rv0001	-0.835	-0.113	-4.32
Rv0058	-0.835	-0.113	-4.32
Rv1547	-0.835	-0.113	-4.32
Rv3370c	-0.835	-0.113	-4.32
Rv2343c	-0.835	-0.113	-4.32
Rv0002	-0.835	-0.113	-4.32
Rv3711c	-0.835	-0.113	-4.32
Rv3721c	-0.835	-0.113	-4.32
Rv2924c	-0.835	-0.113	-4.32
Rv0006	-0.835	-0.113	-4.32
Rv0005	-0.835	-0.113	-4.32
Rv2092c	-0.835	-0.113	-4.32
Rv2101	-0.835	-0.113	-4.32
Rv2756c	-0.835	-0.113	-4.32
Rv2755c	-0.835	-0.113	-4.32
Rv3296	-0.835	-0.113	-4.32
Rv3014c	-0.835	-0.113	-4.32
Rv3062	-0.835	-0.113	-4.32
Rv3731	-0.835	-0.113	-5.04
Rv1020	-0.835	-0.113	-15.24
Rv2528c	-0.835	-0.113	-5.9
Rv2985	-0.835	-0.113	-8.34
Rv1160	-0.835	-0.113	-5.51
Rv0413	-0.835	-0.113	-3.83
Rv3589	-0.835	-0.113	-6.93
Rv3297	-0.835	-0.113	-6.18
Rv3674c	-0.835	-0.113	-7.47
Rv1316c	-0.835	-0.113	-4.52
Rv1629	-0.835	-0.113	-11.34
Rv1402	-0.835	-0.113	-5.03
Rv3585	-0.835	-0.113	-5.98
Rv2737c	-0.835	-0.113	-5.91
Rv0630c	-0.835	-0.113	-3.02
Rv0631c	-0.835	-0.113	-12.31
Rv0629c	-0.835	-0.113	-5.26
Rv0003	-0.835	-0.113	-5.76
Rv2973c	-0.835	-0.113	-6.6
Rv1696	-0.835	-0.113	-5.5
Rv3715c	-0.835	-0.113	-0.85
Rv2736c	-0.835	-0.113	-2.88
Rv2593c	-0.835	-0.113	-6.19
Rv2592c	-0.835	-0.113	-9.24
Rv2594c	-0.835	-0.113	0.5
Rv0054	-0.835	-0.113	-4.16
Rv1210	-0.835	-0.113	-3.79
Rv3646c	-0.835	-0.113	-12.83
Rv2976c	-0.835	-0.113	-8.57
Rv1638	-0.835	-0.113	-8.67
Rv1633	-0.835	-0.113	-12
Rv1420	-0.835	-0.113	-4.81
Rv0949	-0.835	-0.113	-2.2
Rv3198c	-0.835	-0.113	-6.63
Rv0427c	-0.835	-0.113	-4.98
Rv0071	-0.835	-0.113	-2.32
Rv0861c	-0.835	-0.113	-6.67
Rv0944	-0.835	-0.113	-5.15
Rv1688	-0.835	-0.113	-1.27
Rv2090	-0.835	-0.113	-10.81
Rv2191	-0.835	-0.113	-5
Rv2464c	-0.835	-0.113	-5.55
Rv3201c	-0.835	-0.113	-5.22
Rv3202c	-0.835	-0.113	-9.51
Rv3263	-0.835	-0.113	-5.97
Rv3644c	-0.835	-0.113	-4.57

Table 5 Overall model quality values obtained from the ProQ and ProSA web servers

B-cell epitope prediction and scanning of proteins for IFN epitopes

B-cell epitope prediction was performed for the two proteins with the highest antigenic scores, Rv0054 and Rv3644c, which could be valuable in planning and designing epitope-based immunization against Mycobacterium tuberculosis. B-cells are an important part of the adaptive immune system because they can provide long-term protection against pathogens and harmful molecules.²² B-cell epitope assessment is essential for a variety of medical, immunological, and biological applications, including disease control, diagnostics, and vaccine development (Shirai). Interferon gamma plays a very significant role in the control of intracellular pathogens and in the recruitment of cytotoxic lymphocytes and natural killer cells.²⁵ The DNA damage pathway includes the recruitment of certain repair enzymes and the activation of signal transducers that direct the cell cycle and cell survival (Brzostek-Racine). Based on the B-cell prediction results for Rv0054 and Rv3644c, IFN-gamma-inducing regions were predicted and then carried forward to molecular docking analysis.
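The peptides listed in Table 6 are 15 residues long and many differ from their neighbours by a one-residue shift, which is consistent with scanning a protein in overlapping windows. The sketch below illustrates that idea only; the function name `windows`, the window length of 15 and the step of 1 are assumptions, and the input is the 17-residue stretch spanned by the first three Rv0054 peptides in Table 6.

```python
def windows(sequence, length=15, step=1):
    """Return all overlapping peptides of the given length from a sequence."""
    return [sequence[i:i + length]
            for i in range(0, len(sequence) - length + 1, step)]

# Stretch of Rv0054 covered by the first three Rv0054 peptides in Table 6.
fragment = "AENVAESLTRGARVIVS"
peptides = windows(fragment)
for peptide in peptides:
    print(peptide)
```

Each peptide produced this way would then be scored individually by the epitope predictor.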

Molecular docking analysis

Following the identification of the epitope sequences of Rv0054 and Rv3644c, molecular docking was performed against the IP-10 protein (mouse crystal structure) using HDOCK. The docking results are ranked by model number (model rank). RMSD values below 2.0 Å are regarded as indicating good docking. In this study, only the rank 1 models were considered (Table 6). Since all of the output values were below 2.0 Å, the epitopes are considered to show good binding affinity (Ramírez).

Rv3644c epitope	Docking score	Rv0054 epitope	Docking score
ALQCTSGGEPGCGRC	-145.17	AENVAESLTRGARVI	-137.7
CTSGGEPGCGRCRAC	-134.7	ENVAESLTRGARVIV	-128.84
TSGGEPGCGRCRACT	-98	NVAESLTRGARVIVS	-173.54
SGGEPGCGRCRACTT	-152.28	VAESLTRGARVIVSG	-123.27
GGEPGCGRCRACTTT	-157.13	AESLTRGARVIVSGR	-157.84
GEPGCGRCRACTTTL	-157.55	ESLTRGARVIVSGRL	-162.06
GRCRACTTTLAGTHA	-162.32	SLTRGARVIVSGRLK	-136.33
TTLAGTHADVRRVIP	-175.96	LTRGARVIVSGRLKQ	-159.27
VIPEGLSIGVDEMRA	-138.84	TRGARVIVSGRLKQR	-133.56
ANALLKVVEEPPPST	-155.65	RGARVIVSGRLKQRS	-141.54
NALLKVVEEPPPSTV	-157.65	GARVIVSGRLKQRSF	-142.39
ALLKVVEEPPPSTVF	-145.01	RVIVSGRLKQRSFET	-104.77
LLKVVEEPPPSTVFL	-149.37	VIVSGRLKQRSFETR	-137.58
LKVVEEPPPSTVFLL	-156.66	ETREGEKRTVIEVEV	-151.32
KVVEEPPPSTVFLLC	-175.5	EGEKRTVIEVEVDEI	-127.2
EEPPPSTVFLLCAPS	-138.64	VIEVEVDEIGPSLRY	-149.99
EPPPSTVFLLCAPSV	-187.81	VEVDEIGPSLRYATA	-171.98
PPPSTVFLLCAPSVD	-152.19	EVDEIGPSLRYATAK	-182
PSVDPEDIAVTLRSR	-135.32	VDEIGPSLRYATAKV	-162.87
SVDPEDIAVTLRSRC	-165.67	DEIGPSLRYATAKVN	-184.13
VDPEDIAVTLRSRCR	-137.47	EIGPSLRYATAKVNK	-161.33
DPEDIAVTLRSRCRH	-183.87	IGPSLRYATAKVNKA	-161.71
PEDIAVTLRSRCRHV	-170.52	GPSLRYATAKVNKAS	-118.74
EDIAVTLRSRCRHVA	-153.62	PSLRYATAKVNKASR	-143.1
DIAVTLRSRCRHVAL	-149.31	SLRYATAKVNKASRS	-155.24
IAVTLRSRCRHVALV	-180.32	LRYATAKVNKASRSG	-127.98
AVTLRSRCRHVALVT	-153.38	RYATAKVNKASRSGG	-165.53
VTLRSRCRHVALVTP	-170.93	TAKVNKASRSGGFGS	-100.59
TLRSRCRHVALVTPS	-168.09	GSGSRPAPAQTSSAS	-105.05
LRSRCRHVALVTPST	-137.93	SGSRPAPAQTSSASG	-144.57
RSRCRHVALVTPSTH	-113.98	GSRPAPAQTSSASGD	-108.57
SRCRHVALVTPSTHA	-161.97	SRPAPAQTSSASGDD	-120.45
RCRHVALVTPSTHAI	-161.29	SGGFGSGSRPAPAQT	-125.98
CRHVALVTPSTHAIA	-153.05	DDPWGSAPASGSFGG	-120.87
RHVALVTPSTHAIAQ	-134.31	DPWGSAPASGSFGGG	-167.58
LVTPSTHAIAQVLSD	-141.43	PWGSAPASGSFGGGD	-181.42
TANWAASVSGGHVGR	-129.85	WGSAPASGSFGGGDD	-95.74
EELRTALGAGGTGKG	-152.1
ELRTALGAGGTGKGT	-149.44
LRTALGAGGTGKGTG	-148.01
RTALGAGGTGKGTGA	-135.56
TALGAGGTGKGTGAA	-152.1
LGAGGTGKGTGAALR	-112.96
KGTGAALRGATGAMK	-152.21
IDLATYFRDALLVAA	-174.77
AAHAGGVRANHPDMA	-151.49
AHAPPERLLRCIEAV	-180.52
HAPPERLLRCIEAVL	-194.87
APPERLLRCIEAVLA	-158.55
PPERLLRCIEAVLAC	-146.37
EALAVNVKPKFAVDA	-146.73

Table 6 Molecular docking scores obtained using HDOCK
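To pick out the strongest predicted binders from Table 6, the peptides can be ranked by docking score. The sketch below uses a handful of rows copied from Table 6 and assumes that a more negative docking score indicates a more favourable predicted binding model; the dictionary layout and the function name `best_epitope` are illustrative.

```python
# Subset of Table 6: peptide -> rank 1 docking score.
scores = {
    "Rv3644c": {
        "ALQCTSGGEPGCGRC": -145.17,
        "TTLAGTHADVRRVIP": -175.96,
        "EPPPSTVFLLCAPSV": -187.81,
        "HAPPERLLRCIEAVL": -194.87,
    },
    "Rv0054": {
        "AENVAESLTRGARVI": -137.70,
        "EVDEIGPSLRYATAK": -182.00,
        "DEIGPSLRYATAKVN": -184.13,
        "WGSAPASGSFGGGDD": -95.74,
    },
}

def best_epitope(peptide_scores):
    """Return the (peptide, score) pair with the most negative docking score."""
    return min(peptide_scores.items(), key=lambda item: item[1])

best = {protein: best_epitope(table) for protein, table in scores.items()}
for protein, (peptide, score) in best.items():
    print(protein, peptide, score)
```

Among the rows included here, HAPPERLLRCIEAVL scores lowest for Rv3644c and DEIGPSLRYATAKVN scores lowest for Rv0054.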

In this study, the selected proteins showed favourable predicted binding to the mouse protein IP-10. The lead proteins show satisfactory physicochemical properties, antigenicity, secondary and tertiary structures, and molecular docking scores.

Therefore, these proteins can be considered promising candidates against MTB. We believe the findings will benefit the development of conventional medicine-based therapeutic approaches as well as further research into the future treatment of MTB.^45–64

Conclusion

Tuberculosis is a life-threatening disease and a global health challenge. There is an urgent need for potent diagnostic markers against this deadly disease. For this study, a total of 69 amino acid sequences involved in the DNA replication, repair, recombination and restriction/modification pathway of Mycobacterium tuberculosis were taken into consideration. The amino acid sequences were retrieved using the Tuberculist tool and Mycobrowser. The VaxiJen server was used to study the antigenicity of the protein sequences. Physicochemical characterization was done using various computational tools and servers based on different parameters: isoelectric point, molecular weight, instability index, aliphatic index, GRAVY, and the numbers of positive and negative residues. SOPMA was used for secondary structure prediction, in which the alpha helix, 3₁₀ helix, pi helix, beta bridge, extended strand, beta turn, bend region, random coil, ambiguous states and other states were predicted. ProtParam was used to study the amino acid composition. Three-dimensional structures were predicted using the Phyre2 tool. Ramachandran plot maps were analyzed using the Swiss-Model server. The ProSA and ProQ servers were used to study the Z-score and the LGscore and MaxSub score respectively. This study concludes that the two proteins Rv0054 and Rv3644c have potential as diagnostic markers for Mycobacterium tuberculosis. Computational analysis and homology modelling of the Mycobacterium tuberculosis proteins involved in the DNA replication, repair, recombination, and restriction/modification pathway provide a basis for further analysis of these proteins. This research is expected to set a course towards potential diagnostic markers identified with immunoinformatic tools, which will aid in the development of remedies against Mycobacterium tuberculosis.

Acknowledgments

None.

Conflicts of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.