Research Article Volume 7 Issue 1
PhD Math’s & Computer Science, Bordeaux University, France
Correspondence: Jean Claude Perez, PhD Math’s & Computer Science, Bordeaux University, France, Tel 33 (0)5 40 00 27 88
Received: September 19, 2022 | Published: November 24, 2022
Citation: Perez JC. Peculiar evolution of the Monkeypox virus genomes. Int J Vaccines Vaccin. 2022;7(1):13–16. DOI: 10.15406/ijvv.2022.07.00114
We compare the evolution of 14 monkeypox virus genomes, including that of the May 2022 strain currently spreading among humans in numerous countries outside Africa. Our aim was to discover mutations and other evolutionary changes (recombination) in the virus genome that may explain the sudden impact of this epidemic, which had been circulating at a very low level, and to alert to its potential pathogenic character. We found a long succession of T bases between the DNA-dependent RNA polymerase subunit rpo132 and the cowpox A-type inclusion protein. This run lengthens progressively: no characteristically long run of successive T bases (≤ 10) in the early genomes of 1971, 19 T bases in the Israel 2018 reference strain, and 30 T bases in the 2022 strains. We find a complementary match for this long sequence of T bases only in the simian hemorrhagic encephalitis virus, at the 3' end of the genome, with a long succession of 28 A bases after the stop codon. More strikingly, we find that the corresponding chain of 10 phenylalanine residues is reported as matching uniquely (E≤0.001) a hypothetical protein element in Plasmodium falciparum, Yersinia pestis, Escherichia coli and Penicillium nordicum. We wonder whether the region of the monkeypox genome situated right upstream of this long T-repeat may code for a not yet identified polypeptide sequence with a functional role.
Keywords: Monkeypox virus, biomathematics, master code, evolution, genomics, proteomics
Monkeypox is a zoonotic disease caused by the monkeypox virus, an orthopoxvirus closely related to the variola virus, the causative agent of smallpox. The Monkeypox virus was first discovered in 1958 in monkeys, although these animals are not the source of the virus. Human cases were first described in 1970. There are 2 strains of monkeypox viruses: the West African and the Central African strains.
Several cases of monkeypox viruses have been identified in a number of geographically distinct countries. In May 2022 cases were reported in Australia, Austria, Belgium, Canada, Denmark, France, Germany, Greece, Israel, Italy, the Netherlands, Portugal, Spain, Sweden, Switzerland and the U.K (Figure 1).1,2
Figure 1 Monkeypox viruses tree (from https://virological.org/t/first-german-genome-sequence-of-monkeypox-virus-associated-to-multi-country-outbreak-in-may-2022/812).
Nextstrain reference tree https://nextstrain.org/monkeypox?s=03
Monkeypox is classified as a zoonotic disease, in which transmission of the virus is usually due to contact between animals and humans. Genetically, monkeypox viruses cluster into two groups: the Congo Basin clade and the West African clade.
Monkeypox virus Zaire-96-I-16
This particular outbreak has been identified as due to a virus from the West African clade, which is often associated with a milder disease; in this case, human-to-human spread is suspected. The first human-to-human strain referenced was identified in Israel in 2018, in a man who had returned from Nigeria, Erez.3
Monkeypox strains analyzed:
Gabon 1988 alias 2015 KJ642619.1
https://www.ncbi.nlm.nih.gov/nuccore/KJ642619.1
Cameroun 1990 alias 2015 KJ642618.1
https://www.ncbi.nlm.nih.gov/nuccore/KJ642618.1
Liberia 1970 DQ011156.1
https://www.ncbi.nlm.nih.gov/nuccore/DQ011156.1
Nigeria 1971 alias 2015 KJ642617.1
https://www.ncbi.nlm.nih.gov/nuccore/KJ642617.1
2018 Israel MN648051.1
https://www.ncbi.nlm.nih.gov/nuccore/MN648051.1
Zaire 2009 alias 2020 NC_003310.1
https://www.ncbi.nlm.nih.gov/nuccore/NC_003310.1
Rivers state 2020 MT903340.1
https://www.ncbi.nlm.nih.gov/nuccore/MT903340.1
UK 2020 MT903344.1
https://www.ncbi.nlm.nih.gov/nuccore/MT903344
USA 2022 ON563414.1
https://www.ncbi.nlm.nih.gov/nuccore/ON563414.1?report=GenBank&s=03
German 2022 ON568298.1
https://www.ncbi.nlm.nih.gov/nuccore/ON568298
Singapore 2020 MT903342.1
https://www.ncbi.nlm.nih.gov/nuccore/MT903342.1?report=genbank
Nigeria 2018 MG693723.1
https://www.ncbi.nlm.nih.gov/nucleotide/MG693723.1?report=genbank&log$=nuclalign&blast_rank=1&RID=98T6WWFV016
UK 2020 MT903345.1
https://www.ncbi.nlm.nih.gov/nucleotide/MT903345.1?report=genbank&log$=nuclalign&blast_rank=1&RID=98T3F4E013
France 2022 ON602722.1
https://www.ncbi.nlm.nih.gov/nuccore/ON602722.1?report=genbank
Biomathematics methods – the Master Code analysis
The "Master Code" method, Perez,4 and Perez & Montagnier,5 is a meta-code based on the atomic masses common to DNA, RNA and amino acids. It allows us to unify the 3 codes of DNA, RNA and amino acid sequences.
Specifically, our Master Code coupling curve measures the level of correlation unifying any genomic sequence (DNA double strand) and its translated proteomic (amino acid) sequence, whether or not it may code for a protein.
In a previous article, Perez,6 we analyzed all types of prions linked to the mad cow disease of the early 2000s (present in plants, yeast, humans, cows, sheep, etc.). We then highlighted a possible "signature", a sort of invariant characteristic common to all prions. The typical signature of the Master Code unifying correlation takes the shape of a "W" (or, symmetrically, an "M"). We later extended this type of analysis to the amyloids implicated in Alzheimer's disease, Perez.7
In Table 1, the last 3 cases analyzed date from May 2022. It is of note that the 2022 French genome is limited to a succession of 19 T bases. In fact, this sequence may also contain C bases substituted for T, as both ttt and ttc codons are translated into phenylalanine residues. In that respect, the length of the French sequence is actually equivalent to 21 T. Sequencing errors are possible, but not to the extent of covering a range of 8 nucleotides. The difference observed in the French sequence therefore raises questions, as it is clearly not the same as the other strains in that respect. This is also the case for the Italian sequence (ON622721, https://www.ncbi.nlm.nih.gov/nuccore/ON622721.1/).
Name | Genbank ID | Start T location | Number of T
Gabon1988 (2015) | KJ642619.1 | | 0
Cameroun1990 (2015) | KJ642618.1 | | 0
Liberia1970 | DQ011156.1 | | 0
Zaire2009 | NC_003310.1 | | 0
Nigeria1971 (2015) | KJ642617.1 | 133245 | 27
Israel2018 | MN648051.1 | 133298 | 19
Rivers state 2020 | MT903340.1 | 133081 | 25
UK2020A | MT903344.1 | 133081 | 27
Singapore2020 | MT903342.1 | 133093 | 28
Nigeria2018 | MG693723.1 | 126745 | 29
UK2020B | MT903345.1 | 133100 | 28
France2022 | ON602722.1 | 132972 | 19
USA2022 | ON563414.1 | 133094 | 30
Germany2022 | ON568298.1 | 133201 | 30
Table 1 Evolution of the T-bases contiguous region for the 14 genomes analyzed
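The "Start T location" and "Number of T" columns describe, for each genome, where the longest uninterrupted run of T bases begins and how long it is. The following Python lines are a minimal sketch of such a scan, not the procedure actually used for Table 1: the function name longest_t_run and the toy sequence are hypothetical, and a real analysis would first load the GenBank records listed above.

```python
import re


def longest_t_run(seq):
    """Return (start, length) of the longest uninterrupted run of T bases.

    start is 1-based, as in GenBank coordinates.
    Returns (None, 0) when the sequence contains no T at all.
    """
    best_start, best_len = None, 0
    # "T+" matches every maximal run of consecutive T bases.
    for match in re.finditer(r"T+", seq.upper()):
        length = match.end() - match.start()
        if length > best_len:
            best_start, best_len = match.start() + 1, length
    return best_start, best_len


# Toy sequence (an assumption for illustration, not real genome data):
# 12 flanking bases, a 30-T run, then the 9 bases that follow the run
# in the BLAST query reported in this article.
toy = "ACGA" * 3 + "T" * 30 + "CGAATTCAC"
start, length = longest_t_run(toy)
print(start, length)
```

Only strict T runs are counted here; a run interrupted by a C, as discussed for the French sequence, would be reported as two shorter runs.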
It is by chance that we discovered the presence of a 30-T long sequence in the middle of the USA2022 monkeypox genome, between the DNA-dependent RNA polymerase subunit rpo132 and the cowpox A-type inclusion protein, before a gene complement region that may become coding under circumstances that need to be specified by experts in the field.
For instance, if we look at the monkeypox strain Gabon-1988 we can identify in this region a sequence of nucleotides coding straightforwardly for a 42-aa long polypeptide chain that may constitute a small protein (Figure 2a, 2b).
Figure 2a Genome sequence extract of the monkeypox strain Gabon-1988, potentially coding for a small protein after the DNA-dependent RNA polymerase subunit rpo132 and before the gene complement.
Number of codons: 42
MGYLRSFYKRFHVPDHVQPSYVSPSLYRVYQSSLSEGDRTP
Figure 2b Genome sequence extract of the monkeypox strain USA2022, potentially coding for a small protein after the DNA-dependent RNA polymerase subunit rpo132 and before the gene complement.
Number of codons: 42
MGYLRSFYKRFHVPDHVQPSYVSPSLYRVYQSSLSEGDRTP
This growing pattern of successive T bases follows a conserved nucleotide sequence that may code for a small protein. The functional role of this pattern at the viral genome level is unknown to us.
While long T-repeats are a common finding at the termination of a genome, as for instance at the end of the monkey encephalitis virus genome, they are almost never encountered fully inside a whole genome sequence.
Simian hemorrhagic encephalitis virus isolate Sukhumi, complete genome
Sequence ID: NC_038293.1 | Length: 15370 | Number of Matches: 1
Range 1: 15336 to 15370
Alignment statistics for match #1
Score | Expect | Identities | Gaps | Strand
55.4 bits(60) | 1e-04 | 33/35(94%) | 0/35(0%) | Plus/Minus
Query 133098 ttttttttttttttttttttttttttCGAATTCAC 133132
             ||||||||||||||||||||||||||  |||||||
Sbjct 15370  TTTTTTTTTTTTTTTTTTTTTTTTTTTTAATTCAC 15336
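The Plus/Minus strand label means that the subject line is shown as the reverse complement of the simian virus plus strand, which is why a T-run in the query corresponds to an A-run at the 3' end of that genome. The short Python sketch below re-derives the 33/35 identity count and the 28 A bases from the alignment as printed above. The helper names are hypothetical, and the two strings are rebuilt from the reported coordinates rather than fetched from GenBank.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def count_identities(a, b):
    """Count positions where two equal-length aligned sequences agree."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    return sum(x == y for x, y in zip(a.upper(), b.upper()))


# Query: monkeypox positions 133098 to 133132 (35 bases).
query = "T" * 26 + "CGAATTCAC"
# Subject as displayed by BLAST, i.e. the minus strand, 15370 down to 15336.
sbjct_shown = "T" * 28 + "AATTCAC"

identities = count_identities(query, sbjct_shown)
# Plus-strand bases 15336 to 15370 of the simian virus genome.
sbjct_plus = reverse_complement(sbjct_shown)

print(f"{identities}/{len(query)} identities")
print(sbjct_plus)
```

The two mismatches are the C and G that directly follow the 26 T bases of the query, and the plus-strand subject ends with 28 A bases, in line with the 28 A-base succession mentioned in the abstract.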
Why is this peculiar nucleotide sequence located in this region of the genome?
Its presence at the end of what seems to be a potential protein may indicate a possible genome regulation role.
Might it have another functional role?
Also remarkable: although there is no evidence that this nucleotide sequence lies in a genome section that may be translated into amino acids, a sequence of 30 T bases codes for a polypeptide chain of 10 successive phenylalanine residues. A BLAST search for this unorthodox protein sequence surprisingly retrieves a match with an expectation value significantly beyond randomness (E≤0.001): an identical polypeptide reported as a hypothetical protein in Plasmodium falciparum, Yersinia pestis, Escherichia coli and Penicillium nordicum.
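The correspondence between 30 T bases and 10 phenylalanine residues follows directly from the standard genetic code, in which both ttt and ttc encode phenylalanine (F). As a small illustration, the Python sketch below builds the standard codon table and translates a poly-T stretch. The function name translate is hypothetical, and reading the run in frame from its first base is an assumption made only for this example.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string codon by codon, starting at its first base.

    Trailing bases that do not fill a complete codon are ignored;
    stop codons are rendered as '*'.
    """
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


print(translate("T" * 30))       # the 30-T run of the 2022 strains
print(translate("TTTTTCTTT"))    # a C replacing a T still yields F
```

Because a run of identical bases reads the same in all three frames, the poly-T stretch gives phenylalanine whichever frame is chosen; only the residues at the two edges of the run depend on the frame.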
However, the question of the functional role remains open: as shown in Figure 3a, this long T-base repeat is located at a peculiar position of the genome predicted by the Master Code to have a marked functional role (44000 aa / 132000 nt).
An analysis zooming in on the small genome sections of 100 bases framing both sides of the 30-T sequence shows that it is a new functionality (Figure 3b), as is the case for the 19-T sequence (Figure 4).
Figure 3a Master Code analysis of the whole USA2022 monkeypox genome. The region around 44000 amino acids, where the 30 T-base insert is located, appears to be highly functional.
The objective here was to present how a new type of theoretical analysis helps identify a genome characteristic that would otherwise have remained unseen with the already established methods of mathematical genome analysis. Our findings may partly explain the sudden propagation of the monkeypox virus in the form observed in quite a number of countries in May 2022. The role of the peculiar 30-T base long repeat sequence right in the middle of the virus genome is still to be determined experimentally. This work is an incentive for experimental investigations, for instance using a knockout genome (removing the T-repeat), among other possibilities.
None.
Authors declare that there is no conflict of interest.
©2022 Perez. This is an open access article distributed under the terms of the, which permits unrestricted use, distribution, and build upon your work non-commercially.