Amino Acids in Data Encryption

doi:10.15406/japlr.2016.02.00030

Journal of Analytical & Pharmaceutical Research

eISSN: 2473-0831

Research Article Volume 2 Issue 5

Yamuna M,

Elakkiya A

Correspondence: Yamuna M, School of Advanced Sciences, VIT University, Vellore, India

Received: June 01, 2016 | Published: June 28, 2016

Citation: Yamuna M, Elakkiya A (2016) Amino Acids in Data Encryption. J Anal Pharm Res 2(5): 00030. DOI: 10.15406/japlr.2016.02.00030

Abstract

System security is essential nowadays: with the growth of information technology and the emergence of new techniques, the number of threats that a user has to deal with has grown rapidly. To achieve security, data must be encoded before it is sent through the available communication channels, so that it is unreadable to unintended recipients. In this paper we propose a method of encrypting a DNA sequence using amino acids.

Keywords: Decryption; Encryption; DNA sequence; Amino acid; Deoxyribonucleic acid; Thymine; Cytosine; Guanine

Introduction

DNA, deoxyribonucleic acid, is a nucleic acid molecule built from nucleotides. It is found in nearly every cell of the body and carries the genetic information of each living organism. Consequently, DNA is often described as the “blueprint of biological life”, as it gives instructions for an organism’s functioning and development. A single DNA molecule is double stranded and consists of sequences of four bases: adenine (A), thymine (T), cytosine (C), and guanine (G).

A DNA database is a collection of human DNA samples, often derived from blood, tissue, or saliva.

DNA databases were first established in the 1980s and were initially used in forensics to identify criminals and in the military to help identify deceased service members from their remains. Today, DNA plays an important role in military, criminal justice and medical research, so the safe transfer of DNA data is important. In this paper, we propose a method of converting a DNA sequence into a binary string to increase security.

Bazli et al.¹ proposed a DNA encryption scheme that uses biological alphabets to manipulate information, employing the DNA sequence reaction to autonomously make a copy of its threads as an extended encryption key. Umalkar et al.² proposed a message cryptography method based on DNA sequences using complementary rules. Atito et al.³ proposed a novel algorithm to communicate data securely; the technique combines encryption and data hiding using some properties of DNA sequences. Yamuna et al.⁴ proposed a method of encrypting a DNA sequence using pre-order tree traversal. We have used all the papers mentioned above as base papers for our proposed method.

Discussion

DNA sequence

DNA sequencing is the process of determining the precise order of nucleotides within a DNA molecule. It includes any method or technology that is used to determine the order of the four bases (adenine, guanine, cytosine, and thymine) in a strand of DNA.⁵ Figure 1 provides an example of a DNA sequence.⁶

Figure 1 DNA Sequence.

Proposed encryption scheme

In this section we propose a method of encrypting a DNA sequence as a binary string using amino acid properties.

Construction of binary table

Amino acids play central roles both as building blocks of proteins and as intermediates in metabolism. The 20 amino acids found within proteins convey a vast array of chemical versatility. All amino acids found in proteins share the same basic structure, differing only in the structure of the R-group, or side chain. For our conversion we consider the distances of the carbon, oxygen, nitrogen and sulphur atoms of the R-group from the alpha-carbon. In this method we convert each amino acid into a binary string of length 12. The first two digits represent the chemical group to which the amino acid belongs: 00 for the non-polar group, 01 for the aromatic group and 11 for the polar group, as used in Table 1. The next four digits indicate the presence of carbon, oxygen, nitrogen and sulphur, in that order. For example, if the R-group contains only carbon and nitrogen, these four digits are 1010. The last six digits represent the sum of the distances of these atoms from the alpha-carbon, written as a 6-bit binary number.

For example, consider methionine, which belongs to the non-polar group, so the first two digits are 00. The R-group of methionine contains three carbons and one sulphur, so the next four digits are 1001. The carbons are at distances 1, 2 and 4 from the alpha-carbon, and the sulphur is at distance 3. The sum of these distances is 10, which as a 6-bit binary number is 001010. Therefore the binary conversion of methionine is 001001001010.
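The construction of a single 12-digit string can be sketched in Python as follows. This is only an illustrative sketch: the function name `encode_amino_acid`, the `group_bits` parameter and the dictionary format for the distances are assumptions made for this example, not part of the published method. The two group digits are passed in directly rather than derived from the chemistry.

```python
# Order of the presence flags as described in the text:
# carbon, oxygen, nitrogen, sulphur.
ATOMS = ("C", "O", "N", "S")


def encode_amino_acid(group_bits, distances):
    """Build a 12-digit string: 2 group digits + 4 presence digits + 6-bit sum.

    group_bits: two binary digits such as "00" (hypothetical parameter).
    distances:  dict mapping an atom symbol to the list of distances of
                those atoms from the alpha-carbon (hypothetical format).
    """
    if len(group_bits) != 2 or set(group_bits) - {"0", "1"}:
        raise ValueError("group_bits must be two binary digits")
    # One flag per atom type: 1 if at least one such atom is in the R-group.
    presence = "".join("1" if distances.get(a) else "0" for a in ATOMS)
    # Sum of all listed distances, stored in six binary digits.
    total = sum(sum(distances.get(a, [])) for a in ATOMS)
    if total >= 64:
        raise ValueError("distance sum does not fit in 6 bits")
    return group_bits + presence + format(total, "06b")


# Methionine: carbons at distances 1, 2, 4 and sulphur at distance 3.
print(encode_amino_acid("00", {"C": [1, 2, 4], "S": [3]}))  # 001001001010
```

The six-digit field limits the distance sum to at most 63, which is why the sketch rejects larger sums instead of silently truncating them.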

Similarly, we convert all the other amino acids into binary strings using the above procedure. The resulting binary strings are listed in Table 1.

S. No	Amino Acids	Carbon	Oxygen	Nitrogen	Sulphur	Index Sum	Binary Conversion
1	Glycine	0	0	0	0	0C	000000000000
2	Alanine	1	0	0	0	1C	001000000001
3	Valine	5	0	0	0	5C	001000000101
4	Leucine	9	0	0	0	9C	001000001001
5	Methionine	7	0	0	3	7C + 3S	001001001010
6	Isoleucine	8	0	0	0	8C	001000001000
7	Phenylalanine	22	0	0	0	22C	011000010110
8	Tyrosine	22	6	0	0	22C + 6O	011100011100
9	Tryptophan	33	0	4	0	33C + 4N	011010100101
10	Serine	1	2	0	0	1C + 2O	111100000011
11	Threonine	3	2	0	0	3C + 2O	111100000101
12	Cysteine	1	0	0	2	3C	111000000011
13	Proline	5	0	0	0	5C	111000000101
14	Asparagine	3	3	3	0	3C + 3O + 3N	111110001001
15	Glutamine	6	4	4	0	6C + 4O + 4N	111110001110
16	Lysine	10	0	5	0	10C + 5N	111010001111
17	Arginine	11	0	16	0	11C + 16N	111010011011
18	Histidine	10	0	7	0	10C + 7N	111010010001
19	Aspartate	3	6	0	0	3C + 6O	111100001001
20	Glutamate	6	8	0	0	6C + 8O	111100001110

Table 1 Binary conversion of amino acids
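As a cross-check on the table, an entry can be read back into its three fields. The following Python sketch assumes the entry is written with all 12 digits, leading zeros included; the function name `split_codeword` is hypothetical and introduced only for this illustration.

```python
def split_codeword(code):
    """Return (group_digits, atoms_present, distance_sum) for a 12-digit entry."""
    if len(code) != 12 or set(code) - {"0", "1"}:
        raise ValueError("expected 12 binary digits")
    group_digits, presence, total = code[:2], code[2:6], code[6:]
    # Presence flags are ordered carbon, oxygen, nitrogen, sulphur.
    atoms = [atom for atom, bit in zip("CONS", presence) if bit == "1"]
    return group_digits, atoms, int(total, 2)


# Tryptophan (index sum 33C + 4N), written with its leading zero.
print(split_codeword("011010100101"))  # ('01', ['C', 'N'], 37)
```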

Encryption algorithm

Step 1 Let Z be the sequence to be encrypted.

Let Z = ATGACGATGACTGATCGATCGATGACGTAT.

Step 2 Split the DNA sequence into codons (triplets of bases).

For our example, Z = ATG ACG ATG ACT GAT CGA TCG ATG ACG TAT.

Step 3 Convert the codons in the DNA sequence into their corresponding amino acids using the standard genetic code.

In our example Z = M T M T D R S M T Y

Step 4 Convert each amino acid into its binary string of length 12 using Table 1.

M = 001001001010, T = 111100000101, M = 001001001010, T = 111100000101, D = 111100001001, R = 111010011011, S = 111100000011, M = 001001001010, T = 111100000101, Y = 011100011100.

Step 5 Concatenate the binary strings to generate a binary sequence k.

For our example the binary string generated is k = 001001001010111100000101001001001010111100000101111100001001111010011011111100000011001001001010111100000101011100011100

Step 6 Send this k to the receiver.
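Steps 1 to 5 can be sketched in Python as shown below. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `CODON_TO_AMINO`, `TABLE_1` and `encrypt` are hypothetical, the codon lookup is the standard genetic code, and the 12-digit strings are the Table 1 entries written with their leading zeros. Table 1 has no entry for stop codons, so the sketch rejects them.

```python
from itertools import product

# Standard genetic code, listed in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AMINO = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

# Table 1, keyed by one-letter amino acid code.
TABLE_1 = {
    "G": "000000000000", "A": "001000000001", "V": "001000000101",
    "L": "001000001001", "M": "001001001010", "I": "001000001000",
    "F": "011000010110", "Y": "011100011100", "W": "011010100101",
    "S": "111100000011", "T": "111100000101", "C": "111000000011",
    "P": "111000000101", "N": "111110001001", "Q": "111110001110",
    "K": "111010001111", "R": "111010011011", "H": "111010010001",
    "D": "111100001001", "E": "111100001110",
}


def encrypt(dna):
    """Split dna into codons, translate them, and concatenate the Table 1 strings."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    pieces = []
    for i in range(0, len(dna), 3):
        amino = CODON_TO_AMINO[dna[i:i + 3]]  # KeyError on an invalid base
        if amino == "*":
            raise ValueError("stop codons have no entry in Table 1")
        pieces.append(TABLE_1[amino])
    return "".join(pieces)


k = encrypt("ATGACGATGACTGATCGATCGATGACGTAT")
print(len(k))  # 120
```

For the example sequence Z, the ten codons each contribute 12 digits, so k has 120 digits.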

Decryption algorithm

For decrypting the sequence, we reverse the procedure.

Suppose the received sequence is

001000001001111110001110001000001000111100000101111010011011000000000000.

Step 1 Split this sequence into segments of length 12.

001000001001 111110001110 001000001000 111100000101 111010011011 000000000000

Step 2 Convert each segment into its amino acid using Table 1.

For our example, the amino acids are L Q I T R G.

Step 3 The corresponding codons for the above amino acids are CTG CAG ATC ACC AGG GGG. The sequence is decrypted as CTGCAGATCACCAGGGGG.
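The decryption steps can be sketched in Python in the same way. The names `CODEWORDS` and `decrypt` are hypothetical. One assumption needs stating plainly: several codons translate to the same amino acid in the standard genetic code, so the sketch fixes one representative codon per amino acid. The codons for L, Q, I, T, R and G follow the worked example above; the remaining choices are arbitrary picks made for this illustration.

```python
# 12-digit string -> (amino acid, representative codon).
# Only the codons for L, Q, I, T, R and G come from the worked example;
# the others are arbitrary valid codons chosen for this sketch.
CODEWORDS = {
    "000000000000": ("G", "GGG"), "001000000001": ("A", "GCC"),
    "001000000101": ("V", "GTG"), "001000001001": ("L", "CTG"),
    "001001001010": ("M", "ATG"), "001000001000": ("I", "ATC"),
    "011000010110": ("F", "TTC"), "011100011100": ("Y", "TAC"),
    "011010100101": ("W", "TGG"), "111100000011": ("S", "AGC"),
    "111100000101": ("T", "ACC"), "111000000011": ("C", "TGC"),
    "111000000101": ("P", "CCC"), "111110001001": ("N", "AAC"),
    "111110001110": ("Q", "CAG"), "111010001111": ("K", "AAG"),
    "111010011011": ("R", "AGG"), "111010010001": ("H", "CAC"),
    "111100001001": ("D", "GAC"), "111100001110": ("E", "GAG"),
}


def decrypt(bits):
    """Return (amino acid string, DNA string) for a received binary sequence."""
    if len(bits) % 12 != 0:
        raise ValueError("received sequence length must be a multiple of 12")
    segments = [bits[i:i + 12] for i in range(0, len(bits), 12)]
    amino_acids = "".join(CODEWORDS[s][0] for s in segments)  # KeyError if unknown
    dna = "".join(CODEWORDS[s][1] for s in segments)
    return amino_acids, dna


received = (
    "001000001001" "111110001110" "001000001000"
    "111100000101" "111010011011" "000000000000"
)
print(decrypt(received))  # ('LQITRG', 'CTGCAGATCACCAGGGGG')
```

Because the codon choice is fixed in advance, a sender and receiver using this sketch would recover the same amino acid sequence, while the recovered DNA string depends on which representative codons are agreed upon.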

Conclusion

DNA is important not only because it makes everyone biologically different from one another, but also because it is a unique identifier that humans are born with. Unlike other personal items that can be used to identify individuals, DNA cannot be replaced or changed. Hospitals establish medical databases to make DNA samples available for research purposes, and private organizations establish research databases to study specific diseases and conditions. The proposed method is intended to make it very difficult for an intruder to break the encrypted message and retrieve the actual message.