Mini Review Volume 3 Issue 1
Independent researcher and analyst Bioinformatics, India
Correspondence: Anubha Dubey, Independent researcher and analyst Bioinformatics, Gayatri Nagar, Katni, 483501 MP, India
Received: May 01, 2017 | Published: February 1, 2018
Citation: Dubey A. Machine learning approaches in drug development of HIV/AIDS. Int J Mol Biol Open Access. 2018;3(1):23-25. DOI: 10.15406/ijmboa.2018.03.00044
Due to the complexity of HIV/AIDS, cutting-edge machine learning technologies are being used for drug delivery and development. In this review, drug delivery methods are discussed together with machine learning techniques. Combining both approaches gives new hope for improving the lives of HIV-infected persons, as these computational methods are less time consuming and easier to interpret than wet-lab techniques.
Keywords: drug, machine learning, computational method, wet-lab
Globally, approximately 35 million people are infected with human immunodeficiency virus (HIV), the virus that causes acquired immunodeficiency syndrome (AIDS). Biopharmaceutical research companies are investigating new ways to treat and prevent HIV infection. Potential therapies, medicines and vaccines currently in the development pipeline include:
A first-in-class medicine intended to prevent HIV from breaking through the cell membrane.
A cell therapy that modifies a patient’s own cells in an attempt to make them resistant to HIV.
A therapeutic vaccine designed to induce responses from T-cells that play a role in immune protection against viral infections.
Treatment of HIV infection consists of highly active antiretroviral therapy (HAART, or ART),3 which typically combines two nucleoside analogue reverse transcriptase inhibitors (NARTIs or NRTIs) with either a protease inhibitor or a non-nucleoside reverse transcriptase inhibitor (NNRTI). Abacavir, an NRTI, has shown good results. According to one study, the average life expectancy of an HIV-infected individual is 32 years from the time of infection if treatment is started when the CD4 count is 350 cells/µL.4 ART is also recommended when the CD4 count is below 500 cells/µL to extend the life expectancy of HIV-infected individuals. Figure 1 shows how antiretroviral drugs interrupt the HIV life cycle in humans and the responses they produce. Antiretroviral drugs are expensive, so there is a need for vaccines or drugs that extend the lives of HIV-infected persons without side effects. In view of this, computational methods combined with machine learning bring hope for developing treatments that cost less and are accessible to ordinary people.
Machine learning, a cutting-edge technology,5 refers to the development of algorithms that improve their performance in pattern recognition, classification, regression and prediction using models derived from existing data. It is closely related to data mining, as pattern recognition is one of the most important research areas in both. Classification algorithms have frequently been used to distinguish active from inactive compounds, whereas regression approaches are trained and tested on continuous data (for example, activity values) and then used for prediction. Ensemble algorithms such as bagging and boosting combine many models to reach more accurate decisions, and cross-validation also plays an important role in obtaining reliable results. In drug discovery and development, including target identification, machine learning methods have been widely used in quantitative structure-activity relationship (QSAR) modeling, ligand-based virtual screening and in silico ADMET (absorption, distribution, metabolism, excretion and toxicity) studies. Modern QSAR models are characterized by the use of multiple descriptors of chemical structure, combined with both linear and nonlinear optimization techniques and a strong emphasis on model validation. These descriptors capture structural and physicochemical properties of compounds, among which atom and bond counts and electrostatic and thermodynamic properties are important. MODELLER, ChemSketch, DRAGON, MOE, VMD and AutoDock are computational chemistry and cheminformatics packages widely used in target identification, hit discovery, etc. Machine learning methods are widely used to identify the best-suited model among those built with these tools. QSAR models are widely used in virtual screening for hit discovery;6 by relating chemical structure to activity, they help find potential candidates for drug delivery and drug discovery. Virtual screening is a major area within cheminformatics in which machine learning techniques have been applied.
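To make the QSAR idea and its emphasis on validation concrete, the following is a minimal sketch of a one-descriptor linear QSAR model with leave-one-out cross-validation. It assumes a single hypothetical descriptor (labelled logP) and hypothetical pIC50 values; real QSAR models use many descriptors and dedicated software, so the function names and data here are illustrative only.

```python
def fit_linear_qsar(descriptor, activity):
    """Ordinary least squares fit of activity = intercept + slope * descriptor."""
    n = len(descriptor)
    mean_x = sum(descriptor) / n
    mean_y = sum(activity) / n
    sxx = sum((x - mean_x) ** 2 for x in descriptor)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(descriptor, activity))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def total_sum_of_squares(activity):
    mean_y = sum(activity) / len(activity)
    return sum((y - mean_y) ** 2 for y in activity)


def r_squared(descriptor, activity, intercept, slope):
    """Goodness of fit on the training data."""
    ss_res = sum((y - (intercept + slope * x)) ** 2
                 for x, y in zip(descriptor, activity))
    return 1.0 - ss_res / total_sum_of_squares(activity)


def loo_q2(descriptor, activity):
    """Leave-one-out cross-validated q2: each compound is predicted by a model
    fitted without it, so q2 estimates predictive rather than fitting ability."""
    press = 0.0
    for i in range(len(descriptor)):
        xs = descriptor[:i] + descriptor[i + 1:]
        ys = activity[:i] + activity[i + 1:]
        b0, b1 = fit_linear_qsar(xs, ys)
        press += (activity[i] - (b0 + b1 * descriptor[i])) ** 2
    return 1.0 - press / total_sum_of_squares(activity)


# Hypothetical training set: one descriptor value and one measured activity per compound.
LOGP = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
PIC50 = [5.1, 5.4, 6.0, 6.2, 6.9, 7.1]

b0, b1 = fit_linear_qsar(LOGP, PIC50)
r2 = r_squared(LOGP, PIC50, b0, b1)
q2 = loo_q2(LOGP, PIC50)
```

Reporting q2 alongside r2 reflects the validation emphasis described above: q2 can never exceed r2 for least squares, and a large gap between them is a common warning sign of overfitting.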
Virtual screening (VS) is the application of computational tools to search large databases for new leads with a high probability of binding strongly to the target protein. VS methods can be classified into structure-based (SBVS) and ligand-based (LBVS) approaches, depending on the amount of structural and bioactivity data available. If the 3D structure of the receptor is known, SBVS methods such as high-throughput docking7 are used; if information about the receptor is scarce, LBVS methods are commonly recommended. Taken together, machine learning methods have a broad spectrum of applications in computer-aided drug discovery, which makes it worthwhile to select representative approaches and highlight their applications. Because HIV recombines and mutates suddenly and rapidly, it is difficult for biopharmaceutical companies to formulate medicines that cure AIDS. Nevertheless, many medicines are in use and show good results. Candidates identified by these methods still need clinical trials, but drug design that combines computational methods with the molecular basis of HIV has shown better results. Some of the machine learning approaches widely used together with computational methods are described below.
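A simple way to picture ligand-based screening is similarity searching: compounds in a library are ranked by how closely their fingerprints resemble a known active. The sketch below assumes fingerprints stored as sets of "on" bit indices and uses the Tanimoto coefficient; the compound IDs and bit sets are hypothetical, and in practice fingerprints would be generated by a cheminformatics toolkit rather than written by hand.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) coefficient between two sets of 'on' fingerprint bits."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def rank_by_similarity(query_fp, library):
    """Return (compound_id, score) pairs, most similar to the query first."""
    scored = [(cid, tanimoto(query_fp, fp)) for cid, fp in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical fingerprint of a known active and a tiny screening library.
QUERY = {1, 4, 7, 9}
LIBRARY = {
    "cmpd_A": {1, 4, 7, 9},
    "cmpd_B": {1, 4, 8},
    "cmpd_C": {2, 3, 5},
}

hits = rank_by_similarity(QUERY, LIBRARY)
```

Here cmpd_B shares two of the five distinct bits set in either fingerprint, so its score is 2/5 = 0.4. Machine learning-based LBVS replaces this single similarity score with a model trained on many known actives and inactives.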
Support vector machines
Support vector machines (SVMs), developed by Vapnik and co-workers,8,9 are supervised machine learning algorithms used for compound classification, in the basic case binary classification of linearly separable data. Two linearly separable classes of compounds are separated by a hyperplane, as shown in Figure 2. Many separating hyperplanes are possible; the SVM chooses the one that maximizes the margin between the two classes, on the assumption that the larger the margin, the lower the error of the classifier on unseen data. The hyperplanes that bound the margin are called support hyperplanes (dashed lines in Figure 2), and the data points lying on them are called support vectors (blue and red dots). For non-separable classes, a soft-margin hyperplane is used, which maximizes the margin while keeping the number of misclassified samples minimal. When the classes cannot be separated linearly, SVM kernels are used to map the data implicitly into a high-dimensional feature space. Four kernels are commonly used: linear, polynomial, sigmoid and radial basis function (RBF). The first three are global kernels, whereas the RBF kernel is local. Extensive work has shown that RBF-based SVMs outperform the other three kernels, and they are therefore widely used. SVMs are mainly used for binary property or activity prediction, for example to distinguish between drugs and non-drugs10,11 or between compounds that do or do not have a specific activity,11‒13 synthetic accessibility or aqueous solubility.14
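The soft-margin idea can be sketched in a few lines. This is a minimal illustration, assuming a linear SVM trained with Pegasos-style stochastic subgradient descent on the regularized hinge loss; standard SVM libraries instead solve the underlying quadratic program. The two-descriptor data set and all function names are hypothetical. An RBF kernel function is included only to show why it is called "local".

```python
import math
import random


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Soft-margin linear SVM via Pegasos-style stochastic subgradient descent.

    X: descriptor vectors; y: labels in {-1, +1}. A constant 1.0 feature is
    appended so the last weight acts as the bias (a common simplification).
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)                       # decreasing step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]    # shrink: (lam/2)*||w||^2 term
            if margin < 1.0:                            # inside margin or misclassified
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]  # hinge-loss step
    return w


def predict_svm(w, x):
    """Which side of the separating hyperplane x falls on."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


def rbf_kernel(a, b, gamma=0.5):
    """RBF kernel: near 1 for nearby points, decaying towards 0 with distance."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


# Hypothetical descriptors: +1 = active, -1 = inactive.
X_SVM = [[2.0, 2.0], [3.0, 3.0], [2.0, 3.0], [-2.0, -2.0], [-3.0, -1.0], [-2.0, -3.0]]
Y_SVM = [1, 1, 1, -1, -1, -1]

w_svm = train_linear_svm(X_SVM, Y_SVM)
```

A kernel SVM would replace the dot products with kernel evaluations such as rbf_kernel; because that kernel vanishes for distant points, each support vector only influences predictions in its own neighbourhood.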
Decision tree (DT)
A DT is commonly drawn as a tree with the root at the top and the leaves at the bottom, as shown in Figure 2. Starting from the root, the tree splits from a single trunk into two or more branches, and each branch may itself split into two or more branches. This process continues until a leaf, which cannot be split further, is reached. Each split point is an internal node of the tree (the root and the leaves are also called nodes). Each leaf node is assigned a value of the target property, whereas each non-leaf node tests a molecular property. This method has mainly been used in designing combinatorial libraries and in predicting drug-likeness and specific biological activities, including the classification of compounds into drugs and non-drugs,15 ADME-Tox properties16‒19 and metabolic stability.20 These models are simple to understand, interpret and validate. At least a moderately sized data set is recommended to avoid overfitting.
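The splitting process described above can be sketched as a small recursive tree builder. This assumes a CART-style greedy search that picks, at each internal node, the descriptor threshold giving the lowest Gini impurity; it is one common way to grow a decision tree, not necessarily the method used in the cited studies. The molecular weight and logP values and the drug/non-drug labels are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0 for a pure node, larger when classes are mixed."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(X, y):
    """Try midpoint thresholds on every descriptor; keep the lowest weighted impurity."""
    best = None  # (weighted_impurity, feature_index, threshold)
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[f] <= thr]
            right = [lab for row, lab in zip(X, y) if row[f] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, thr)
    return best


def build_tree(X, y, depth=0, max_depth=3):
    majority = Counter(y).most_common(1)[0][0]
    if len(set(y)) == 1 or depth >= max_depth:
        return majority                    # leaf node: holds the predicted class
    split = best_split(X, y)
    if split is None:                      # all descriptor values identical
        return majority
    _, f, thr = split
    left = [i for i, row in enumerate(X) if row[f] <= thr]
    right = [i for i, row in enumerate(X) if row[f] > thr]
    return {                               # internal node: tests one molecular descriptor
        "feature": f,
        "threshold": thr,
        "left": build_tree([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth),
        "right": build_tree([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth),
    }


def predict_tree(node, x):
    """Walk from the root to a leaf, following the threshold tests."""
    while isinstance(node, dict):
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node


# Hypothetical compounds described by [molecular_weight, logP].
X_DT = [[320, 2.1], [410, 3.5], [450, 4.2], [610, 2.8], [720, 3.9], [380, 6.3], [300, 5.8]]
Y_DT = ["drug", "drug", "drug", "non-drug", "non-drug", "non-drug", "non-drug"]

tree = build_tree(X_DT, Y_DT)
```

The max_depth limit is a simple guard against the overfitting mentioned above: a deeper tree can memorize a small training set instead of learning a general rule.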
Ensemble methods
Cross-validation is necessary before choosing any classification method. Ensemble methods such as random forest, bagging and boosting, which combine the predictions of many individual models, have shown better results.
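Bagging and cross-validation can be illustrated together. The sketch below assumes the simplest possible base learner, a one-descriptor decision stump. It trains each stump on a bootstrap sample (drawn with replacement), combines the stumps by majority vote, and estimates accuracy with k-fold cross-validation. The descriptor values and labels are hypothetical. Random forests add random descriptor subsets per tree, and boosting reweights samples between rounds; neither is shown here.

```python
import random
from collections import Counter


def train_stump(xs, ys):
    """Decision stump on one descriptor: the threshold rule with fewest training errors."""
    majority = Counter(ys).most_common(1)[0][0]
    best = (sum(y != majority for y in ys), None, majority, majority)
    pairs = sorted(zip(xs, ys))
    for (a, _), (b, _) in zip(pairs, pairs[1:]):
        if a == b:
            continue
        thr = (a + b) / 2
        left = [y for x, y in pairs if x <= thr]
        right = [y for x, y in pairs if x > thr]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0]
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if errors < best[0]:
            best = (errors, thr, left_label, right_label)
    _, thr, left_label, right_label = best
    if thr is None:                        # no useful split, e.g. one-class bootstrap sample
        return lambda x: majority
    return lambda x: left_label if x <= thr else right_label


def train_bagged_stumps(xs, ys, n_models=25, seed=0):
    """Bagging: each stump sees a bootstrap sample; predictions are a majority vote."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]   # sample with replacement
        models.append(train_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: Counter(m(x) for m in models).most_common(1)[0][0]


def k_fold_accuracy(xs, ys, k=5, seed=0):
    """k-fold cross-validation: every compound is predicted by a model trained without it."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        model = train_bagged_stumps([xs[i] for i in train_idx],
                                    [ys[i] for i in train_idx], seed=seed)
        correct += sum(model(xs[i]) == ys[i] for i in test_idx)
    return correct / len(xs)


# Hypothetical single-descriptor data set.
XS = [0.5, 1.0, 1.2, 1.8, 2.0, 8.0, 8.5, 9.1, 9.6, 10.0]
YS = ["inactive"] * 5 + ["active"] * 5

cv_accuracy = k_fold_accuracy(XS, YS)
ensemble = train_bagged_stumps(XS, YS)
```

Comparing cross-validated accuracy rather than training accuracy is what makes it fair to choose between a single classifier and an ensemble.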
Naive Bayes classifier
It is based on Bayes' theorem, which provides a mathematical framework for describing the probability of an event that might have resulted from two or more causes:21
$$p(a \mid b) = \frac{p(b \mid a)\, p(a)}{p(b)} \qquad (1)$$
This equation gives the probability p of state a given state b. The importance of Bayes' theorem is that the probability of a new event is updated on the basis of existing knowledge. The naive Bayes classifier applies it under the simplifying ("naive") assumption that the descriptors of a compound are independent of one another given its class. In cheminformatics it is frequently used to predict biological rather than physicochemical properties, for example compound toxicity, protein targets and the bioactivity class of drug-like molecules.22,23
k-Nearest neighbours
This is one of the simplest algorithms. A molecule is classified by a majority vote of its neighbours: it is assigned to the class most common among its k nearest neighbours. The k-NN algorithm is sensitive to the local structure of the data, which makes it well suited to properties with strong locality, such as protein function prediction.
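The two classifiers above can be sketched side by side on the same data. This assumes hypothetical 4-bit substructure fingerprints. The naive Bayes part is a Bernoulli naive Bayes with Laplace smoothing that applies Equation (1) in log form. The k-NN part measures closeness by Hamming distance (the number of differing bits), one of several possible distance choices.

```python
import math
from collections import Counter

# Hypothetical 4-bit substructure fingerprints (1 = substructure present).
X_TRAIN = [(1, 1, 0, 0), (1, 1, 1, 0), (1, 0, 0, 0),
           (0, 0, 1, 1), (0, 1, 1, 1), (0, 0, 0, 1)]
Y_TRAIN = ["active", "active", "active", "inactive", "inactive", "inactive"]


def train_bernoulli_nb(X, y, alpha=1.0):
    """Estimate p(class) and Laplace-smoothed p(bit j = 1 | class) for each class."""
    model = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        prior = len(rows) / len(X)
        probs = [(sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
                 for j in range(len(X[0]))]
        model[c] = (math.log(prior), probs)
    return model


def predict_nb(model, x):
    """Pick the class maximising log p(class) + sum_j log p(x_j | class).

    By Bayes' theorem this equals log p(class | x) up to the shared constant
    log p(x); summing independent per-bit terms is the 'naive' assumption."""
    def log_score(c):
        log_prior, probs = model[c]
        return log_prior + sum(math.log(p) if bit else math.log(1.0 - p)
                               for bit, p in zip(x, probs))
    return max(model, key=log_score)


def hamming(a, b):
    """Number of fingerprint bits that differ between two compounds."""
    return sum(ai != bi for ai, bi in zip(a, b))


def predict_knn(X, y, x, k=3):
    """Majority vote among the k training compounds nearest to x
    (equal distances keep training order, since sorted() is stable)."""
    nearest = sorted(range(len(X)), key=lambda i: hamming(X[i], x))[:k]
    return Counter(y[i] for i in nearest).most_common(1)[0][0]


nb_model = train_bernoulli_nb(X_TRAIN, Y_TRAIN)
```

Naive Bayes summarizes each class with a few probabilities. k-NN keeps every training compound and decides only from the local neighbourhood of the query, which is the locality property described above.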
Artificial neural networks
This method was inspired by the structure and functioning of the brain: just as neurons are connected to each other in a certain topology, forming neural networks, artificial neurons are connected in layers. The ANNs used in this field include feed-forward networks, such as multilayer perceptrons (MLPs) and radial basis function (RBF) networks, as well as Kohonen's self-organizing maps (Kohonen's SOMs).24 ANNs are mainly applied in compound classification, QSAR studies, primary VS of compounds, identification of potential drug target sites, localization of structural and functional features of proteins,25‒27 pattern identification and other tasks.
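How a feed-forward network turns inputs into an output can be shown with a tiny multilayer perceptron. This is a minimal sketch of the forward pass only. The weights are set by hand rather than learned (training, usually by backpropagation, is omitted). The task is a hypothetical one: a compound is called active when exactly one of two substructures is present, a nonlinear pattern that no single linear classifier can capture.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases):
    """One fully connected layer: each neuron outputs sigmoid(w . x + b)."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def mlp_forward(x, layers):
    """Feed-forward pass: the outputs of each layer are the inputs of the next."""
    for weights, biases in layers:
        x = dense(x, weights, biases)
    return x


# Hand-set (not trained) weights for a 2-2-1 network.
# Hidden neuron 1 behaves like OR, hidden neuron 2 like AND;
# the output neuron fires for "OR but not AND".
LAYERS = [
    ([[20.0, 20.0], [20.0, 20.0]], [-10.0, -30.0]),
    ([[20.0, -20.0]], [-10.0]),
]

# Inputs: hypothetical flags for the presence of substructure 1 and substructure 2.
outputs = {x: mlp_forward(list(x), LAYERS)[0] for x in [(0, 0), (1, 0), (0, 1), (1, 1)]}
```

Output values above 0.5 can be read as "active". The hidden layer is what lets the network represent this pattern; with learned weights, the same structure underlies MLP-based QSAR and classification models.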
LBVS techniques are widely used for hit identification. The methodological spectrum of these techniques is wide, and they are less time consuming than wet-lab methods, as they are simple to implement and interpret. Developing drugs for HIV is a difficult task because of the high mutation rate of the virus; still, biopharmaceutical companies continue to work in this area. Cutting-edge machine learning technologies could have a substantial effect on drug development, as comparative analyses can be done in seconds. Studies show that SVM predictions are better than those of the other methods. However, the choice among these algorithms depends on trials with the data available. In the near future, vaccines may also be developed using machine learning techniques (MLT), and more focus is expected on the development of machine learning algorithms that reflect domain knowledge. Clearly, much work remains to be done on drug delivery and drug development for HIV. As is truly said, the world depends on hope.
Acknowledgments
None.
Author declares that there is no conflict of interest.
©2018 Dubey. This is an open access article distributed under the terms of the, which permits unrestricted use, distribution, and build upon your work non-commercially.