Machine learning approaches in drug development of HIV/AIDS

doi:10.15406/ijmboa.2018.03.00044

International Journal of

eISSN: 2573-2889

Molecular Biology: Open Access

Mini Review Volume 3 Issue 1

Anubha Dubey

Independent researcher and analyst Bioinformatics, India

Correspondence: Anubha Dubey, Independent researcher and analyst Bioinformatics, Gayatri Nagar, Katni, 483501 MP, India

Received: May 01, 2017 | Published: February 1, 2018

Citation: Dubey A. Machine learning approaches in drug development of HIV/AIDS. Int J Mol Biol Open Access. 2018;3(1):23-25. DOI: 10.15406/ijmboa.2018.03.00044

Abstract

Because of the complexity of HIV/AIDS, cutting-edge machine learning technologies are being used for drug delivery and development. This review discusses drug delivery methods alongside machine learning techniques. Combining these computational methods offers new hope for improving the lives of HIV-infected persons, as they are less time consuming and easier to interpret than wet-lab techniques.

Keywords: drug, machine learning, computational method, wet-lab

Introduction

Globally, approximately 35 million people are infected with human immunodeficiency virus (HIV), the virus that causes acquired immunodeficiency syndrome (AIDS). Medicines and vaccines currently in the development pipeline include:

A first-in-class medicine intended to prevent HIV from breaking through the cell membrane.
A cell therapy that modifies a patient’s own cells in an attempt to make them resistant to HIV.
A therapeutic vaccine designed to induce responses from T-cells that play a role in immune protection against viral infections.

Biopharmaceutical research companies are investigating new ways to treat and prevent HIV infection. Potential therapies being developed for HIV infection include:

Attachment inhibitor: an attachment inhibitor prevents the virus from attaching to new cells by blocking the interaction between gp120 and the cell receptors.
Gene modification: CCR5 is a co-receptor on the surface of cells that allows HIV to enter and infect T-cells. T-cells are extracted from the patient, modified, and then reinserted into the patient. This therapy provides the patient with a population of cells that can fight HIV and the opportunistic infections that affect HIV patients.
Involving T-cell responses: another therapeutic vaccine in development is designed to induce CD4+ T-cell responses in HIV-infected people. CD4+ T cells play an important role in immune protection against viral infections. Deficits in CD4+ T cells are associated with virus reactivation, vulnerability to opportunistic infections and poor vaccine efficacy.

For HIV/AIDS, the introduction of novel therapeutics and continuing research into their best use in patients have shown that multiple drugs used in combination (antiretroviral drugs) improve the health of HIV-infected patients. There is, however, currently no publicly available vaccine or cure for HIV or AIDS.¹ One example under development is a vaginal gel containing tenofovir, a reverse transcriptase inhibitor, which has shown good results in clinical trials.²

Treatment of HIV infection consists of highly active antiretroviral therapy (HAART,³ or ART). A typical regimen combines two nucleoside analogue reverse transcriptase inhibitors (NARTIs or NRTIs) with either a protease inhibitor or a non-nucleoside reverse transcriptase inhibitor (NNRTI). Abacavir, an NRTI, shows good results. According to one study, the average life expectancy of an HIV-infected individual is 32 years from the time of infection if treatment is started when the CD4 count is 350/µL.⁴ ART is also recommended when the CD4 count is below 500/µL, to extend the life expectancy of HIV-infected individuals. Figure 1 shows how ART hinders the HIV life cycle in humans and the responses it produces. Antiretroviral drugs are expensive, so there is a need to develop vaccines or drugs that improve the lives of HIV-infected persons without side effects. In view of this, computational methods combined with machine learning offer hope of developing treatments that cost less and are within reach of ordinary people.

Figure 1 Schematic representation of ART hindering the HIV life cycle.

Methods and discussion

Machine learning techniques are cutting-edge technologies⁵ concerned with the development of algorithms that improve their performance in pattern recognition, classification, regression and prediction on the basis of models derived from existing data. Machine learning is closely related to data mining, since pattern recognition is one of the most important areas of research in both. Classification algorithms are frequently used to distinguish active from inactive compounds, while regression approaches are applied to continuous data for training, testing and prediction. Ensemble algorithms such as bagging and boosting combine several models to make accurate and fast decisions, and cross-validation plays an important role in obtaining reliable results.

In drug discovery and development, including target identification, machine learning methods have been widely used in quantitative structure-activity relationship (QSAR) modelling, ligand-based virtual screening and in-silico ADMET (Absorption, Distribution, Metabolism, Excretion and Toxicity) studies. Modern QSAR is characterized by the use of multiple descriptors of chemical structure combined with both linear and non-linear optimization techniques, with a strong emphasis on model validation. The descriptors cover structural and physicochemical properties of compounds, among which atom and bond counts and electrostatic and thermodynamic properties are important. Modeller, ChemSketch, DRAGON, MOE, VMD, AutoDock and similar cheminformatics software packages are widely used in target identification, hit discovery and related tasks, and machine learning methods are widely used to identify the best-suited model among those obtained. QSAR models are widely used in virtual screening for hit discovery;⁶ they relate structure to activity in order to find potential targets for drug delivery and drug discovery. Virtual screening is a major area within the cheminformatics spectrum that has used machine learning techniques.
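The QSAR workflow described above, fitting a model to descriptors and then validating it, can be illustrated with a minimal sketch. It assumes a single descriptor and a straight-line model, far simpler than a real QSAR study; the descriptor values (`logp`) and activities (`pic50`) are hypothetical numbers invented for illustration, and the function names are not taken from any cheminformatics package. Validation here is leave-one-out cross-validation, reported as q² = 1 − PRESS/SS_tot.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loo_q2(xs, ys):
    """Leave-one-out cross-validated q^2 = 1 - PRESS / SS_tot."""
    mean_y = sum(ys) / len(ys)
    press = 0.0
    for i in range(len(xs)):
        # Refit the model without compound i, then predict compound i.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_tot


# Hypothetical descriptor values and activities (illustration only).
logp = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
pic50 = [4.1, 4.6, 5.2, 5.5, 6.1, 6.4]

slope, intercept = fit_line(logp, pic50)
q2 = loo_q2(logp, pic50)
print(f"activity = {slope:.3f} * descriptor + {intercept:.3f}, q2 = {q2:.3f}")
```

A q² close to 1 indicates that the model predicts held-out compounds well; a value far below the fitted r² would suggest overfitting.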
Virtual screening (VS) is the application of computational tools to search large databases for new leads with a higher probability of strong binding affinity to the target protein. VS methods can be classified into structure-based (SBVS) and ligand-based (LBVS) approaches, depending on the amount of structural and bioactivity data available. If the 3D structure of the receptor is known, an SBVS method such as high-throughput docking can be used,⁷ but if information about the receptor is scant, LBVS methods are commonly recommended. Taken together, there is a broad spectrum of applications for machine learning methods in computer-aided drug discovery, which makes it worthwhile to select some approaches and highlight their applications.

HIV is characterized by recombination and sudden, rapid mutation, so it is difficult for biopharmaceutical companies to formulate medicines that cure AIDS. Many medicines are nevertheless in use and show good results. New candidates still need clinical trials, but drug design that combines computational methods with the molecular basis of HIV promises better results. The following machine learning approaches are widely used with computational methods:

Support vector machines

Support vector machines (SVMs), developed by Vapnik and co-workers,⁸,⁹ are supervised machine learning algorithms used for compound classification, in particular binary classification of linearly separable classes. Two linearly separable classes of compounds are separated by a hyperplane, as shown in Figure 2. Many separating hyperplanes are possible, and the SVM chooses the one that maximizes the margin between the two classes, on the assumption that the larger the margin, the lower the error of the classifier on unknown data. The hyperplanes that bound the margin are called support hyperplanes (dashed lines in Figure 2), and the data points that lie on them are called support vectors (blue and red dots). For non-separable classes, a soft-margin hyperplane is used, which maximizes the margin while keeping the number of misclassified samples minimal. When a high-dimensional feature space is needed, SVM kernels are used. Four kernels are in common use: linear, polynomial, sigmoid and radial basis function (RBF). The first three are global kernels and RBF is a local kernel. Extensive work has shown that RBF-based SVMs outperform the other three kernels, and they are therefore widely used. SVMs are mainly used for binary property or activity prediction, for example to distinguish between drugs and non-drugs¹⁰,¹¹ or between compounds that do or do not have a specific activity.¹¹‒¹³ Synthetic accessibility and aqueous solubility have also been predicted in this way.¹⁴
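To make the margin and kernel ideas concrete, the following is a minimal sketch rather than a production SVM. It assumes a linear soft-margin SVM trained by plain batch subgradient descent on the regularized hinge loss (real SVM packages use more sophisticated solvers), and it shows the RBF kernel as a standalone function. The two-dimensional points, the hyperparameters `lam`, `lr`, `epochs` and `gamma`, and all function names are illustrative assumptions.

```python
import math


def rbf_kernel(x, z, gamma=0.5):
    """RBF kernel exp(-gamma * ||x - z||^2); a local similarity measure."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def train_linear_svm(points, labels, lam=0.01, lr=0.1, epochs=200):
    """Soft-margin linear SVM via batch subgradient descent on hinge loss."""
    dim = len(points[0])
    n = len(points)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1.0:  # point is inside the margin or misclassified
                for j in range(dim):
                    grad_w[j] -= y * x[j]
                grad_b -= y
        # The lam * w term shrinks w, which corresponds to widening the margin.
        w = [wi - lr * (lam * wi + gj / n) for wi, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Hypothetical, linearly separable toy data: +1 = active, -1 = inactive.
points = [(2.0, 2.0), (3.0, 3.0), (2.0, 3.0),
          (-2.0, -2.0), (-3.0, -3.0), (-2.0, -3.0)]
labels = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(points, labels)
print("w =", w, "b =", b)
print("prediction for (1.0, 1.5):", predict(w, b, (1.0, 1.5)))
```

The points closest to the learned hyperplane, here (2.0, 2.0) and (-2.0, -2.0), play the role of the support vectors described above.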

Figure 2 Hyperplane separation of two objects.

Decision tree (DT)

A DT is commonly drawn as a tree with the root at the top and the leaves at the bottom, as shown in Figure 2. Starting from the root, the tree splits from a single trunk into two or more branches, and each branch itself splits into two or more branches. This process continues until a leaf is reached, which cannot be split further. Each point at which a branch splits is called an internal node of the tree (the root and the leaves are also nodes). Each leaf node is assigned a target property, whereas each non-leaf node is assigned a molecular property. DTs are mainly used in designing combinatorial libraries, predicting drug-likeness and predicting specific biological activities, including the classification of compounds into drugs and non-drugs,¹⁵ ADME-Tox properties¹⁶‒¹⁹ and metabolic stability.²⁰ These models are simple to understand, interpret and validate. A moderate data size is recommended to avoid overfitting.
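How a single internal node chooses its molecular property can be sketched as follows. The example assumes binary splits scored by Gini impurity, which is one common criterion and not necessarily the one used in the cited studies; the descriptor columns (molecular weight and logP) and their values are hypothetical. A full tree would apply `best_split` recursively to each resulting subset until the leaves are pure.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 means a pure node)."""
    if not labels:
        return 0.0
    p1 = sum(labels) / len(labels)
    return 1.0 - p1 * p1 - (1.0 - p1) * (1.0 - p1)


def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_gini) of the best split."""
    best = (None, None, float("inf"))
    for f in range(len(rows[0])):
        values = sorted({row[f] for row in rows})
        # Candidate thresholds are midpoints between neighbouring values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2.0
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            score = (len(left) * gini(left)
                     + len(right) * gini(right)) / len(labels)
            if score < best[2]:
                best = (f, threshold, score)
    return best


# Hypothetical compounds: (molecular weight, logP); 1 = drug-like, 0 = not.
rows = [(320.0, 2.1), (410.0, 3.5), (280.0, 1.8),
        (650.0, 6.2), (720.0, 5.9), (590.0, 3.0)]
labels = [1, 1, 1, 0, 0, 0]

feature, threshold, impurity = best_split(rows, labels)
print(f"split on feature {feature} at {threshold}, impurity {impurity}")
```

On this toy data the molecular-weight column separates the two classes completely, so the root node would test that property and both children would already be leaves.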

Ensemble methods

Cross-validation is necessary before choosing any classification method. Ensemble methods such as random forest, bagging and boosting, which combine the predictions of several models, have given better results.
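As an illustration of the bagging idea, the sketch below trains several very simple base classifiers on bootstrap resamples of the data and combines them by majority vote. The base learner (a threshold halfway between the two class means of a single descriptor), the descriptor values, the number of models and the random seed are all assumptions chosen for brevity; a random forest would use decision trees as base learners instead.

```python
import random
from collections import Counter


def fit_threshold(sample):
    """Base learner: threshold halfway between the two class means."""
    inactive = [x for x, label in sample if label == 0]
    active = [x for x, label in sample if label == 1]
    return (sum(inactive) / len(inactive) + sum(active) / len(active)) / 2.0


def bootstrap(data, rng):
    """Draw len(data) items with replacement; retry if a class is missing."""
    while True:
        sample = [rng.choice(data) for _ in data]
        if len({label for _, label in sample}) == 2:
            return sample


def bagging_fit(data, n_models=11, seed=0):
    """Train n_models base learners, each on its own bootstrap sample."""
    rng = random.Random(seed)
    return [fit_threshold(bootstrap(data, rng)) for _ in range(n_models)]


def bagging_predict(thresholds, x):
    """Majority vote over the base learners (1 = active, 0 = inactive)."""
    votes = Counter(1 if x > t else 0 for t in thresholds)
    return votes.most_common(1)[0][0]


# Hypothetical one-descriptor data set: (descriptor value, label).
data = [(1.0, 0), (1.5, 0), (2.0, 0), (2.2, 0),
        (6.0, 1), (6.5, 1), (7.0, 1), (8.0, 1)]

thresholds = bagging_fit(data)
print("prediction for 3.0:", bagging_predict(thresholds, 3.0))
print("prediction for 5.5:", bagging_predict(thresholds, 5.5))
```

Using an odd number of models avoids tied votes in the two-class case.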

Naive Bayes classifier

It is based on Bayes' theorem, which gives a mathematical framework for describing the probability of an event that might have been the result of two or more causes:²¹

$p(a \mid b) = \frac{p(b \mid a)\, p(a)}{p(b)}$ Equation (1)
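As a numerical illustration of Equation (1), the sketch below computes the posterior probability that a compound is active given the structural features observed in it. It assumes two classes (active and inactive) and, as in a naive Bayes classifier, that the features are conditionally independent within each class; the prior and the per-feature likelihoods are hypothetical numbers, and the function name is illustrative.

```python
def naive_bayes_posterior(prior_active, lik_active, lik_inactive):
    """Posterior p(active | features) under the naive independence assumption.

    lik_active[i] is p(feature_i | active) and lik_inactive[i] is
    p(feature_i | inactive) for each feature observed in the compound.
    """
    score_active = prior_active
    for p in lik_active:
        score_active *= p
    score_inactive = 1.0 - prior_active
    for p in lik_inactive:
        score_inactive *= p
    # The denominator is p(features), expanded over both classes.
    return score_active / (score_active + score_inactive)


# Hypothetical values: 10% of compounds are active; the observed feature
# occurs in 80% of actives and 20% of inactives.
one_feature = naive_bayes_posterior(0.1, [0.8], [0.2])
# A second observed feature occurs in 70% of actives and 30% of inactives.
two_features = naive_bayes_posterior(0.1, [0.8, 0.7], [0.2, 0.3])

print(f"p(active | one feature)  = {one_feature:.4f}")
print(f"p(active | two features) = {two_features:.4f}")
```

With a single feature the function reduces exactly to Equation (1), with a = "active" and b = "feature present".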

Equation (1) gives the probability p of state a given that state b is observed. The importance of Bayes' theorem is that the probability of a new event depends on existing knowledge. The method is frequently used in cheminformatics, generally for predicting biological rather than physicochemical properties: toxicity of compounds, protein targets and bioactivity classification for drug-like molecules.²²,²³

k-Nearest neighbours

k-nearest neighbours (k-NN) is one of the simplest algorithms. A molecule is classified by a majority vote of its neighbours, being assigned to the class most common among its k nearest neighbours. The k-NN algorithm is sensitive to the local structure of the data. It is therefore well suited to predicting properties with strong locality, such as protein function.
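The majority vote among nearest neighbours can be sketched in a few lines. The example assumes Euclidean distance in a two-dimensional descriptor space and k = 3; the descriptor values and class names are hypothetical, and real applications typically use many more descriptors or a fingerprint-based similarity measure.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(training, query, k=3):
    """Assign the class most common among the k nearest training molecules."""
    nearest = sorted(training, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training set: (descriptor vector, class label).
training = [
    ((1.0, 1.0), "inactive"), ((1.2, 0.8), "inactive"), ((0.9, 1.3), "inactive"),
    ((4.0, 4.2), "active"), ((4.5, 3.9), "active"), ((3.8, 4.4), "active"),
]

print(knn_classify(training, (4.1, 4.0)))
print(knn_classify(training, (1.0, 1.1)))
```

An odd k avoids ties between two classes; larger k smooths the decision at the cost of sensitivity to local structure.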

Artificial neural networks

This method was developed to model the structure and functioning of the brain. Just as neurons are connected to one another in a certain topology, artificial neurons are connected to form neural networks. Commonly used ANN architectures include feed-forward networks such as multilayer perceptrons (MLP) and radial basis function (RBF) networks, as well as Kohonen’s self-organizing maps (Kohonen’s SOM).²⁴ ANNs are mainly applied in compound classification, QSAR studies, primary VS of compounds, identification of potential drug target sites, localization of structural and functional features of proteins,²⁵‒²⁷ pattern identification and others.
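The forward pass of a feed-forward multilayer perceptron can be sketched as below. The network has two inputs, one hidden layer of two sigmoid neurons and one sigmoid output. Its weights are set by hand, not learned, so that it computes the XOR function, a classic example of a pattern that no single linear boundary can separate; in practice the weights would be fitted to data by backpropagation. All names and weight values are illustrative assumptions.

```python
import math


def sigmoid(z):
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weights, biases):
    """One fully connected layer: sigmoid(W . inputs + b) for each neuron."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


# Hand-chosen weights (not trained): hidden neuron 1 acts like OR,
# hidden neuron 2 like NAND, and the output neuron like AND.
HIDDEN_W = [[20.0, 20.0], [-20.0, -20.0]]
HIDDEN_B = [-10.0, 30.0]
OUTPUT_W = [[20.0, 20.0]]
OUTPUT_B = [-30.0]


def mlp_forward(x1, x2):
    """Propagate two inputs through the hidden layer to the single output."""
    hidden = layer([x1, x2], HIDDEN_W, HIDDEN_B)
    return layer(hidden, OUTPUT_W, OUTPUT_B)[0]


for a in (0, 1):
    for b in (0, 1):
        print(a, b, round(mlp_forward(a, b)))
```

The hidden layer is what lets the network represent this non-linear pattern; a single neuron on its own could not.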

Concluding remarks & future directions

LBVS techniques are widely used for hit identification. The methodological spectrum of these techniques is wide, and they save time because they are simple to implement and interpret. Developing drugs for HIV is a difficult task because of the high mutation rate of the virus, yet biopharmaceutical companies continue to work in this area. Cutting-edge machine learning technologies can make a sound contribution to drug development, as comparative analyses are completed in seconds. Studies show that SVM prediction is better than that of the other methods. All of these findings rest on trials of algorithms according to the availability of data. In the near future, vaccines may also be developed on the basis of machine learning techniques (MLT), and more focus is expected on the development of machine learning algorithms that reflect domain knowledge. Clearly, much work remains to be done on drug delivery and development for HIV. As the saying goes, the world depends on hope.