Review on: quantitative structure activity relationship (QSAR) modeling

doi:10.15406/japlr.2018.07.00232

Journal of Analytical & Pharmaceutical Research

eISSN: 2473-0831

Mini Review Volume 7 Issue 2

Umma Muhammad,

Adamu Uzairu, David Ebuka Arthur

Correspondence: Umma Muhammad, Department of Pre-nd Sci & Tech. School of General Studies, Kano State Polytechnic, Nigeria

Received: January 08, 2018 | Published: April 27, 2018

Citation: Muhammad U, Uzairu A, Arthur DE. Review on: quantitative structure activity relationship (QSAR) modeling. J Anal Pharm Res. 2018;7(2):240–242. DOI: 10.15406/japlr.2018.07.00232

Abstract

Quantitative structure-activity relationships (QSARs) are mathematical models that seek to predict complicated physicochemical or biological properties of chemicals from their simpler experimental or calculated properties. QSAR enables the investigator to establish a reliable quantitative relationship between structure and activity, which is then used to derive an in silico model that predicts the activity of novel molecules prior to their synthesis. The past few decades have witnessed many advances in the development of computational models for predicting a wide span of biological and chemical activities, which are beneficial for screening promising compounds with robust properties. This review covers the concept and history of QSAR and the components involved in the development of QSAR models.

Keywords: QSAR, model development, applicability domain, molecular descriptor, virtual screening

Introduction

Quantitative structure-activity relationship (QSAR) modeling pertains to the construction of predictive models of biological activities as a function of the structural and molecular information of a compound library. The concept of QSAR has typically been used for drug discovery and development, and it has gained wide application for correlating molecular information not only with biological activities but also with other physicochemical properties, in which case it is termed quantitative structure-property relationship (QSPR) modeling. QSAR is a widely accepted predictive and diagnostic process for finding associations between chemical structure and biological activity. QSAR emerged and evolved in an effort to fulfill the medicinal chemist's need and desire to predict biological response.¹ It found its way into the practice of agrochemistry, pharmaceutical chemistry, and eventually most facets of chemistry.²

QSAR is the final result of a computational process that starts with a suitable description of molecular structure and ends with inferences, hypotheses, and predictions about the behavior of molecules in the environmental, physicochemical and biological systems under analysis.³ The final outputs of QSAR computations are sets of mathematical equations relating chemical structure to biological activity.⁴–⁶ Multivariate QSAR analysis employs molecular descriptors from various representations of a molecule (1D, 2D and 3D) to compute a model, in a search for the descriptors that best describe the property under analysis. This review covers the concepts, history and steps involved in the development of QSAR models.

History of QSAR

Cros² proposed that a relationship existed between the toxicity of primary aliphatic alcohols and their water solubility.² In 1868 Crum-Brown and Fraser published an equation that is considered to be the first-generation formulation of a quantitative structure-activity relationship, arising from their investigations of different alkaloids.⁷ Systematic QSAR began with the work⁸ on the narcotic activity of various drugs.⁹ Hammett¹⁰ introduced a method to account for substituent effects on reaction mechanisms.¹⁰ Taking Hammett's model into account, Taft proposed in 1956 an approach for separating the polar, steric, and resonance effects of substituents in aliphatic compounds.¹¹ The classical approach to QSAR/QSPR was led by the pioneering work of Hansch et al.¹² in the development of the linear Hansch equation.¹²

QSAR/QSPR received a big boost from the development of newer, more complex descriptors, software and computers. These have been instrumental in the application of prediction techniques that were previously either infeasible or too time-consuming.

QSAR methodology

QSAR methodologies have the potential to substantially decrease the time and effort required for the discovery of new medicines.¹³ A major step in constructing QSAR models is to find a set of molecular descriptors that represents the variation in the structural properties of the molecules.¹⁴ QSAR analysis employs statistical methods to derive a quantitative mathematical relationship between chemical structure and biological activity.¹⁵ The process of QSAR modelling can be divided into three stages: development, model validation and application.

Development

For model development, the compounds gathered from literature sources are divided into a training set and a test set. The training set is used for model construction, while the test set is used for external validation.
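The division into training and test sets can be sketched as follows. This is a minimal illustration, not the procedure of any particular study: the random 70:30 split, the seed, the function name `split_dataset` and the compound data are all assumptions made for the example.

```python
import random


def split_dataset(compounds, test_fraction=0.3, seed=42):
    """Randomly split compounds into (training_set, test_set)."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(compounds)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical data set: (compound id, activity) pairs.
dataset = [(f"cpd{i:02d}", 4.0 + 0.1 * i) for i in range(20)]
training_set, test_set = split_dataset(dataset)
print(len(training_set), len(test_set))
```

A random split is only one option; whichever scheme is used, the test set must not take part in model construction.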

The structures of the compounds under study can be drawn in 2D with ChemDraw and converted into 3D objects using the default conversion procedure implemented in CS Chem3D Ultra. The generated 3D structures are then subjected to energy minimization and geometry optimization using Spartan.¹⁶ Molecular descriptors can be calculated using chemical software such as Dragon,¹⁷ Gaussian,¹⁸ PaDEL,¹⁹ etc. Molecular descriptors can be defined as the essential information of a molecule expressed in terms of its physicochemical properties, and include constitutional, electronic, geometrical, hydrophobic, lipophilicity, solubility, steric, quantum chemical and topological descriptors.²⁰ Multivariate analysis such as multiple linear regression (MLR) or partial least squares (PLS) can then be carried out to correlate the molecular descriptors with the observed activity.
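As an illustration of the regression step, the sketch below fits a multiple linear regression by solving the normal equations in plain Python. The descriptor values, activities and function names are hypothetical; in practice the descriptors would come from one of the packages named above and a numerical library would normally do the fitting.

```python
def solve_linear_system(a, b):
    """Solve a.x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: descriptors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_mlr(descriptors, activities):
    """Return [intercept, b1, ..., bp] minimizing the squared error."""
    rows = [[1.0] + list(d) for d in descriptors]  # prepend intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, activities)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coefficients, descriptor_row):
    """Activity predicted by the fitted linear model for one compound."""
    return coefficients[0] + sum(
        b * x for b, x in zip(coefficients[1:], descriptor_row)
    )


# Hypothetical training set: two descriptors per compound and an activity.
train_descriptors = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 6.0)]
train_activities = [2.0, 4.5, 5.0, 7.5, 8.0]
coefficients = fit_mlr(train_descriptors, train_activities)
print([round(c, 3) for c in coefficients])
print(round(predict(coefficients, (2.5, 2.5)), 3))
```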

Internal model validation

The developed models are validated internally by the leave-one-out (LOO) cross-validation technique. In this technique, one compound is eliminated from the data set at random in each cycle and the model is built using the rest of the compounds. The model thus formed is used to predict the activity of the eliminated compound. The process is repeated until every compound has been eliminated once. The cross-validated squared correlation coefficient, R²cv (Q²), is calculated using the expression:

Q^{2} = 1 - \frac{\sum (Y_{obs} - Y_{pred})^{2}}{\sum (Y_{obs} - \bar{Y})^{2}}

Where $Y_{obs}$ represents the observed activity of the training set compounds, $Y_{pred}$ is the predicted activity of the training set compounds (each predicted by the model built without that compound) and $\bar{Y}$ corresponds to the mean observed activity of the training set compounds. Also calculated is the adjusted R² (adjR²), a modification of R² that adjusts for the number of explanatory terms in a model. Unlike R², whose value increases whenever descriptors are added to the developed QSAR model, adjR² increases only if the new term improves the model more than would be expected by chance.²¹ Hence adjR² overcomes the drawbacks associated with R² and is calculated using the expression:

adjR^{2} = \frac{(n - 1) R^{2} - p}{n - p - 1}

Where n is the number of training set compounds and p is the number of predictor variables used in the model development. In order to judge the overall significance of the regression coefficients, the variance ratio, F value (the ratio of regression mean square to deviations mean square), is also calculated using the relation:

F = \frac{\sum (Y_{cal} - \bar{Y})^{2} / p}{\sum (Y_{obs} - Y_{cal})^{2} / (n - p - 1)}

Here $Y_{cal}$ is the activity calculated by the fitted model for the training set compounds.

External model validation

External validation is employed in order to determine the predictive capacity of the developed model, as judged by its application to the prediction of test set activity values and the calculation of the predictive R² (R²pred) value given by the expression:

R_{pred}^{2} = 1 - \frac{\sum (Y_{pred(Test)} - Y_{(Test)})^{2}}{\sum (Y_{(Test)} - \bar{Y}_{(Training)})^{2}}

Where $Y_{pred(Test)}$ and $Y_{(Test)}$ indicate the predicted and observed activity values, respectively, of the test set compounds, and $\bar{Y}_{(Training)}$ indicates the mean activity value of the training set. R²pred is the predictive correlation coefficient calculated from the predicted activity of all the test set compounds. It has been observed that R²pred may not be sufficient to indicate the external predictability of a model, since its value is controlled by $\sum (Y_{(Test)} - \bar{Y}_{(Training)})^{2}$. Thus R²pred depends on the training set mean and may not truly reflect the predictive capability of the developed model with regard to a new data set.²² This may result in a considerable numerical difference between the observed and predicted values in spite of a good overall intercorrelation.
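The internal and external validation statistics defined in this and the preceding section (Q², adjR², F and R²pred) can be sketched in a few lines. To keep the example short it assumes a model with a single descriptor fitted by ordinary least squares (so p = 1); the data and all function names are hypothetical.

```python
def fit_line(x, y):
    """Least-squares (slope, intercept) for a one-descriptor model."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(y_obs, y_cal):
    """Conventional R^2 from observed and calculated activities."""
    mean_y = sum(y_obs) / len(y_obs)
    rss = sum((o - c) ** 2 for o, c in zip(y_obs, y_cal))
    return 1 - rss / sum((o - mean_y) ** 2 for o in y_obs)


def q2_loo(x, y):
    """Leave-one-out Q^2: each compound is predicted by a model built without it."""
    mean_y = sum(y) / len(y)
    press = 0.0
    for i in range(len(y)):
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (slope * x[i] + intercept)) ** 2
    return 1 - press / sum((yi - mean_y) ** 2 for yi in y)


def adjusted_r2(r2, n, p):
    """adjR^2 = ((n - 1) R^2 - p) / (n - p - 1)."""
    return ((n - 1) * r2 - p) / (n - p - 1)


def f_value(y_obs, y_cal, p):
    """Ratio of regression mean square to residual mean square."""
    n = len(y_obs)
    mean_y = sum(y_obs) / n
    regression_ms = sum((c - mean_y) ** 2 for c in y_cal) / p
    residual_ms = sum((o - c) ** 2 for o, c in zip(y_obs, y_cal)) / (n - p - 1)
    return regression_ms / residual_ms


def r2_pred(y_test, y_test_pred, train_mean):
    """External R^2pred, referenced to the training set mean."""
    num = sum((p - o) ** 2 for p, o in zip(y_test_pred, y_test))
    den = sum((o - train_mean) ** 2 for o in y_test)
    return 1 - num / den


# Hypothetical descriptor values and activities.
x_train = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_train = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
x_test, y_test = [2.5, 4.5], [5.1, 8.9]

slope, intercept = fit_line(x_train, y_train)
y_cal = [slope * x + intercept for x in x_train]
r2 = r_squared(y_train, y_cal)
q2 = q2_loo(x_train, y_train)
adj = adjusted_r2(r2, n=len(y_train), p=1)
f = f_value(y_train, y_cal, p=1)
ext = r2_pred(
    y_test,
    [slope * x + intercept for x in x_test],
    sum(y_train) / len(y_train),
)
print(f"R2={r2:.4f} Q2={q2:.4f} adjR2={adj:.4f} F={f:.1f} R2pred={ext:.4f}")
```

For a least-squares model Q² cannot exceed R², because each leave-one-out residual is at least as large in magnitude as the corresponding fitted residual.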

Randomization test

The robustness of the developed QSAR model is checked using the Y-randomization technique. In Y-randomization, validation is performed by permuting the response values, activity (Y), with respect to the descriptor (X) matrix, which is left unaltered. The deviation of the mean squared correlation coefficient of the randomized models (Rr²) from the squared correlation coefficient of the non-random model (R²) is reflected in the value of the R²p parameter, computed from the expression:²³

R_{p}^{2} = R^{2} \times \sqrt{(R^{2} - R_{r}^{2})}

In an ideal case, the average value of R² for the randomized models (Rr²) should be zero, and the value of R²p should then be equal to the value of R² for the developed QSAR model. With Rr² = 0, however, the expression above gives R² × R rather than R². This led Todeschini²⁵ to suggest a correction for R²p, which is defined as:

c R_{p}^{2} = R \times \sqrt{R^{2} - R_{r}^{2}}

In order to penalize the developed models for the difference between the squared correlation coefficients of the randomized and the non-randomized models, the value of cR²p was calculated for each model. This procedure ensures that the model is not due to chance. The Y-randomization results were generated using the program “MLR Y-Randomization Test 1.2”.²⁴
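The Y-randomization procedure and the corrected parameter cR²p can be sketched as below. This is an illustrative sketch and not a reimplementation of the program named above: it assumes a one-descriptor linear model (for which R² equals the squared Pearson correlation), 100 permutations, a fixed seed and hypothetical data.

```python
import math
import random


def r_squared(x, y):
    """Squared Pearson correlation, i.e. R^2 of a one-descriptor linear model."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)


def y_randomization(x, y, n_trials=100, seed=0):
    """Return (R^2, mean randomized R^2, cR^2p) for the given data."""
    rng = random.Random(seed)
    r2 = r_squared(x, y)
    shuffled = list(y)
    randomized = []
    for _ in range(n_trials):
        rng.shuffle(shuffled)  # permute activities, descriptors untouched
        randomized.append(r_squared(x, shuffled))
    rr2 = sum(randomized) / n_trials
    # cR^2p = R * sqrt(R^2 - Rr^2); clamping at zero is an assumption of
    # this sketch to avoid the square root of a negative number.
    c_rp2 = math.sqrt(r2) * math.sqrt(max(r2 - rr2, 0.0))
    return r2, rr2, c_rp2


# Hypothetical descriptor values and activities.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
y = [2.2, 3.9, 6.1, 8.3, 9.8, 12.1, 14.2, 15.8, 18.1, 20.0]
r2, rr2, c_rp2 = y_randomization(x, y)
print(f"R2={r2:.4f} Rr2={rr2:.4f} cR2p={c_rp2:.4f}")
```

A cR²p close to R² indicates that the randomized models explain very little, which is what is expected of a model that is not a chance correlation.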

Application

The application of QSAR models depends on the statistical significance and predictive ability of the models. The prediction of a modeled response using QSAR is valid only if the compound being predicted is within the applicability domain of the model. The applicability domain is a theoretical region of chemical space defined by the model descriptors and the modeled response, and thus by the nature of the training set molecules.²⁵ Whether a new chemical lies within the applicability domain can be checked using the leverage approach. A compound is considered outside the applicability domain when its leverage value is higher than the critical value of 3p/n, where p is the number of model variables plus 1 and n is the number of objects used to develop the model. Other approaches include training set interpolation by Jawors.²⁶ and the cluster-based approach by Stanforth et al.²⁷
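The leverage check can be sketched as follows. The text does not give the leverage formula; the sketch uses the standard hat-matrix definition h = xᵀ(XᵀX)⁻¹x and assumes that X includes a constant (intercept) column, so that p equals the number of descriptors plus 1. The training descriptors, query compounds and function names are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a.x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: descriptors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def leverage(train_descriptors, query):
    """h = q^T (X^T X)^-1 q, with an intercept column added to X and q."""
    rows = [[1.0] + list(d) for d in train_descriptors]
    q = [1.0] + list(query)
    k = len(q)
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    z = solve_linear_system(xtx, q)  # z = (X^T X)^-1 q
    return sum(qi * zi for qi, zi in zip(q, z))


def in_applicability_domain(train_descriptors, query):
    """True when the leverage does not exceed the critical value 3p/n."""
    n = len(train_descriptors)
    p = len(train_descriptors[0]) + 1  # model variables plus 1
    return leverage(train_descriptors, query) <= 3 * p / n


# Hypothetical training descriptors (two per compound).
train = [
    (1.0, 2.0), (2.0, 1.5), (3.0, 3.5), (4.0, 3.0), (5.0, 5.5),
    (6.0, 4.0), (7.0, 6.5), (8.0, 7.0), (9.0, 8.5), (10.0, 9.0),
]
print(in_applicability_domain(train, (5.5, 5.0)))   # query near the centre
print(in_applicability_domain(train, (30.0, 1.0)))  # query far from training data
```

With an intercept column the leverages of the training compounds sum to p, which is a convenient sanity check on the computation.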

Conclusion

QSAR models are useful for various purposes, including the prediction of the activities of untested chemicals. They help in the rational design of drugs with computer-aided tools via molecular modeling, simulation and virtual screening of promising candidates prior to synthesis. In this review article the concept, brief history and components involved in QSAR modeling were discussed.