Genetic programming for QSAR investigation of docking energy

Research output: Contribution to journal › Article › peer-review

16 Citations (Scopus)


Statistical methods, and Machine Learning in particular, have been increasingly used in the drug development workflow to accelerate the discovery phase and to eliminate possible failures early during clinical development. In the past, the authors of this paper have worked specifically on two problems: (i) prediction of drug-induced toxicity and (ii) evaluation of the target-drug chemical interaction based on chemical descriptors. Among the numerous existing Machine Learning methods and their applications to drug development (see for instance [F. Yoshida, J.G. Topliss, QSAR model for drug human oral bioavailability, Journal of Medicinal Chemistry 43 (2000) 2575-2585; Frohlich, J. Wegner, F. Sieker, A. Zell, Kernel functions for attributed molecular graphs: a new similarity based approach to ADME prediction in classification and regression, QSAR and Combinatorial Science 38(4) (2003) 427-431; C.W. Andrews, L. Bennett, L.X. Yu, Predicting human oral bioavailability of a compound: development of a novel quantitative structure-bioavailability relationship, Pharmaceutical Research 17 (2000) 639-644; J. Feng, L. Lurati, H. Ouyang, T. Robinson, Y. Wang, S. Yuan, S.S. Young, Predictive toxicology: benchmarking molecular descriptors and statistical methods, Journal of Chemical Information and Computer Sciences 43 (2003) 1463-1470; T.M. Martin, D.M. Young, Prediction of the acute toxicity (96-h LC50) of organic compounds to the fathead minnow (Pimephales promelas) using a group contribution method, Chemical Research in Toxicology 14(10) (2001) 1378-1385; G. Colmenarejo, A. Alvarez-Pedraglio, J.L. Lavandera, Chemoinformatic models to predict binding affinities to human serum albumin, Journal of Medicinal Chemistry 44 (2001) 4370-4378; J. Zupan, P. Gasteiger, Neural Networks in Chemistry and Drug Design: An Introduction, 2nd edition, Wiley, 1999]), we have been specifically concerned with Genetic Programming. A first paper [F. Archetti, E. Messina, S. Lanzeni, L. Vanneschi, Genetic programming for computational pharmacokinetics in drug discovery and development, Genetic Programming and Evolvable Machines 8(4) (2007) 17-26] was devoted to problem (i). The present contribution develops a Genetic Programming based framework on which to build specific strategies, which are then shown to be a valuable tool for problem (ii). In this paper, we use estrogen receptor molecules as targets and genistein-based drug compounds. Predicting their mutual interaction energy precisely and efficiently is an important task: for example, it may relate directly to the efficacy of genistein-based drugs in menopause therapy and in the natural prevention of some tumors. We compare the experimental results obtained by Genetic Programming with those of a set of "non-evolutionary" Machine Learning methods, including Support Vector Machines, Artificial Neural Networks, Linear and Least Squares Regression. Experimental results confirm that Genetic Programming is a promising technique in terms of the accuracy of the proposed solutions, their generalization ability, and the correlation between predicted and correct data. © 2009 Elsevier B.V. All rights reserved.
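The abstract compares methods by accuracy and by the correlation between predicted and correct values, but it does not name the exact measures used. As a minimal sketch, assuming root mean squared error (RMSE) for accuracy and the Pearson coefficient for correlation, the comparison for a single model could look like the following. The function names and the docking energy values are hypothetical and invented purely for illustration; they are not data from the paper.

```python
import math


def rmse(predicted, observed):
    """Root mean squared error between two equal-length sequences."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


def pearson(predicted, observed):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(observed)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    spread_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted))
    spread_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed))
    return cov / (spread_p * spread_o)


# Hypothetical docking energies, invented for illustration only.
observed = [-9.1, -8.4, -7.9, -10.2, -6.5]
predicted = [-8.8, -8.6, -7.5, -9.9, -7.0]

print(f"RMSE: {rmse(predicted, observed):.3f}")
print(f"Pearson r: {pearson(predicted, observed):.3f}")
```

Computing both measures on held-out compounds as well as on the training compounds gives a rough view of generalization ability: a model whose error is much larger on unseen compounds is likely overfitting.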
Original language: Unknown
Pages (from-to): 170-182
Journal: Applied Soft Computing
Issue number: 1
Publication status: Published - 1 Jan 2010
