TY - JOUR
T1 - Machine learning for the prediction of molecular dipole moments obtained by density functional theory
AU - Pereira, Florbela
AU - Aires-de-Sousa, João
N1 - Financial support from Fundação para a Ciência e a Tecnologia (FCT/MEC), Portugal, under Project PEst-OE/QUI/UI0612/2013 and Grant SFRH/BPD/108237/2015 (F.P.), is greatly appreciated. This work was also supported by the Associated Laboratory for Sustainable Chemistry-Clean Processes and Technologies-LAQV, which is financed by national funds from FCT/MEC (UID/QUI/50006/2013) and cofinanced by the ERDF under the PT2020 Partnership Agreement (POCI-01-0145-FEDER-007265).
PY - 2018/12/1
Y1 - 2018/12/1
N2 - Machine learning (ML) algorithms were explored for the fast estimation of molecular dipole moments calculated by density functional theory (DFT) at the B3LYP/6-31G(d,p) level, on the basis of molecular descriptors generated from DFT-optimized geometries and partial atomic charges obtained by empirical or ML schemes. A database of 10,071 structures was used, new molecular descriptors were designed, and the models were validated with external test sets. Several ML algorithms were screened. Random forest regression models predicted an external test set of 3368 compounds with a mean absolute error of up to 0.44 D. The results represent a significant improvement over dipole moments calculated using empirical point charges located at the nuclei, even assuming the DFT-optimized geometry (root mean square error, RMSE, of 0.68 D vs. 1.53 D and R² = 0.87 vs. 0.66).
KW - Density functional theory (DFT)
KW - Machine learning (ML)
KW - Molecular dipole moment
KW - Partial atomic charges
KW - Quantitative structure property relationships (QSPR)
UR - http://www.scopus.com/inward/record.url?scp=85051934035&partnerID=8YFLogxK
U2 - 10.1186/s13321-018-0296-5
DO - 10.1186/s13321-018-0296-5
M3 - Article
VL - 10
JO - Journal of Cheminformatics
JF - Journal of Cheminformatics
IS - 1
M1 - 43
ER -