TY - JOUR
T1 - Machine Learning Methods to Predict Density Functional Theory B3LYP Energies of HOMO and LUMO Orbitals
AU - Pereira, Florbela
AU - Xiao, Kaixia
AU - Latino, Diogo A.R.S.
AU - Wu, Chengcheng
AU - Zhang, Qingyou
AU - Aires-De-Sousa, Joao
N1 - No PDF, per official order.
Fundacao para a Ciencia e a Tecnologia (FCT/MEC) Portugal, under Project PEst-OE/QUI/UI0612/2013, and grants SFRH/BPD/63192/2009 (D.A.R.S.L.) and SFRH/BPD/108237/2015 (F.P.).
FCT/MEC (UID/QUI/50006/2013) and cofinanced by the ERDF under the PT2020 Partnership Agreement (POCI-01-0145-FEDER-007265).
Natural Science Foundation of China (No. 20875022).
International Science and Technology Cooperation of Henan Province, China (No. 162102410012).
State Education Ministry of China (No. 20091001).
PY - 2017/1/23
Y1 - 2017/1/23
N2 - Machine learning algorithms were explored for the fast estimation of HOMO and LUMO orbital energies calculated by DFT B3LYP, on the basis of molecular descriptors exclusively based on connectivity. The whole project involved the retrieval and generation of molecular structures, quantum chemical calculations for a database with >111 000 structures, development of new molecular descriptors, and training/validation of machine learning models. Several machine learning algorithms were screened, and an applicability domain was defined based on Euclidean distances to the training set. Random forest models predicted an external test set of 9989 compounds achieving mean absolute error (MAE) up to 0.15 and 0.16 eV for the HOMO and LUMO orbitals, respectively. The impact of the quantum chemical calculation protocol was assessed with a subset of compounds. Inclusion of the orbital energy calculated by PM7 as an additional descriptor significantly improved the quality of estimations (reducing the MAE by >30%).
AB - Machine learning algorithms were explored for the fast estimation of HOMO and LUMO orbital energies calculated by DFT B3LYP, on the basis of molecular descriptors exclusively based on connectivity. The whole project involved the retrieval and generation of molecular structures, quantum chemical calculations for a database with >111 000 structures, development of new molecular descriptors, and training/validation of machine learning models. Several machine learning algorithms were screened, and an applicability domain was defined based on Euclidean distances to the training set. Random forest models predicted an external test set of 9989 compounds achieving mean absolute error (MAE) up to 0.15 and 0.16 eV for the HOMO and LUMO orbitals, respectively. The impact of the quantum chemical calculation protocol was assessed with a subset of compounds. Inclusion of the orbital energy calculated by PM7 as an additional descriptor significantly improved the quality of estimations (reducing the MAE by >30%).
KW - ORGANIC PHOTOVOLTAICS
KW - POLYMER DIELECTRICS
KW - EXPERIMENTAL-MODELS
KW - BIG DATA
KW - ELECTROPHILICITY
KW - NUCLEOPHILICITY
UR - http://www.scopus.com/inward/record.url?scp=85013997660&partnerID=8YFLogxK
U2 - 10.1021/acs.jcim.6b00340
DO - 10.1021/acs.jcim.6b00340
M3 - Article
C2 - 28033004
AN - SCOPUS:85013997660
SN - 1549-9596
VL - 57
SP - 11
EP - 21
JO - Journal of Chemical Information and Modeling
JF - Journal of Chemical Information and Modeling
IS - 1
ER -