Abstract
Random Forest (RF) QSPR models were developed with a data set of homolytic bond dissociation energies (BDE) previously calculated by B3LYP/6-311++G(d,p)//DFTB for 2263 sp³ C−H covalent bonds. The best set of attributes consisted of 114 descriptors of the carbon atom (counts of atom types in 5 spheres around the kernel atom and ring descriptors). The optimized model predicted the DFT-calculated BDE of an independent test set of 224 bonds with a mean absolute error (MAE) of 2.86 kcal/mol. For a new data set of 409 bonds from the iBonD database (http://ibond.nankai.edu.cn), the RF predictions showed a modest MAE (5.36 kcal/mol) but a relatively high R² (0.75) against experimental energies. A prediction scheme was explored that corrects the RF prediction with the average deviation observed for the k nearest neighbours (KNN) in an additional memory of experimental data. The corrected predictions achieved MAE = 2.22 kcal/mol against the experimental bond energies of an independent test set of 145 bonds.
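
The KNN correction described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `knn_corrected_prediction`, the Euclidean distance metric, the value of `k`, and the toy numbers are all assumptions made for illustration. The sketch only shows the general idea of adding the mean deviation (experimental value minus model prediction) of the nearest memory entries to a new prediction.

```python
from math import dist
from statistics import mean


def knn_corrected_prediction(rf_pred, query, memory, k=3):
    """Correct a model prediction with the mean deviation of its k nearest neighbours.

    memory: list of (descriptors, model_prediction, experimental_value) tuples.
    Hypothetical helper; the distance metric and k are assumptions.
    """
    if not memory:
        return rf_pred  # no experimental memory, so no correction is applied
    # Rank memory entries by Euclidean distance to the query descriptors.
    nearest = sorted(memory, key=lambda entry: dist(query, entry[0]))[:k]
    # Deviation = experimental value minus model prediction for each neighbour.
    correction = mean(exp - pred for _, pred, exp in nearest)
    return rf_pred + correction


# Made-up numbers purely for illustration (not data from the study).
memory = [
    ((1.0, 0.0), 100.0, 97.0),
    ((1.0, 1.0), 95.0, 93.0),
    ((5.0, 5.0), 90.0, 95.0),
]
corrected = knn_corrected_prediction(98.0, (1.0, 0.5), memory, k=2)
```

In this toy case the two nearest memory entries were overpredicted by 3 and 2 kcal/mol, so the new prediction is shifted down by their mean deviation.
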
Original language | English |
---|---|
Article number | 2200193 |
Number of pages | 8 |
Journal | Molecular Informatics |
Volume | 42 |
Issue number | 1 |
Early online date | 27 Sept 2022 |
Publication status | Published - Jan 2023 |
Keywords
- bond energy
- density functional calculations
- learning transfer
- machine learning
- quantitative structure-property relationship