TY - JOUR
T1 - 1D 13C-NMR data as molecular descriptors in spectra - Structure relationship analysis of oligosaccharides
AU - Pereira, Florbela
N1 - The author thanks Fundação para a Ciência e a Tecnologia for support through the Ciência 2007 programme.
PY - 2012/4/1
Y1 - 2012/4/1
N2 - Spectra-structure relationships were investigated for estimating the anomeric configuration, residues and type of linkages of linear and branched trisaccharides using 13C-NMR chemical shifts. For this study, 119 pyranosyl trisaccharides were used that are trimers of the α or β anomers of D-glucose, D-galactose, D-mannose, L-fucose or L-rhamnose residues bonded through α or β glycosidic linkages of types 1→2, 1→3, 1→4, or 1→6, as well as methoxylated and/or N-acetylated amino trisaccharides. Machine learning experiments were performed for: (1) classification of the anomeric configuration of the first unit, second unit and reducing end; (2) classification of the type of first and second linkages; (3) classification of the three residues: reducing end, middle and first residue; and (4) classification of the chain type. Our previous model for predicting the structure of disaccharides was incorporated into this new model, improving its predictive power. The best results were achieved using Random Forests with 204 di- and trisaccharides in the training set; the model correctly classified 83%, 90%, 88%, 85%, 85%, 75%, 79%, 68% and 94% of the test set (69 compounds) for the nine tasks, respectively, on the basis of unassigned chemical shifts.
AB - Spectra-structure relationships were investigated for estimating the anomeric configuration, residues and type of linkages of linear and branched trisaccharides using 13C-NMR chemical shifts. For this study, 119 pyranosyl trisaccharides were used that are trimers of the α or β anomers of D-glucose, D-galactose, D-mannose, L-fucose or L-rhamnose residues bonded through α or β glycosidic linkages of types 1→2, 1→3, 1→4, or 1→6, as well as methoxylated and/or N-acetylated amino trisaccharides. Machine learning experiments were performed for: (1) classification of the anomeric configuration of the first unit, second unit and reducing end; (2) classification of the type of first and second linkages; (3) classification of the three residues: reducing end, middle and first residue; and (4) classification of the chain type. Our previous model for predicting the structure of disaccharides was incorporated into this new model, improving its predictive power. The best results were achieved using Random Forests with 204 di- and trisaccharides in the training set; the model correctly classified 83%, 90%, 88%, 85%, 85%, 75%, 79%, 68% and 94% of the test set (69 compounds) for the nine tasks, respectively, on the basis of unassigned chemical shifts.
KW - 13C-NMR
KW - Classification tree
KW - CPGNN
KW - Disaccharides
KW - Machine learning techniques
KW - Oligosaccharides
KW - Random Forest
KW - Trisaccharides
UR - http://www.scopus.com/inward/record.url?scp=84860234387&partnerID=8YFLogxK
U2 - 10.3390/molecules17043818
DO - 10.3390/molecules17043818
M3 - Article
C2 - 22456542
AN - SCOPUS:84860234387
VL - 17
SP - 3818
EP - 3833
JO - Molecules
JF - Molecules
IS - 4
ER -