Abstract
Chemical line notations represent molecular structural formulas as sequences of ASCII characters. This chapter presents a tutorial on two useful line notations that can serve as unique, unambiguous molecular representations: SMILES strings and InChI identifiers. The tutorial is divided into three sections devoted to theoretical background, algorithm description, and software applications, respectively. The software used is the ChemAxon JChem package, version 6.3.0 (2014). The tutorial demonstrates how to calculate SMILES strings and InChI identifiers from SDFiles and how to use them to store representations of molecular structures, search for a molecule in a collection, and detect duplicate molecules. It highlights the importance of normalization and canonicalization for obtaining unique representations. The tutorial also explains the generation of chemical hashed fingerprints for assessing molecular similarity and illustrates how the calculation parameters affect the ability of fingerprints to discriminate between molecules.
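Duplicate detection with line notations depends on every structure mapping to exactly one canonical key, however its atoms happen to be numbered in the input file. The chapter does this with JChem. The minimal Python sketch below (standard library only) illustrates the idea on a hypothetical toy graph representation: element symbols plus `(i, j, bond_order)` tuples, with implicit hydrogens. It does not read SDFiles. The key is found by exhaustively trying every atom numbering and keeping the lexicographically smallest encoding. That approach is only practical for a handful of atoms. It is not the canonicalization algorithm used for SMILES or InChI, and it attempts no normalization of charges, tautomers or aromaticity. The names `canonical_key` and `find_duplicates` are illustrative.

```python
from itertools import permutations

# Hypothetical toy representation (not an SDFile or JChem object):
# atoms = element symbols, bonds = (i, j, bond_order) over atom indices,
# hydrogens implicit.


def canonical_key(atoms, bonds):
    """Return a key that is identical for every numbering of the same graph.

    Brute force over all atom orderings: only feasible for very small
    molecules and NOT the algorithm used for canonical SMILES or InChI.
    """
    best = None
    for perm in permutations(range(len(atoms))):
        # perm[new] = old; invert it to renumber bond endpoints
        new_of = {old: new for new, old in enumerate(perm)}
        labels = tuple(atoms[old] for old in perm)
        edges = tuple(sorted(
            (min(new_of[i], new_of[j]), max(new_of[i], new_of[j]), order)
            for i, j, order in bonds
        ))
        candidate = (labels, edges)
        # keep the lexicographically smallest (labels, edges) encoding
        if best is None or candidate < best:
            best = candidate
    return best


def find_duplicates(records):
    """Group names of records whose structures share a canonical key."""
    groups = {}
    for name, atoms, bonds in records:
        groups.setdefault(canonical_key(atoms, bonds), []).append(name)
    return [names for names in groups.values() if len(names) > 1]


# Ethanol entered twice with different atom orders, plus an isomer.
records = [
    ("ethanol_a", ["C", "C", "O"], [(0, 1, 1), (1, 2, 1)]),
    ("ethanol_b", ["O", "C", "C"], [(0, 1, 1), (1, 2, 1)]),
    ("dimethyl_ether", ["C", "O", "C"], [(0, 1, 1), (1, 2, 1)]),
]
print(find_duplicates(records))  # prints [['ethanol_a', 'ethanol_b']]
```

A direct comparison of the raw atom lists of `ethanol_a` and `ethanol_b` would miss the duplicate. Dimethyl ether has the same molecular formula but different connectivity, so it keeps its own key.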
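Path-based hashed fingerprints are commonly built in three steps. Small substructural patterns, such as linear atom and bond paths, are enumerated. Each pattern is hashed to a few bit positions. The results are folded into a fixed-length bit string, and similarity is then scored with the Tanimoto coefficient. The sketch below assumes such a path-based scheme on a hypothetical toy graph of element symbols and `(i, j, bond_order)` tuples. The parameter names `length`, `max_bonds` and `bits_per_pattern`, and their default values, are illustrative assumptions, not JChem's API or defaults. SHA-256 stands in for whatever hash a real implementation uses.

```python
import hashlib

# Hypothetical toy graph: atoms = element symbols, bonds = (i, j, bond_order).


def linear_paths(atoms, bonds, max_bonds):
    """Return the set of simple paths with 0..max_bonds bonds as strings.

    Each path is written in both directions and the smaller string is kept,
    so a pattern does not depend on which end the walk started from.
    """
    adj = {i: [] for i in range(len(atoms))}
    for i, j, order in bonds:
        adj[i].append((j, order))
        adj[j].append((i, order))
    patterns = set()

    def walk(path_atoms, tokens):
        # tokens alternates atom symbols and bond orders, e.g. ['C', '1', 'O']
        patterns.add(min("-".join(tokens), "-".join(reversed(tokens))))
        if (len(tokens) - 1) // 2 == max_bonds:
            return
        for nxt, order in adj[path_atoms[-1]]:
            if nxt not in path_atoms:  # simple paths only, no revisits
                walk(path_atoms + [nxt], tokens + [str(order), atoms[nxt]])

    for start in range(len(atoms)):
        walk([start], [atoms[start]])
    return patterns


def hashed_fingerprint(atoms, bonds, length=1024, max_bonds=7, bits_per_pattern=3):
    """Fold all path patterns into an int used as a `length`-bit string."""
    fp = 0
    for pattern in linear_paths(atoms, bonds, max_bonds):
        for k in range(bits_per_pattern):
            digest = hashlib.sha256(f"{k}:{pattern}".encode()).digest()
            fp |= 1 << (int.from_bytes(digest[:8], "big") % length)
    return fp


def tanimoto(a, b):
    """Shared set bits divided by bits set in either fingerprint."""
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 1.0


ethanol = (["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
dimethyl_ether = (["C", "O", "C"], [(0, 1, 1), (1, 2, 1)])

for max_bonds in (0, 2):
    for length in (8, 1024):
        sim = tanimoto(
            hashed_fingerprint(*ethanol, length=length, max_bonds=max_bonds),
            hashed_fingerprint(*dimethyl_ether, length=length, max_bonds=max_bonds),
        )
        print(f"max_bonds={max_bonds} length={length} Tanimoto={sim:.2f}")
```

With `max_bonds=0` only single atoms are hashed. Ethanol and dimethyl ether share the same atoms, so they receive identical fingerprints and a Tanimoto coefficient of 1.00. Allowing paths of up to two bonds gives them different pattern sets. A very short `length`, however, increases bit collisions that can inflate similarity again. Exact values for the hashed cases depend on the hash function.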
Field | Value |
---|---|
Original language | English |
Title of host publication | Tutorials in Chemoinformatics |
Editors | A. Varnek |
Publisher | John Wiley & Sons, Ltd |
Chapter | 4 |
ISBN (Electronic) | 9781119161110 |
ISBN (Print) | 9781119137962 |
Publication status | Published - 28 Jul 2017 |