Processing of SMILES, InChI, and Hashed Fingerprints

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review


Chemical line notations represent molecular structural formulas as sequences of ASCII characters. This chapter includes a tutorial covering two useful line notations that can serve as unique, unambiguous molecular representations: SMILES strings and InChI identifiers. The tutorial is divided into three sections, devoted to theoretical background, algorithm description, and software applications, respectively. The software used is the ChemAxon JChem package, version 6.3.0 (2014). The tutorial demonstrates how to calculate SMILES strings and InChI identifiers from SDFiles, and how to use them to store representations of molecular structures, search for a molecule in a collection, and detect duplicate molecules. It highlights the importance of normalization and canonicalization for obtaining unique representations. The tutorial also explains the generation of chemical hashed fingerprints for the assessment of molecular similarity, and illustrates the impact of calculation parameters on the discriminating ability of fingerprints.
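To make the idea of a hashed fingerprint concrete, the following is a minimal pure-Python sketch, not the JChem implementation used in the chapter. It assumes molecules are given as hand-written graphs (atom symbols plus bonds), enumerates linear paths up to a maximum number of bonds, and hashes each path into a fixed-length bit string. The function names (`enumerate_paths`, `hashed_fingerprint`, `tanimoto`), the use of SHA-256 as the hash, and the default parameter values are all illustrative assumptions.

```python
import hashlib


def enumerate_paths(atoms, bonds, max_bonds):
    """Return direction-independent labels of all linear paths with up to max_bonds bonds."""
    adjacency = {i: [] for i in range(len(atoms))}
    for a, b, symbol in bonds:
        adjacency[a].append((b, symbol))
        adjacency[b].append((a, symbol))
    patterns = set()

    def walk(path, parts):
        # parts alternates atom symbols and bond symbols; keep the
        # lexicographically smaller of the two reading directions.
        patterns.add(min("".join(parts), "".join(reversed(parts))))
        if len(path) - 1 == max_bonds:
            return
        for neighbor, symbol in adjacency[path[-1]]:
            if neighbor not in path:
                walk(path + [neighbor], parts + [symbol, atoms[neighbor]])

    for start in range(len(atoms)):
        walk([start], [atoms[start]])
    return patterns


def hashed_fingerprint(atoms, bonds, n_bits=1024, max_bonds=3, bits_per_pattern=2):
    """Fold every path pattern into an n_bits-wide bit string (stored as an int)."""
    fingerprint = 0
    for pattern in enumerate_paths(atoms, bonds, max_bonds):
        for k in range(bits_per_pattern):
            # Assumption: SHA-256 stands in for whatever hash a real toolkit uses.
            digest = hashlib.sha256(f"{k}:{pattern}".encode()).digest()
            fingerprint |= 1 << (int.from_bytes(digest[:8], "big") % n_bits)
    return fingerprint


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient: shared set bits divided by bits set in either fingerprint."""
    union = bin(fp_a | fp_b).count("1")
    return bin(fp_a & fp_b).count("1") / union if union else 1.0


# Hand-coded toy molecules (hydrogens implicit): (atom symbols, [(i, j, bond symbol)]).
ethanol = (["C", "C", "O"], [(0, 1, "-"), (1, 2, "-")])
dimethyl_ether = (["C", "O", "C"], [(0, 1, "-"), (1, 2, "-")])

for max_bonds in (0, 1, 2):
    fp_a = hashed_fingerprint(*ethanol, max_bonds=max_bonds)
    fp_b = hashed_fingerprint(*dimethyl_ether, max_bonds=max_bonds)
    print(max_bonds, tanimoto(fp_a, fp_b))
```

With `max_bonds=0` only the atom types are encoded, so the two isomers receive identical fingerprints; longer paths add patterns that can tell them apart. Shrinking `n_bits` has the opposite effect, because more patterns collide on the same bit. This is the kind of parameter dependence the tutorial refers to.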
Original language: English
Title of host publication: Tutorials in Chemoinformatics
Editors: A. Varnek
Publisher: John Wiley & Sons, Ltd
ISBN (Electronic): 9781119161110
ISBN (Print): 9781119137962
Publication status: Published - 28 Jul 2017
