Description

APE Tokenizer (Atom Pair Encoding Tokenizer) is a tokenizer for SMILES and SELFIES molecular representations. It works similarly to BPE (Byte Pair Encoding) but ensures that tokens preserve chemical information, which makes it well suited to molecular data. The tokenizer is fully compatible with the Hugging Face transformers library and can be integrated into models that use tokenizers from that library.
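
To illustrate the idea of BPE-style merging over chemically meaningful units, here is a minimal Python sketch. It is not the APE Tokenizer's own code or API. The function names (split_atoms, most_frequent_pair, merge_pair) and the atom-level regular expression are assumptions chosen for illustration. The sketch covers SMILES only and performs a single merge step: it splits each string into atom-level symbols, so that multi-character atoms such as Cl, Br, and bracket atoms stay intact, then merges the most frequent adjacent pair.

```python
import re
from collections import Counter

# Assumed atom-level pattern for SMILES (illustrative, not taken from the
# APE Tokenizer source). Bracket atoms, Br and Cl are kept as single symbols.
ATOM_PATTERN = re.compile(
    r"(\[[^\]]+\]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def split_atoms(smiles):
    """Split a SMILES string into atom-level symbols."""
    return ATOM_PATTERN.findall(smiles)


def most_frequent_pair(corpus):
    """Return the most frequent adjacent symbol pair in a tokenized corpus."""
    counts = Counter()
    for tokens in corpus:
        counts.update(zip(tokens, tokens[1:]))
    return counts.most_common(1)[0][0] if counts else None


def merge_pair(tokens, pair):
    """Replace each non-overlapping occurrence of `pair` with one merged token."""
    merged = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


if __name__ == "__main__":
    # Hypothetical toy corpus; one merge step of a BPE-like procedure.
    corpus = [split_atoms(s) for s in ["CCO", "CCN", "ClCCBr"]]
    pair = most_frequent_pair(corpus)
    print(pair, [merge_pair(tokens, pair) for tokens in corpus])
```

Because merging starts from atom-level symbols rather than characters or bytes, a merged token can never split an atom such as Cl into C and l, which is one plausible reading of what "preserving chemical information" means here. A full vocabulary would repeat the count-and-merge step until a target size is reached.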
Date made available: 26 Jan 2024
Publisher: GitHub
