SMILES Transformer

SMILES Transformer extracts molecular fingerprints from string representations of chemical molecules.
The transformer learns a latent representation that is useful for various downstream tasks by training on an autoencoding task.
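The fingerprint is read out of the encoder's per-token outputs. The sketch below assumes the simplest form of that readout: mean pooling and max pooling over the non-padding positions, concatenated into one fixed-length vector. The function name pool_fingerprint and the pooling choice are illustrative assumptions, not this repository's code.

```python
import numpy as np

def pool_fingerprint(hidden: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Turn per-token encoder outputs into one fixed-length fingerprint.

    hidden: (seq_len, d_model) encoder outputs for a single SMILES string.
    mask:   (seq_len,) boolean array, True for real tokens, False for padding.
    Returns a vector of length 2 * d_model (mean-pooled, then max-pooled).
    """
    tokens = hidden[mask]              # drop padding positions
    mean_pooled = tokens.mean(axis=0)  # average over real tokens
    max_pooled = tokens.max(axis=0)    # element-wise maximum over real tokens
    return np.concatenate([mean_pooled, max_pooled])
```

Whatever the encoder's sequence length, the output size depends only on d_model, which is what makes the result usable as a fingerprint for standard downstream models.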

Requirements

This project requires the following libraries:

NumPy
Pandas
PyTorch > 1.2
tqdm
RDKit
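A quick environment check before running anything can be sketched as follows. The helper names are hypothetical; the version check compares only the major and minor numbers and reads "PyTorch > 1.2" literally, so 1.2.x is rejected.

```python
import importlib.util
import re

# Top-level import names of the libraries listed above.
REQUIRED_MODULES = ["numpy", "pandas", "torch", "tqdm", "rdkit"]

def parse_major_minor(version: str) -> tuple:
    """Extract (major, minor) from strings like '1.13.1+cu117' or '2.0.0a0'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))

def torch_version_ok(version: str, minimum: tuple = (1, 2)) -> bool:
    # Strictly greater than the minimum, compared on (major, minor) only.
    return parse_major_minor(version) > minimum

def missing_modules(names=REQUIRED_MODULES) -> list:
    # find_spec returns None when a top-level module cannot be located.
    return [name for name in names if importlib.util.find_spec(name) is None]

if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        import torch
        status = "OK" if torch_version_ok(torch.__version__) else "too old"
        print("PyTorch", torch.__version__, status)
```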

Dataset

The canonical SMILES of 1.7 million molecules from the ChEMBL24 dataset, each at most 100 characters long, were used.
Every epoch, these canonical SMILES were randomly rewritten as equivalent non-canonical SMILES using the SMILES enumeration method of E. J. Bjerrum.
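The length filter and per-epoch enumeration described above can be sketched as follows. This is a minimal illustration, not the repository's data loader: randomize_smiles follows the atom-renumbering idea behind Bjerrum's SMILES enumeration using RDKit's RenumberAtoms and MolToSmiles(canonical=False), and EnumeratedSmiles is a hypothetical dataset wrapper.

```python
import random

MAX_LEN = 100  # character limit stated above

def filter_by_length(smiles_list, max_len=MAX_LEN):
    """Keep only SMILES strings with at most max_len characters."""
    return [s for s in smiles_list if len(s) <= max_len]

def randomize_smiles(smiles, rng=random):
    """Return a random, non-canonical SMILES for the same molecule."""
    from rdkit import Chem  # imported lazily so the filter works without RDKit
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return smiles  # leave unparsable input unchanged
    order = list(range(mol.GetNumAtoms()))
    rng.shuffle(order)  # random atom ordering
    shuffled = Chem.RenumberAtoms(mol, order)
    return Chem.MolToSmiles(shuffled, canonical=False)

class EnumeratedSmiles:
    """Map-style dataset: each access re-randomizes the stored SMILES,
    so every epoch sees a different string for the same molecule."""

    def __init__(self, smiles_list, randomizer=randomize_smiles):
        self.smiles = filter_by_length(smiles_list)
        self.randomizer = randomizer

    def __len__(self):
        return len(self.smiles)

    def __getitem__(self, idx):
        return self.randomizer(self.smiles[idx])
```

Because the class implements __len__ and __getitem__, it can be passed to a PyTorch DataLoader like any map-style dataset.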

Pre-training

After preparing the SMILES corpus for pre-training, run:

$ python pretrain_trfm.py
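Preparing the corpus involves splitting each SMILES into tokens and mapping the tokens to integer ids. The sketch below uses a widely used regular expression for SMILES tokens, which keeps multi-character atoms such as Cl, Br and bracket atoms like [nH] whole. It also uses a hypothetical vocabulary with <pad>, <unk>, <sos> and <eos> special tokens; pretrain_trfm.py may tokenize differently. In an autoencoding setup, the same id sequence serves as both the input and the reconstruction target.

```python
import re

# Commonly used SMILES tokenization regex; an assumption, not the repo's tokenizer.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|\/|:|~|@|\?|>|\*|\$|\%[0-9]{2}|[0-9])"
)
SPECIALS = ["<pad>", "<unk>", "<sos>", "<eos>"]  # hypothetical special tokens

def tokenize(smiles):
    tokens = SMILES_TOKEN.findall(smiles)
    # findall silently skips unmatched characters, so verify full coverage.
    if "".join(tokens) != smiles:
        raise ValueError(f"could not fully tokenize {smiles!r}")
    return tokens

def build_vocab(corpus):
    """Special tokens first, then every corpus token in sorted order."""
    tokens = sorted({tok for smi in corpus for tok in tokenize(smi)})
    return {tok: idx for idx, tok in enumerate(SPECIALS + tokens)}

def encode(smiles, vocab, max_len):
    """<sos> + token ids + <eos>, truncated or padded to max_len."""
    ids = [vocab["<sos>"]]
    ids += [vocab.get(tok, vocab["<unk>"]) for tok in tokenize(smiles)]
    ids.append(vocab["<eos>"])
    ids = ids[:max_len]  # truncation drops <eos> for over-long inputs
    return ids + [vocab["<pad>"]] * (max_len - len(ids))
```

For example, with a vocabulary built from ["CCO"], encode("CCO", vocab, 7) gives [2, 4, 4, 5, 3, 0, 0].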

The pre-trained model is available here.

Downstream Tasks

See experiments/ for example code.
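Because the fingerprints are fixed-length vectors, any off-the-shelf model can consume them. As a hedged illustration, and not the code in experiments/, the sketch below swaps in a cosine-similarity k-nearest-neighbour classifier over precomputed fingerprints. knn_predict is a hypothetical helper and assumes non-negative integer class labels.

```python
import numpy as np

def knn_predict(train_fps, train_labels, query_fps, k=1):
    """Majority vote over the k most cosine-similar training fingerprints.

    train_fps: (n_train, dim) array; query_fps: (n_query, dim) array.
    train_labels: non-negative integer class labels (required by bincount).
    """
    train = train_fps / np.linalg.norm(train_fps, axis=1, keepdims=True)
    query = query_fps / np.linalg.norm(query_fps, axis=1, keepdims=True)
    sims = query @ train.T                      # (n_query, n_train) cosine similarities
    nearest = np.argsort(-sims, axis=1)[:, :k]  # indices of the top-k neighbours
    labels = np.asarray(train_labels)[nearest]  # (n_query, k) neighbour labels
    return np.array([np.bincount(row).argmax() for row in labels])
```

A k-NN baseline needs no training, which makes it a quick sanity check that the fingerprints carry task-relevant information before fitting a larger model.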

Cite

@article{honda2019smiles,
    title={SMILES Transformer: Pre-trained Molecular Fingerprint for Low Data Drug Discovery},
    author={Shion Honda and Shoi Shi and Hiroki R. Ueda},
    year={2019},
    eprint={1911.04738},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
