AtomLenz is an optical chemical structure recognition tool that provides atom-level localization and can therefore segment an image into its individual atoms and bonds. It uses a self-labeling strategy to generate atom-level annotations, which enriches the training data, and then uses this enriched representation to fine-tune the model to a new domain.
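The self-labeling idea can be illustrated with a generic pseudo-labeling filter. This is a minimal sketch, not the AtomLenz implementation: it assumes a hypothetical detector output of (element, confidence) pairs per image and keeps an image's confident detections as new atom-level labels only when their element counts agree with the composition of the molecule known for that image (for example, derived from its SMILES).

```python
from collections import Counter
from typing import Iterable, Optional

# Hypothetical detector output: one (element_symbol, confidence) pair per detected atom.
Detection = tuple[str, float]


def self_label(
    detections: Iterable[Detection],
    target_counts: dict[str, int],
    score_thresh: float = 0.65,
) -> Optional[list[str]]:
    """Return confident atom labels as pseudo-labels, or None if they disagree
    with the known element counts of the molecule (illustrative sketch only)."""
    # Keep only the detections the model is confident about.
    confident = [elem for elem, score in detections if score >= score_thresh]
    # Accept this image for fine-tuning only if the confident atoms match
    # the target element counts exactly; otherwise discard it.
    if Counter(confident) == Counter(target_counts):
        return confident
    return None
```

For ethanol (two carbons, one oxygen), `self_label([("C", 0.9), ("C", 0.8), ("O", 0.95), ("N", 0.3)], {"C": 2, "O": 1})` keeps the three confident atoms, while an image where one carbon was missed is rejected rather than used as a noisy label.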
If you like this work, please consider citing our related paper, accepted at CVPR 2024:
@article{oldenhof2024atom,
title={Atom-Level Optical Chemical Structure Recognition with Limited Supervision},
author={Oldenhof, Martijn and De Brouwer, Edward and Arany, Adam and Moreau, Yves},
journal={arXiv preprint arXiv:2404.01743},
year={2024}
}
To quickly test AtomLenz, check out our Hugging Face Space.
Install ProbKT.
Install AtomLenz:
pip install -e .
Download the datasets into the datasets folder.
Download the models into the models folder.
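The example commands in this README expect four model folders under models and a test-image folder under datasets. An optional pre-flight check can catch a missing download before inference starts. The helper name `missing_paths` is hypothetical and not part of AtomLenz; the folder names are copied from those commands.

```python
from __future__ import annotations

from pathlib import Path

# Model folder names taken from the --experiment_path_* arguments in the commands.
MODEL_DIRS = ["atoms_model", "bonds_model", "stereos_model", "charges_model"]


def missing_paths(repo_root: str | Path, data_path: str | Path) -> list[Path]:
    """Return the expected model folders and data folder that do not exist.

    data_path is used as given, i.e. relative paths are resolved against the
    current working directory, just as a command-line argument would be."""
    root = Path(repo_root)
    expected = [root / "models" / name for name in MODEL_DIRS]
    expected.append(Path(data_path))
    # Keep only the paths that are not existing directories.
    return [p for p in expected if not p.is_dir()]
```

For example, `missing_paths(".", "../datasets/hand_drawn_dataset/test/")` returns an empty list when everything is in place.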
To predict SMILES for the images in a dataset (here, the hand-drawn test set), run:
python run_scripts/predict_only_smiles.py --experiment_path_atoms models/atoms_model --experiment_path_bonds models/bonds_model --experiment_path_stereo models/stereos_model --experiment_path_charges models/charges_model --data_path ../datasets/hand_drawn_dataset/test/ --score_thresh 0.65
Predictions are stored in the preds_atomlenz file.
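The `--score_thresh 0.65` option presumably discards detections whose confidence falls below 0.65; this reading is an assumption based on the flag name, and whether the comparison is inclusive is not documented. A minimal sketch of such filtering, using a hypothetical `Box` record rather than AtomLenz's own data structures:

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical detection: class label, confidence score and (x1, y1, x2, y2) corners."""
    label: str
    score: float
    xyxy: tuple[float, float, float, float]


def filter_by_score(boxes: list[Box], score_thresh: float = 0.65) -> list[Box]:
    """Keep detections scoring at or above the threshold (assumed meaning of --score_thresh)."""
    return [b for b in boxes if b.score >= score_thresh]
```

Raising the threshold generally trades recall for precision: fewer spurious atoms or bonds, but a higher chance of dropping a correct, low-confidence one.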
If ground-truth SMILES are available for the dataset, performance metrics can also be reported:
python run_scripts/predict_smiles.py --experiment_path_atoms models/atoms_model --experiment_path_bonds models/bonds_model --experiment_path_stereo models/stereos_model --experiment_path_charges models/charges_model --data_path <absolute_path>/datasets/hand_drawn_dataset/test/ --score_thresh 0.65
Predictions are stored in the preds_atomlenz_long file.
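Which metrics `predict_smiles.py` reports is not described here. As an illustration only, a common measure for structure recognition is exact-match accuracy between predicted and ground-truth SMILES. Because one molecule can be written as many different SMILES strings, this sketch assumes both lists were canonicalized beforehand (for example with RDKit's `Chem.MolToSmiles`) and counts a missing prediction (`None`) as wrong; the function name is hypothetical.

```python
from typing import Optional, Sequence


def exact_match_accuracy(preds: Sequence[Optional[str]], truths: Sequence[str]) -> float:
    """Fraction of images whose predicted SMILES equals the ground truth.

    Assumes both sequences hold canonical SMILES in the same image order;
    None marks an image for which no molecule could be predicted."""
    if len(preds) != len(truths):
        raise ValueError("predictions and ground truth must have the same length")
    if not truths:
        return 0.0
    # A hit requires a prediction that matches the reference string exactly.
    hits = sum(p is not None and p == t for p, t in zip(preds, truths))
    return hits / len(truths)
```

For instance, `exact_match_accuracy(["CCO", "c1ccccc1", None, "CC"], ["CCO", "c1ccccc1", "O", "CCC"])` gives 0.5.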
The procedure to pretrain AtomLenz is described here: pretrained atomlenz
The procedure to train AtomLenz on target domain is described here: train atomlenz