vocab size of data/chembl/vocab.txt is 5623, but get_vocab.py produces a new vocab.txt with 5625 entries #47
Comments
I have the same problem. Did you solve it?
I did not solve it, but I am skipping some functionality so that it works with the provided pre-trained model and vocabulary.
I may have solved the problem.
Included generation step with pre-trained model that corrects the issue wengong-jin#47
When following the instructions in the README.md, neither of the commands shown seems to work out of the box.
So far I have added py_modules=['hgraph'] to setup.py and added ",clearAromaticFlags=True)" in chemutils.py.
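For reference, the two changes above would look roughly like this. This is a sketch, not a tested patch: the exact location of the edit in chemutils.py is an assumption based on `clearAromaticFlags` being a parameter of `Chem.Kekulize`.

```python
# setup.py -- expose the hgraph package so `import hgraph` works
# after installation
setup(
    name='hgraph2graph',
    py_modules=['hgraph'],
    # ... keep the remaining existing arguments unchanged ...
)

# chemutils.py -- append clearAromaticFlags=True to the kekulization
# call (assumed to be Chem.Kekulize) so aromatic flags are cleared
# on the working molecule before fragment extraction
Chem.Kekulize(mol, clearAromaticFlags=True)
```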
Sampling from the checkpoint does not work:

```
python generate.py --vocab data/chembl/vocab.txt --model ckpt/chembl-pretrained/model.ckpt --nsample 1000
```
So I tried to reproduce the vocab with:

```
python get_vocab.py --ncpu 16 < data/chembl/all.txt > new_vocab.txt
```
It works, but new_vocab.txt has 5625 lines while data/chembl/vocab.txt has 5623, and there are multiple differences, not just two.
Do you have any way to sample from the checkpoint without issues?
Also, why am I getting a different vocab from the same data/chembl/all.txt file? Is there some random operation? I left all random seeds as they are in the scripts.
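One way to narrow this down is to compare the two files as sets, which separates line-order differences (get_vocab.py builds the vocab with multiprocessing, so line order is not guaranteed) from genuine content differences (which often stem from a different RDKit version perceiving aromaticity differently). A minimal diagnostic sketch; `load_vocab` and `vocab_diff` are hypothetical helper names, and the toy SMILES at the bottom stand in for the real files:

```python
# Sketch: check whether two vocab files differ in content or only
# in line ordering.

def load_vocab(path):
    """Read one vocab entry per line into a set (order-insensitive)."""
    with open(path) as f:
        return {line.strip() for line in f if line.strip()}

def vocab_diff(old, new):
    """Return (entries only in old, entries only in new), each sorted."""
    return sorted(old - new), sorted(new - old)

# Toy illustration; in practice pass the sets loaded from
# data/chembl/vocab.txt and new_vocab.txt via load_vocab().
old = {"C1=CC=CC=C1", "CCO", "CC"}
new = {"C1=CC=CC=C1", "CCO", "CCN", "CCC"}
only_old, only_new = vocab_diff(old, new)
print("only in old:", only_old)  # entries missing from the regenerated vocab
print("only in new:", only_new)  # entries added by the regenerated vocab
```

If the symmetric difference is empty, the mismatch is only ordering and sorting both files would make them identical; a non-empty difference points at an environment change such as the RDKit version.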