
Thermostability models on SaprotHub Huggingface – reproducibility #15

adam-kral opened this issue Jul 9, 2024 · 5 comments

adam-kral commented Jul 9, 2024

Hi, I have a couple of questions regarding the two thermostability models published on Hugging Face:

I have downloaded both models to evaluate them. The readme for the 650M includes some training info and the test Spearman, but the 35M readme has none.
When I evaluated the 35M model, I got a Spearman of 0.87 (0.91) on the validation (test) set, which is much better than the 0.697 reported for the 650M in the paper (or 0.706 in the model's readme). Was the 35M model trained on the same dataset splits?
Also, when I tried to evaluate the downloaded 650M model in exactly the same way as I had successfully done with the 35M model, the model's outputs/predictions were all zeros.

So, why does the 35M model perform so well, and how do I make the 650M return non-zero predictions?
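For context, the evaluation is essentially the following shape (a minimal sketch, not the exact notebook code; the checkpoint path, model class, and record format are assumptions for illustration):

```python
# Hedged sketch: score a downloaded thermostability checkpoint with Spearman correlation.
# The checkpoint path, model class, and record format below are assumptions, not the
# exact SaprotHub/ColabSaprot loading code.
import torch
from scipy.stats import spearmanr
from transformers import EsmTokenizer, EsmForSequenceClassification

CKPT = "path/to/downloaded_thermostability_model"  # local directory of the checkpoint

tokenizer = EsmTokenizer.from_pretrained(CKPT)
model = EsmForSequenceClassification.from_pretrained(CKPT, num_labels=1).eval()

# Dummy records standing in for the real valid/test split
# (structure-aware sequences with '#' for unknown structure; labels are made up).
test_records = [
    {"seq": "M#E#V#L#K#A#", "label": 48.2},
    {"seq": "A#C#D#E#F#G#", "label": 55.1},
]

preds, labels = [], []
with torch.no_grad():
    for rec in test_records:
        enc = tokenizer(rec["seq"], return_tensors="pt")
        preds.append(model(**enc).logits.squeeze().item())
        labels.append(rec["label"])

print("Spearman:", spearmanr(preds, labels).correlation)
```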

Alternatively, is it possible to rerun the training of the models from the paper, or of the published models, with their original configs? Where would I find those configs?

Thanks!


LTEnjoy commented Jul 9, 2024

Hi, thank you very much for pointing that out!

The 35M version was just a test model from when we were developing SaprotHub. It was trained on different dataset splits (we now split the data based on structure similarity, but we did not at that time). We have deleted it to avoid misunderstanding.

> Also, when I tried to evaluate the downloaded 650M model in exactly the same way as I had successfully done with the 35M model, the model's outputs/predictions were all zeros.

Could you give more information about what you input to the model?

If you want to reproduce the training of the models in the paper, you can just load the dataset from SaprotHub and train with the default config. It is a general config that is suitable for all tasks.
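Schematically, such a fine-tuning run looks like the sketch below. This is a minimal illustration assuming the base checkpoint loads through the standard transformers classes; the checkpoint ID, record format, and hyperparameters are placeholders, not the actual SaprotHub default config or data pipeline:

```python
# Hedged sketch of a thermostability fine-tuning loop.
# Checkpoint ID, record format, and hyperparameters are placeholders/assumptions;
# the real SaprotHub pipeline uses its own default config and data loading.
import torch
from torch.utils.data import DataLoader
from transformers import EsmTokenizer, EsmForSequenceClassification

BASE_MODEL = "westlake-repl/SaProt_650M_AF2"  # assumed base checkpoint ID

tokenizer = EsmTokenizer.from_pretrained(BASE_MODEL)
model = EsmForSequenceClassification.from_pretrained(
    BASE_MODEL, num_labels=1, problem_type="regression"  # single scalar target
)

# Dummy records standing in for the SaprotHub thermostability training split
# (structure-aware sequences with '#' for unknown structure; labels are made up).
train_records = [
    {"seq": "M#E#V#L#K#A#", "label": 48.2},
    {"seq": "A#C#D#E#F#G#", "label": 55.1},
]

def collate(batch):
    enc = tokenizer([r["seq"] for r in batch], padding=True, return_tensors="pt")
    enc["labels"] = torch.tensor([r["label"] for r in batch], dtype=torch.float)
    return enc

loader = DataLoader(train_records, batch_size=2, shuffle=True, collate_fn=collate)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)  # assumed learning rate

model.train()
for epoch in range(3):  # assumed epoch count
    for batch in loader:
        out = model(**batch)  # transformers computes MSE loss for 1-label regression
        out.loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```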


adam-kral commented Jul 10, 2024

Thanks for the fast response.

Yes, I can post the notebook I used to evaluate the 650M (and 35M) model. It is a stripped-down version of the Colab notebook. The most interesting cell is the last one, where the predictions are made and, for some reason, only zeros are returned, even though the batch seems fine (and worked with the 35M):

https://gist.github.com/adam-kral/d8c82f02f77ae0ec1c0f9255b29c3ab6

I ran it inside a cloned SaprotHub repo, in the colab folder. In addition, I downloaded the LMDB dataset from Hugging Face manually.
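For reference, reading such a manually downloaded LMDB split can be done with the plain lmdb package, roughly as in the sketch below; the key/value layout here (a "length" entry, stringified integer keys, JSON values) is an assumption for illustration and the actual SaprotHub format may differ:

```python
# Hedged sketch: peek at a downloaded LMDB split with the plain lmdb package.
# The key/value layout (b"length" count, stringified integer keys, JSON values)
# is an assumption for illustration; the actual SaprotHub format may differ.
import json
import lmdb

env = lmdb.open("path/to/thermostability/valid", readonly=True, lock=False)
with env.begin() as txn:
    n = int(txn.get(b"length"))        # assumed: total number of records
    print("records:", n)
    first = json.loads(txn.get(b"0"))  # assumed: one JSON record per index
    print("example record:", first)
env.close()
```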


LTEnjoy commented Jul 10, 2024

Hello,

I just reran your notebook and, interestingly, it produced normal output:
[screenshot: normal model outputs]

Could you clone the latest SaprotHub repo and retry, to check whether the problem comes from a version inconsistency?

adam-kral commented Jul 10, 2024

Thanks, now it works! The problem is that it only works on a CUDA GPU, not on Apple MPS nor on the CPU; there I get NaNs, or the same number for every prediction (zeros or some other constant).

I cloned the latest repo and installed the requirements in a fresh venv (before, I had a slightly different environment).

Now, on the MPS device (macOS), I get:

[screenshot: model outputs on MPS]

On CPU I got NaNs:

[screenshot: NaN model outputs on CPU]


LTEnjoy commented Jul 10, 2024

An interesting bug! However, it's weird that the model outputs NaNs on CPU. I transferred the model to CPU and it still produced normal outputs. I'm not sure whether this is caused by an incompatibility between the packages and your hardware...
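To help narrow this down, a quick cross-device sanity check along the lines of the sketch below (plain PyTorch; the `build_model_and_batch` helper is a hypothetical stand-in for however the notebook constructs the SaProt model and a tokenized batch) would show whether outputs diverge between CPU/MPS and CUDA:

```python
# Hedged sketch: run the same forward pass on each available device and look for
# NaNs or constant outputs. `build_model_and_batch` is a hypothetical stand-in for
# the notebook's real model/batch construction; here it is a tiny dummy so it runs.
import torch

def build_model_and_batch():
    model = torch.nn.Sequential(
        torch.nn.Linear(16, 8), torch.nn.GELU(), torch.nn.Linear(8, 1)
    )
    batch = torch.randn(4, 16)
    return model, batch

devices = ["cpu"]
if torch.cuda.is_available():
    devices.append("cuda")
if torch.backends.mps.is_available():
    devices.append("mps")

model, batch = build_model_and_batch()
for dev in devices:
    m = model.to(dev).float().eval()  # force fp32 to rule out dtype issues
    with torch.no_grad():
        out = m(batch.to(dev).float())
    print(dev,
          "any NaN:", torch.isnan(out).any().item(),
          "std:", out.float().std().item())  # std of 0 means a constant output
```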
