In which part does it incorporate RL? #5

Open
YifanDengWHU opened this issue Apr 1, 2021 · 2 comments
Comments

@YifanDengWHU

It's nice work! However, I have a question. Since I'm not very familiar with reinforcement learning, I wonder which part of it uses RL. Section 3.3.2 (fine-tuning) says "Update the model P(G,S) on the fine-tuning set $D^f$ using policy gradient method", so it seems that RL is used there. However, the code just computes the topo, atom, and bond type losses between the expanded $S_i$ and $G_i^k$.
Thanks!

@bhomass

bhomass commented Nov 25, 2021

Allow me to comment. I don't think you will find algorithms like REINFORCE here. What Jin meant is improving p(G|S) using the desired chemical properties. This happens in the property_filter() method in finetune.py.

@bengioe

bengioe commented Aug 1, 2022

I also went looking for the RL and found none.

For posterity (happy to be corrected by the authors), the fine-tuning step (and property_filter) seems to consist of generating $Nm$ datapoints from $m$ rationales, filtering those points to keep only the ones with the desired properties, and taking a step of maximum likelihood on the remaining points.
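
The generate, filter, maximum-likelihood loop described above can be sketched on a toy model. This is only an illustration under stated assumptions, not the repository's code: the "model" is a softmax over six hypothetical outcomes rather than a graph generator, and `softmax`, `finetune_step`, and `is_desired` are made-up names, with `is_desired` standing in for whatever property_filter checks.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def finetune_step(logits, keep, num_samples, lr, rng):
    """One generate -> filter -> maximum-likelihood step on a toy categorical model.

    `keep` is a hypothetical stand-in for a property filter: it returns True
    for samples that have the desired property.
    """
    probs = softmax(logits)
    # 1. Generate: sample candidates from the current model.
    samples = rng.choices(range(len(logits)), weights=probs, k=num_samples)
    # 2. Filter: keep only the samples with the desired property.
    kept = [x for x in samples if keep(x)]
    if not kept:
        return list(logits), kept  # nothing to learn from in this round
    # 3. Maximum likelihood: ascend the mean log-likelihood of the kept samples.
    #    For a softmax, d log p(x) / d logit_j = 1[j == x] - p_j.
    grad = [0.0] * len(logits)
    for x in kept:
        for j in range(len(logits)):
            grad[j] += ((1.0 if j == x else 0.0) - probs[j]) / len(kept)
    new_logits = [l + lr * g for l, g in zip(logits, grad)]
    return new_logits, kept


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [0.0] * 6

    def is_desired(x):
        # Assumption for illustration: outcomes 4 and 5 have the property.
        return x >= 4

    print("P(desired) before:", sum(softmax(logits)[4:]))
    for _ in range(50):
        logits, _ = finetune_step(logits, is_desired, num_samples=32, lr=0.5, rng=rng)
    print("P(desired) after: ", sum(softmax(logits)[4:]))
```

Rejected samples never enter the update, so the model is only ever pushed toward samples that passed the filter; nothing explicitly penalizes the rejected ones.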

This might be a totally valid thing to do, but it is indeed far from policy gradient. In the strictest sense, you could interpret this as a baseline-less REINFORCE with $R=0$ for all the rejected points (which would make their gradient 0) and $R=1$ for the points that property_filter keeps. In practice this is not a recommended RL setup, as it has all sorts of instabilities.
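
To make the REINFORCE reading concrete, here is a short derivation. The notation is assumed for illustration and is not taken from the paper: $p_\theta(G \mid S)$ is the model, $G_i$ is the $i$-th of $N$ sampled points, $R_i \in \{0, 1\}$ is 1 exactly when property_filter keeps $G_i$, and $K = \{i : R_i = 1\}$ is the set of kept points. The baseline-less REINFORCE estimate is

$$
\hat{g} = \frac{1}{N} \sum_{i=1}^{N} R_i \, \nabla_\theta \log p_\theta(G_i \mid S_i)
        = \frac{1}{N} \sum_{i \in K} \nabla_\theta \log p_\theta(G_i \mid S_i)
        = \frac{|K|}{N} \, \nabla_\theta \left[ \frac{1}{|K|} \sum_{i \in K} \log p_\theta(G_i \mid S_i) \right].
$$

The bracketed term is the average log-likelihood of the kept points, so under these assumptions the filtered maximum-likelihood step and the binary-reward REINFORCE step point in the same direction and differ only by the acceptance rate $|K|/N$.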
