Nice work! However, I have a question. Since I'm not very familiar with Reinforcement Learning, I wonder which part of this uses RL? Section 3.3.2 (fine-tuning) says "Update the model P(G,S) on the fine-tuning set $D^f$ using policy gradient method", which suggests RL is used there. However, in the code, it just computes the topology, atom, and bond type losses between the expanded $S_i$ and $G_i^k$.
Thanks!
Allow me to comment. I don't think you will find algorithms like REINFORCE here. What Jin meant is improving p(G|S) using the desired chemical properties. This happens in the property_filter() method in finetune.py.
For posterity (happy to be corrected by the authors), what the fine-tuning step (and property_filter) seems to consist of is: generate $Nm$ datapoints from the $m$ rationales, filter those points to keep only the ones with the desired properties, and take a maximum-likelihood step on the remaining points.
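To make that concrete, here is a minimal sketch of such a generate-filter-MLE step, assuming a hypothetical interface: `model.sample`, `model.nll`, and the `property_ok` predicate are stand-ins, not the repository's actual method names (only property_filter() appears in finetune.py).

```python
import torch

def finetune_step(model, optimizer, rationales, property_ok, n_per_rationale=20):
    """One generate -> filter -> maximum-likelihood step (sketch, assumed interface)."""
    # 1. Expand each of the m rationales S into N candidate molecules G ~ p(G | S),
    #    giving N*m candidates in total.
    candidates = [(S, model.sample(S))
                  for S in rationales
                  for _ in range(n_per_rationale)]

    # 2. Keep only the candidates with the desired chemical properties
    #    (the role played by property_filter() in finetune.py).
    accepted = [(S, G) for (S, G) in candidates if property_ok(G)]

    # 3. Maximum-likelihood update on the survivors: minimize -log p(G | S).
    if accepted:
        optimizer.zero_grad()
        loss = torch.stack([model.nll(G, S) for (S, G) in accepted]).mean()
        loss.backward()
        optimizer.step()

    return len(accepted), len(candidates)
```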
This might be a totally valid thing to do, but it is indeed far from a policy-gradient method. In the strictest sense, you could interpret it as baseline-less REINFORCE with $R=0$ for all the rejected points (which makes their gradient contribution 0) and $R=1$ for the points that property_filter keeps. In practice this is not a recommended RL setup, as it has all sorts of instabilities.
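To spell out that correspondence: the REINFORCE estimator of the policy gradient is

$$\nabla_\theta \, \mathbb{E}_{G \sim p_\theta(\cdot \mid S)}\big[R(G)\big] \approx \frac{1}{N} \sum_{i=1}^{N} R(G_i)\, \nabla_\theta \log p_\theta(G_i \mid S),$$

so with $R(G_i) \in \{0, 1\}$ only the accepted samples contribute, and the estimator reduces (up to the $1/N$ vs. $1/|\text{accepted}|$ normalization) to the gradient of the average log-likelihood over the filtered set, i.e. exactly the maximum-likelihood step described above.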