Even after ~1 billion self-play and over 1000 checkpoints, the model still does not seem to converge
Thanks for your wonderful project. May I ask how you judge the convergence of a self-play game model? Are there any recommended evaluation metrics?
I don't think evaluating with rewards (or utilities) is a good choice, because both sides are continuously improving during training.
There is no good way, from my point of view. You can try training a model and checking its performance against Slumbot, but this simulator does not match the ACPC rules, so you would need to modify the simulator to do that.
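As a rough illustration of the "play against a fixed opponent" idea, here is a minimal sketch, not this project's code. It assumes you can wrap "one checkpoint plays one hand against a frozen opponent" in a function; `play_hand`, `make_toy_match`, and `has_plateaued` are hypothetical names, and the toy match just draws payoffs from a Gaussian instead of talking to Slumbot or the simulator.

```python
import math
import random
from typing import Callable, List, Tuple


def evaluate_vs_fixed(play_hand: Callable[[random.Random], float],
                      n_hands: int,
                      seed: int = 0) -> Tuple[float, float]:
    """Return (mean payoff per hand, standard error) against a fixed opponent.

    Requires n_hands >= 2 so the sample variance is defined.
    """
    rng = random.Random(seed)
    payoffs = [play_hand(rng) for _ in range(n_hands)]
    mean = sum(payoffs) / n_hands
    var = sum((p - mean) ** 2 for p in payoffs) / (n_hands - 1)
    return mean, math.sqrt(var / n_hands)


def has_plateaued(means: List[float], stderrs: List[float],
                  window: int = 3) -> bool:
    """Heuristic (an assumption, not a proof of convergence): the last
    `window` checkpoint scores differ by at most 2 standard errors."""
    if len(means) < window:
        return False
    recent = means[-window:]
    tolerance = 2 * max(stderrs[-window:])
    return max(recent) - min(recent) <= tolerance


def make_toy_match(edge: float) -> Callable[[random.Random], float]:
    """Hypothetical stand-in for a real match: payoff ~ N(edge, 1)."""
    def play_hand(rng: random.Random) -> float:
        return rng.gauss(edge, 1.0)
    return play_hand
```

In practice you would replace `make_toy_match` with a function that loads one checkpoint and plays a hand against the frozen opponent, then call `evaluate_vs_fixed` for each checkpoint and pass the resulting means and standard errors to `has_plateaued`. A plateau against a single fixed opponent only suggests that training has stopped improving against that opponent; it does not show the policy is close to an equilibrium.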