How do you judge/track the convergence of the holdem model? #7

Open
Josh00-Lu opened this issue Feb 24, 2024 · 1 comment

Josh00-Lu commented Feb 24, 2024

Even after ~1 billion self-play (over 1000 checkpoints), the model still does not seem to have converged.

Thanks for your wonderful project. May I ask how you judge the convergence of a self-play game model? Are there any evaluation metrics you would recommend?

I don't think evaluating with rewards (or utilities) is a good choice, because both sides are continuously improving during training.

bupticybee (Owner) commented

There is no good way from my point of view. You can try training a model and checking its performance against Slumbot, but this simulator does not match the ACPC rules, so you would need to modify the simulator to do that.
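
If you do get head-to-head results against a fixed opponent, one way to compare checkpoints is to report the win rate with a confidence interval, since the opponent no longer changes during evaluation. The following is only a rough sketch under assumptions: `winrate_mbb` and `summarize` are hypothetical helper names, the input is assumed to be a list of per-hand results in big blinds that you collected yourself, and nothing here calls Slumbot or this repository's simulator. The unit used is milli-big-blinds per hand (mbb/hand), which is commonly used for reporting poker bot results.

```python
import math
from statistics import mean, stdev


def winrate_mbb(per_hand_bb, z=1.96):
    """Return (mean, half_width) of the win rate in mbb/hand.

    per_hand_bb: per-hand winnings of the evaluated model, in big blinds
                 (assumed input format, not produced by any specific tool).
    z:           normal quantile; 1.96 gives an approximate 95% interval.
    """
    if len(per_hand_bb) < 2:
        raise ValueError("need at least two hands to estimate variance")
    mbb = [1000.0 * x for x in per_hand_bb]  # big blinds -> milli-big-blinds
    m = mean(mbb)
    half_width = z * stdev(mbb) / math.sqrt(len(mbb))
    return m, half_width


def summarize(checkpoint_results):
    """Map each checkpoint name to its (mean mbb/hand, CI half width)."""
    return {name: winrate_mbb(hands) for name, hands in checkpoint_results.items()}


if __name__ == "__main__":
    # Made-up numbers purely for illustration, not real evaluation data.
    results = {
        "ckpt_0100": [1.0, -2.0, 0.5, -1.5, 0.0, -0.5],
        "ckpt_1000": [1.0, -1.0, 0.5, 1.5, 0.0, -0.5],
    }
    for name, (m, hw) in summarize(results).items():
        print(f"{name}: {m:.1f} +/- {hw:.1f} mbb/hand")
```

If the intervals of successive checkpoints overlap heavily over a long stretch of training, that can be read as a hint that the win rate against this particular opponent has plateaued. It says nothing about strength against other opponents, and poker results have high variance, so a large number of hands is needed before the intervals become narrow.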
