How do you judge/track the convergency of the holdem model? #7

Josh00-Lu · 2024-02-24T03:01:52Z

Even after ~1 billion self-play games and over 1000 checkpoints, the model still does not seem to have converged.

Thanks for your wonderful project. May I ask how you judge the convergence of a self-play game model? Are there any evaluation metrics you would recommend?

I don't think evaluating with rewards (or utilities) is a good choice, because both sides are continuously improving during training.

bupticybee · 2024-02-26T01:32:17Z

There is no good way that I know of. You could train a model and measure its performance against Slumbot, but this simulator doesn't match the ACPC rules, so you would need to modify the simulator first.
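If you do set up a head-to-head evaluation against a fixed opponent such as Slumbot, one common way to summarize it is the average winnings per hand in milli-big-blinds, with a rough confidence interval so that checkpoints can be compared. The snippet below is only a minimal sketch under stated assumptions: `winrate_mbb_per_hand`, `chip_results`, and `big_blind` are hypothetical names that are not part of this project, the per-hand results are made-up numbers, and the interval uses a plain normal approximation.

```python
import math
from statistics import mean, stdev


def winrate_mbb_per_hand(chip_results, big_blind):
    """Return (mean, 95% CI half-width) of winnings in milli-big-blinds per hand.

    chip_results: per-hand net chip results for the evaluated model (hypothetical input).
    big_blind: size of the big blind in chips (assumed constant across hands).
    """
    if len(chip_results) < 2:
        raise ValueError("need at least two hands to estimate variance")
    # Convert chips to milli-big-blinds so results are comparable across stake sizes.
    mbb = [1000.0 * r / big_blind for r in chip_results]
    avg = mean(mbb)
    # Normal-approximation 95% interval; an assumption, reasonable only for many hands.
    half_width = 1.96 * stdev(mbb) / math.sqrt(len(mbb))
    return avg, half_width


# Made-up per-hand results, for illustration only.
results = [150, -100, 50, -50, 200, -100]
avg, hw = winrate_mbb_per_hand(results, big_blind=100)
print(f"{avg:.1f} +/- {hw:.1f} mbb/hand")
```

Because poker results have high variance, the interval stays wide until a large number of hands has been played, so a flat win rate across successive checkpoints is only meaningful once the intervals are narrow enough to compare.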
