I came up with this question while reading the Alpha(Go)Zero paper. I know the formula for calculating Elo.
But consider self-play: players A and B (which are the same agent) both start with a rating of 1400 in chess, and after one round A's rating is 1500 and B's is 1300.
Then, whether or not the agent has been trained further, it plays against itself again. Which rating is used as the baseline for computing the agent's Elo after the second round: 1500, 1300, or something else? I couldn't work out the right way to do this.
In short, I don't understand how the data for an Elo trend graph is generated, with training steps on the x-axis and Elo on the y-axis, when the game is a two-player zero-sum game and the agent is trained only through self-play.
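To make the puzzle concrete, here is a minimal sketch of the standard Elo update applied to the example above. The function names and the value `k=200` are my own assumptions for illustration (K = 200 is far larger than typical K-factors and is chosen only so that one win from 1400 vs. 1400 reproduces the 1500/1300 numbers). It is not taken from the paper.

```python
def expected_score(rating_a, rating_b):
    """Expected score of A against B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a, rating_b, score_a, k=200.0):
    """Return the new (rating_a, rating_b) after one game.

    score_a is 1.0 for an A win, 0.5 for a draw, 0.0 for an A loss.
    k is the K-factor; 200 is an assumed value for this example only.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # The update is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


# Round 1: both copies of the agent start at 1400 and A wins.
a, b = elo_update(1400.0, 1400.0, score_a=1.0)
print(a, b)  # 1500.0 1300.0

# Round 2: B wins this time, starting from the round-1 ratings.
a, b = elo_update(a, b, score_a=0.0)
print(a + b)  # the two ratings still sum to about 2800
```

Because each update moves the same number of points from one player to the other, the two ratings always average 1400 when A and B are the same agent. That is exactly why I don't see how a single self-play agent ends up with a rising Elo curve.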