CanonicalForm changes outcome of gameplay #273
Comments
Hi, could you check #263 and related issues and let me know if there's a link?
Yes, #263 seems to be connected to this issue. To better investigate that possible bug, I'm using the following code, which is based on the MCTS logic and the playGame method from the Arena class.
Notice that I fix the random seed for reproducibility. With this code I can quickly simulate either switching players (1, -1, 1, ...) or switching the board (canonical board, next_player is always 1). The code is compatible with every game that implements the abstract Game class. I count wins and draws with the following code:
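(Roughly sketched; I simplified the move selection to uniformly random moves so the snippet runs on its own, and play_random_game / count_outcomes are placeholder names. My real code plugs in the MCTS players instead.)

```python
import numpy as np

np.random.seed(0)  # fixed seed for reproducibility


def play_random_game(game, canonical):
    """Play one game with random legal moves.

    canonical=False: keep the absolute board and alternate players 1, -1, 1, ...
    canonical=True:  always pass the canonical board around and treat the player
                     to move as 1, the way MCTS does internally.
    """
    board = game.getInitBoard()
    if canonical:
        while game.getGameEnded(board, 1) == 0:
            valids = game.getValidMoves(board, 1)
            action = np.random.choice(np.nonzero(valids)[0])
            board, next_player = game.getNextState(board, 1, action)
            board = game.getCanonicalForm(board, next_player)
        return game.getGameEnded(board, 1)
    cur_player = 1
    while game.getGameEnded(board, cur_player) == 0:
        valids = game.getValidMoves(game.getCanonicalForm(board, cur_player), 1)
        action = np.random.choice(np.nonzero(valids)[0])
        board, cur_player = game.getNextState(board, cur_player, action)
    return cur_player * game.getGameEnded(board, cur_player)


def count_outcomes(game, canonical, num_games=1000):
    """Count player-1 wins, player -1 wins and draws over num_games games."""
    wins, losses, draws = 0, 0, 0
    for _ in range(num_games):
        r = play_random_game(game, canonical)
        if r == 1:
            wins += 1
        elif r == -1:
            losses += 1
        else:
            draws += 1  # draws come back as a small non-zero value, e.g. 1e-4
    return wins, losses, draws
```

I then compare count_outcomes(game, canonical=False) against count_outcomes(game, canonical=True) for the same game object.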
My findings:
My own implementation shows the same pattern as TicTacToe: player wins, opponent wins, and draws are roughly evenly distributed, but as soon as I use only canonical mode, the player wins get added to the opponent wins. My current understanding is that the outcome when using a canonical board should be exactly the same. However, after days of trying to figure this out, I'm not so sure about that assumption anymore. I suspect this issue massively impacts the training process, since the model never gets to observe a player win, but I'm not really sure about that either.
Wow, you're on to something. Can you check #156 and what happens if you revert to the previous version? I'll look closer in the following days (sorry for the slow response).
I made some progress based on your comment and updated my test code with the information from the previous version.
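Concretely, on top of the sketch above I added a switch for how the final game value is computed (final_value and multiply_by_player are placeholder names; as far as I understand it, this mirrors the difference discussed in #156):

```python
def final_value(game, board, next_player, multiply_by_player):
    """Value of a finished game, optionally re-expressed via the player to move."""
    value = game.getGameEnded(board, next_player)
    return next_player * value if multiply_by_player else value
```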
Now I can simulate what happens if the value of the game is multiplied by next_player. As for the code in Arena:
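If I read the current Arena.playGame correctly (quoting the gist from memory, so possibly not verbatim), it ends with the multiplied form:

```python
# end of Arena.playGame: result expressed from player 1's point of view
return curPlayer * self.game.getGameEnded(board, curPlayer)
```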
Neither of these versions really changes the problem with the canonicalForm in my experiment with TicTacToe.
Could this be problematic for the training process, given that the train examples only contain opponent wins and draws? I solved this problem in my own game: I explicitly keep track of the current player within the state vector, and with this hack it is possible to observe all three possible outcomes (player wins, opponent wins, and draws). However, this seems to impact training negatively. I just started a new training session and the old model keeps winning, so I'm not sure anymore whether this is the problem, especially since the other games are implemented like this and seem to train fine for other people (I haven't started a training session for TicTacToe). Maybe, logically, an opponent win is the same as a player win, since the MCTS plays against itself ...
I found a relevant bug in "Coach.py". In line 66 the game ending is calculated as follows:
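(From memory, so possibly not verbatim, the line is roughly:)

```python
r = self.game.getGameEnded(board, self.curPlayer)
```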
This is not consistent with how the value is calculated in Arena.py. It needs to be something like:
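(A sketch of the idea, not a verbatim patch; the point is to multiply by the current player the same way Arena does it:)

```python
r = self.curPlayer * self.game.getGameEnded(board, self.curPlayer)
```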
I fixed this issue in my local copy and started a new round of training for my own game implementation. Hoping for the best.
Thanks for digging into this! I believe line 69 of Coach.py (at commit 1db83a7) does the multiplication.
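For context, the line I mean looks roughly like this (from memory, not necessarily verbatim); it flips the sign of the result for the examples recorded from the other player's perspective:

```python
return [(x[0], x[2], r * ((-1) ** (x[1] != self.curPlayer))) for x in trainExamples]
```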
I think you are right. It's pretty hard not to get lost in the code. I'm still struggling to get my training working. Right now the new model always loses to the old model. Something must still be wrong.
I'm still on it. I traced the problem back to the MCTS implementation. Both Coach and Arena use the CanonicalForm only to call MCTS and then perform the action like this:
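(My paraphrase of the pattern in Coach.executeEpisode, not a verbatim quote; Arena.playGame follows the same pattern with its own player functions:)

```python
# the network / MCTS only ever sees the canonical board ...
canonicalBoard = self.game.getCanonicalForm(board, self.curPlayer)
pi = self.mcts.getActionProb(canonicalBoard, temp=temp)
action = np.random.choice(len(pi), p=pi)
# ... but the move is applied to the original board with the real current player
board, self.curPlayer = self.game.getNextState(board, self.curPlayer, action)
```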
Now, MCTS needs to simulate the rest of the game from the current game state but only uses CanonicalForm like this:
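(Again paraphrasing MCTS.search from memory, not verbatim; the player argument is always 1 and the board gets re-canonicalized after every simulated move:)

```python
s = self.game.stringRepresentation(canonicalBoard)
if s not in self.Es:
    # the game end is always evaluated for player 1 on the canonical board
    self.Es[s] = self.game.getGameEnded(canonicalBoard, 1)
if self.Es[s] != 0:
    # terminal node: hand the negated value back to the parent (the other player)
    return -self.Es[s]

# ... pick an action a, then recurse on the next canonical board
next_s, next_player = self.game.getNextState(canonicalBoard, 1, a)
next_s = self.game.getCanonicalForm(next_s, next_player)
v = self.search(next_s)
# ... update Qsa / Nsa with v, then
return -v
```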
Which leads to the following problem:
Is this intended behaviour? Is this the reason why "-v" gets returned? But why is self.Es[s] not the negative value of the getGameEnded result?
For my thesis, I'm trying to implement Nine Men's Morris as a new game. I already implemented every method from the abstract Game class and got the train loop running. However, the agent doesn't learn anything: after two days of training, every model since the beginning has been rejected because it couldn't win against the previous iteration.
I started investigating and found out that once the game is run in canonical form (as during the MCTS episodes), the only game outcomes are: opponent wins or the game is a draw. No samples were gathered where the player was winning, so the net probably couldn't learn.
To better understand the error, I built a dummy loop with random actions. Here are some findings:
I checked the other included games: Othello doesn't have a draw outcome, and with my dummy code the results of canonical-form games and non-canonical-form games are exactly the same. However, the same issue happens with TicTacToe and Connect4.
I'm starting to think that this behavior is inherent to this implementation, as I couldn't figure out anything for the last couple of days.
My questions:
Any help would be greatly appreciated.