In a computer game I liked to play as a kid there exist a minigame called Jigsaw. It is hard to do efficiently. The reward varies dependent on how much steps you need to solve, so it will be interesting to find a solver. It is a tetris like puzzle which can be solved easily using heuristics. For learning purpose I will use reinforcement learning.
conda create -n jigsaw numpy pytorch -c anaconda -c pytorch
conda activate jigsaw
maybe use
conda create --name jigsaw python
conda activate jigsaw
conda install -c "nvidia/label/cuda-11.8.0" cuda-toolkit
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
conda install -c anaconda numpy
pip install wandb
wandb login .......
The Field consists of 4 rows and 6 cols and is initialized empty
At each turn you can place a piece on the board or draw a random new one frome the card boxes.
There are 6 possible puzzle pieces you can get out of the box:
The game is done if the field is completely filled with forms with nothing empty.
To get an idea about if this is bad or good here some information about the rewards:
10 or fewer:
Large treasure chest
11 to 24:
medium treasure chest
25 or more:
small tresure chest
After setting up the basic rules for the game I created a brute force algorithm, which received following statistics on 100 runs:
Average Brute Force (Turn: 32.08 Used: 10.74 Not used: 21.34)
S = Small reward
M = Medium reward
L = Large reward
R = Round number (=solved puzzles)
T = Tiles used overall
O = Average Tiles per round
The new model reached the following result
L: 2
M: 71
S: 13
S_bad: 14
Next versions should reward more for large chests and penalize longer runs (+25 steps), ... idea penalty increases with each step
This was the basic approch I tried based on the flappy bird example. This was not really nice performing.
Using a episodic approach looks more promising and makes more sense, cause the reward if a decision was good is based on the outcome of every episode. With big amount of replay_memory this looks quite promising. Actually this one needs way to much time for training (days) and is even then not that good performing than expected.
- Parrallelized running (faster training) -> decouple playing the game from training ... training process can fully use GPU while the playing works on multiple CPU threads and just adds to the GPU list
- play maybe 100 games manual to have something to compare my results too, maybe I just have to high expecatation and the model is already better/similar than human
- Do MCTS https://towardsdatascience.com/monte-carlo-tree-search-158a917a8baa and compare
we could create a openapi of the finished model. It would have low traffic (entire state can be transferred as a single integer value 4 byte) it will be encoded in the binary form ... field =6*4=24 bit 1 byte left for more .. 6 puzzle piece =3bit ... 5 left .. maybe for flags reserved?
https://www.toptal.com/deep-learning/pytorch-reinforcement-learning-tutorial https://pub.towardsai.net/understanding-tensor-dimensions-in-deep-learning-models-with-pytorch-4ee828693826 https://en-wiki.metin2.gameforge.com/index.php/Fishing_Jigsaw