
# RL Gobang

## Kickstart

```bash
bazel build -c opt //mcts:capi
pip install -r requirements.txt
```
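The Bazel target builds the native MCTS core as a shared library that the Python side can load. As a minimal sketch of how such a library might be loaded (the output path and the `mcts_search` symbol are assumptions for illustration, not the project's actual API):

```python
# Hypothetical sketch: loading a Bazel-built MCTS shared library via ctypes.
# The library path and the exported symbol name are assumptions.
import ctypes

# Bazel typically places build outputs under bazel-bin/; the exact
# filename depends on the platform and build configuration.
lib = ctypes.CDLL("bazel-bin/mcts/libcapi.so")

# Declare the assumed C entry point's signature before calling it,
# so ctypes marshals the arguments correctly.
lib.mcts_search.argtypes = [ctypes.c_void_p, ctypes.c_int]
lib.mcts_search.restype = ctypes.c_int
```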

## Training

Training uses multiple self-play processes and a single training process. Each self-play process runs MCTS guided by a neural-network oracle to generate self-play games. These games are then transmitted to the training process via IPC to improve the oracle. The training process also features an evaluator that checks whether the new network is better than the previous checkpoint. The evaluation criterion is strict: A is considered better than B only if A beats B while playing the white stones (the defensive side). A new checkpoint is saved only when a clearly stronger network has emerged.
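As an illustration of that evaluation rule, here is a minimal sketch; `play_match` is an assumed helper, not the project's API:

```python
# Hypothetical sketch of the checkpoint-evaluation rule described above:
# the candidate network must win as white (the defensive side) to be
# accepted. `play_match` is an assumed helper returning the winner.

def is_stronger(candidate, incumbent, play_match, num_games=10):
    """Accept `candidate` only if it beats `incumbent` playing white."""
    wins_as_white = sum(
        play_match(black=incumbent, white=candidate) == "white"
        for _ in range(num_games)
    )
    # Require a clear majority of wins from the defensive side before
    # a new checkpoint is saved.
    return wins_as_white > num_games // 2
```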

The master executable controls the training and self-play processes. It decides the number of self-play processes, which largely determines the game-generation speed. `master` always returns immediately, since it only creates or terminates the worker processes. The worker processes (training and self-play) run as daemons, so deploying the program on Windows may require additional effort.

```bash
# start training
python src/master.py start

# stop training
python src/master.py kill
```
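A minimal sketch of how a worker can be detached so it outlives the master, using the classic Unix double fork (this is illustrative, not taken from `src/master.py`; `worker_fn` is an assumed worker entry point):

```python
# Hypothetical sketch: detach a worker as a Unix daemon via double fork,
# so it keeps running after the master exits. os.fork is unavailable on
# Windows, which matches the deployment caveat above.
import os

def daemonize(worker_fn):
    """Fork twice so `worker_fn` runs detached from the master process."""
    if os.fork() > 0:
        return            # master continues (and exits) immediately
    os.setsid()           # start a new session, detach from the terminal
    if os.fork() > 0:
        os._exit(0)       # first child exits; grandchild is re-parented
    worker_fn()           # grandchild runs the worker loop
    os._exit(0)
```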

## Playing with Checkpoints

`gobang_env` is a GUI program that visualizes the strength of a given checkpoint. It can also save the playing history as images.

```bash
python src/gobang_env.py
```

Available checkpoints:

## Achievements

- Beats tito (an AI that took three first-place and two second-place finishes in Gomocup) while playing the black stones.

  *VS tito with the black stones*

- Achieves a high rank on the platform "微信小程序-欢乐五子棋腾讯版" (the Tencent "Happy Gomoku" WeChat mini-program) (in progress).

## Performance Notes

The following optimizations are extremely useful in the training process.

1. Replacing the Python runtime in the MCTS search with a native library (C++/Rust) yields a speedup of roughly 30×.

2. Multiple self-play processes accelerate training roughly in proportion to their number; self-play is the bottleneck of the whole system.

3. Virtual loss is a critical trick for batching neural-network inference during MCTS self-play; it easily speeds up self-play by about 6×. See the sketch after this list.
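A minimal sketch of the virtual-loss idea, assuming a simple node structure rather than the project's actual data types: each in-flight simulation temporarily deducts a loss from the nodes on its path, steering parallel simulations toward different branches so their leaf evaluations can be batched through the network.

```python
# Hypothetical sketch of virtual loss for batched MCTS inference.
# `path` is a list of assumed node objects with visit_count/value_sum.
VIRTUAL_LOSS = 3  # each pending simulation counts as this many losses

def apply_virtual_loss(path):
    """Discourage other in-flight simulations from taking this path."""
    for node in path:
        node.visit_count += VIRTUAL_LOSS
        node.value_sum -= VIRTUAL_LOSS

def revert_virtual_loss(path, value):
    """Replace the provisional losses with the real network evaluation."""
    for node in path:
        node.visit_count += 1 - VIRTUAL_LOSS  # net effect: +1 visit
        node.value_sum += value + VIRTUAL_LOSS  # net effect: +value
```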

## Paper

AlphaZero: [Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm](https://arxiv.org/abs/1712.01815) (Silver et al., 2017)