Imagine doing everything in life without ever gaining any reward for it whatsoever. A very large portion of the foundation models out there live this tragic life, and it's time to change that.
The goal here is to create a solid framework for LLM/RL, much like Stable Baselines (built on OpenAI's Gym) is for the more traditional RL landscape. I hope to share largely the same values as their repo, with a few additions:
- Stay close to the pseudocode from the literature. (nice for implementing)
- Reduce the overhead of scaling to larger models across different machines (nice for training)
- Implement the latest schemes and methods and evaluate them in various environments (nice for evaluations)
- Make this a nice place for researchers in general.
Environments (ordered by ambition):
- Guess the city
- Math
- Chess
- SWE/MLE-bench
- Factorio
- Minecraft
This repo is under construction. If you want to contribute, please do :) If you want to share ideas on how to make it better/cleaner/leaner/simpler, feel free to contact me on LinkedIn, email, or whatever channel you prefer.