LLM Reinforcement Learning Framework

Imagine doing everything in life without ever gaining any reward for it whatsoever. A very large portion of the foundation models out there live this tragic life, and it's time to change that.

The goal here is to create a solid framework for RL on LLMs, much like Stable Baselines (built around OpenAI's Gym interface) is for the more traditional RL landscape. I hope to share largely the same values as their repo, with a few additions:

Core Values

Stay close to the pseudocode from the literature (nice for implementing)
Reduce the overhead of scaling to larger models across different machines (nice for training)
Implement the latest schemes and methods and evaluate them in various environments (nice for evaluations)
Make this a nice place for researchers in general.

Planned Environments

(ordered by ambition)

Guess the city
Math
Chess
SWE/MLE-bench
Factorio
Minecraft
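
To make the Gym-style idea concrete, here is a minimal sketch of what the simplest environment on the list, "Guess the city", could look like. Nothing below comes from this repo: `GuessTheCityEnv`, `StepResult`, `rollout`, the reward values, and the turn limit are all hypothetical names and choices made up for illustration, and the actual design may end up quite different.

```python
from dataclasses import dataclass


@dataclass
class StepResult:
    observation: str  # text shown to the model next
    reward: float     # scalar reward for the last action
    done: bool        # True once the episode has ended


class GuessTheCityEnv:
    """Hypothetical text environment: the policy must name the hidden city."""

    def __init__(self, clue: str, answer: str, max_turns: int = 3):
        self.clue = clue
        self.answer = answer.strip().lower()
        self.max_turns = max_turns
        self.turn = 0

    def reset(self) -> str:
        # Start a new episode and return the initial prompt.
        self.turn = 0
        return f"Clue: {self.clue}. Which city is this?"

    def step(self, action: str) -> StepResult:
        # Score one text action (a guess) from the policy.
        self.turn += 1
        if action.strip().lower() == self.answer:
            return StepResult("Correct.", 1.0, True)
        # Wrong guess: no reward, and the episode ends once turns run out.
        done = self.turn >= self.max_turns
        return StepResult("Incorrect, try again.", 0.0, done)


def rollout(env: GuessTheCityEnv, policy) -> float:
    # Run one episode; `policy` is any callable mapping prompt text to a guess.
    observation = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        result = env.step(policy(observation))
        total_reward += result.reward
        observation, done = result.observation, result.done
    return total_reward


if __name__ == "__main__":
    env = GuessTheCityEnv(clue="Home of the Eiffel Tower", answer="Paris")
    # A stand-in policy; in practice this would be an LLM call.
    print(rollout(env, lambda prompt: "paris"))
```

The `policy` argument is just a callable from text to text, so an LLM wrapper could be dropped in without changing the environment. The `reset`/`step` split mirrors the familiar Gym loop, which is the kind of familiarity the core values above aim for.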

Contributing

This repo is under construction. If you want to contribute, please do :) If you want to share ideas on how to make it better, cleaner, leaner, or simpler, feel free to contact me on LinkedIn, email, or whatever channel you prefer.

Contact

📧 darrynbiervliet@gmail.com
