Student group: s282575_s331543_s321277_SUPPA_PERRUCCI_DUGASDUVILLARD
Lorenzo Suppa, Vito Perrucci, Tanguy Dugas du Villard
Politecnico di Torino
Machine Learning and Deep Learning course, Spring semester 2024.
The code presented here does not follow the provided template. To make multi-step training and data saving easier to manage, a wrapper class named 'Session' has been created. Within a single session, it is easy to train one or several agents, load, reload and save models, and keep track of the metrics. The core of the code consists of the train.py and test_agent.py functions, which are called at nearly every step. The student can interact with the session through its methods.
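To illustrate the wrapper idea, here is a minimal sketch of what such a session object could look like. The class body, method names and signatures below are hypothetical and only mirror the description above; they are not the project's actual Session API.

```python
import json
from pathlib import Path

class Session:
    """Hypothetical sketch of a training-session wrapper.

    Bundles agents, checkpoints and metrics so that multi-step
    training runs can be driven from one object.
    """

    def __init__(self, workdir="runs/demo"):
        self.workdir = Path(workdir)
        self.workdir.mkdir(parents=True, exist_ok=True)
        self.agents = {}   # name -> agent object
        self.metrics = {}  # name -> list of episode rewards

    def register(self, name, agent):
        self.agents[name] = agent
        self.metrics.setdefault(name, [])

    def record(self, name, episode_reward):
        self.metrics[name].append(episode_reward)

    def save_metrics(self):
        path = self.workdir / "metrics.json"
        path.write_text(json.dumps(self.metrics))
        return path

session = Session()
session.register("reinforce", agent=object())
session.record("reinforce", 12.5)
session.record("reinforce", 15.0)
print(session.save_metrics().name)  # metrics.json
```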
A detailed explanation of the project's structure, approach, and results is provided in the project report, which can be found in the PDF file named s282575_s331543_s321277_SUPPA_PERRUCCI_DUGASDUVILLARD.pdf.
The two versions of the agent (REINFORCE, with and without baseline, and the actor-critic) are coded in two different files: agent/actor_critic_agent.py and agent/reinforce_agent.py. Both extend the agent.agent.Agent class, which implements all the agent methods provided in the template except update_policy, which is defined differently for each type of agent. These two agent classes implement tasks 2 and 3.
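To show the part that actually differs between the two agents, here is a hedged sketch of a base class whose update_policy is overridden, with a REINFORCE-style subclass computing discounted returns. The structure mirrors the description above, but the bodies are illustrative, not the project's actual code.

```python
class Agent:
    """Shared methods live here; update_policy is specialized per agent."""

    def __init__(self, gamma=0.99):
        self.gamma = gamma
        self.rewards = []

    def store_outcome(self, reward):
        self.rewards.append(reward)

    def update_policy(self):
        raise NotImplementedError


class ReinforceAgent(Agent):
    def discounted_returns(self):
        # G_t = r_t + gamma * G_{t+1}, computed backwards over the episode
        returns, g = [], 0.0
        for r in reversed(self.rewards):
            g = r + self.gamma * g
            returns.append(g)
        return list(reversed(returns))

    def update_policy(self):
        returns = self.discounted_returns()
        self.rewards.clear()
        # a real implementation would plug these returns into the
        # policy-gradient loss; here we just return them
        return returns


agent = ReinforceAgent(gamma=0.5)
for r in [1.0, 1.0, 1.0]:
    agent.store_outcome(r)
print(agent.update_policy())  # [1.75, 1.5, 1.0]
```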
We use stable-baselines' PPO, to which we add callbacks to save the training reward. The PPOSession class is used to run experiments on it.
For training, we use the TrainAndTestCallback to store the training reward, optionally train the PPO at regular intervals on a different environment during its training, and optionally set a maximum number of episodes.
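The callback behavior described above can be sketched in plain Python, independently of stable-baselines. The class and method names here are illustrative stand-ins, not the project's TrainAndTestCallback.

```python
class RewardLoggingCallbackSketch:
    """Illustrative callback: logs episode rewards and caps episode count."""

    def __init__(self, max_episodes=None):
        self.max_episodes = max_episodes
        self.episode_rewards = []
        self._current = 0.0

    def on_step(self, reward, done):
        """Called once per environment step; returns False to stop training."""
        self._current += reward
        if done:
            self.episode_rewards.append(self._current)
            self._current = 0.0
            if self.max_episodes is not None and len(self.episode_rewards) >= self.max_episodes:
                return False  # episode budget exhausted
        return True


cb = RewardLoggingCallbackSketch(max_episodes=2)
# simulate two 2-step episodes
steps = [(1.0, False), (1.0, True), (0.5, False), (0.5, True)]
flags = [cb.on_step(r, d) for r, d in steps]
print(cb.episode_rewards, flags[-1])  # [2.0, 1.0] False
```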
We slightly modified the CustomHopper to create a GDRHopper and a UDRHopper, whose masses can be modified easily. We use them to perform UDR or GDR experiments. With PPO, we use GDRCallback and UDRCallback.
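The difference between the two randomization schemes can be sketched as follows. The mass names, bounds and nominal values below are illustrative placeholders, not the Hopper's actual parameters.

```python
import random

def udr_masses(bounds):
    """Uniform domain randomization: each mass is drawn uniformly in its range."""
    return {name: random.uniform(lo, hi) for name, (lo, hi) in bounds.items()}

def gdr_masses(means, std_frac=0.1):
    """Gaussian domain randomization: each mass is drawn around its nominal value."""
    return {name: random.gauss(mu, std_frac * mu) for name, mu in means.items()}

random.seed(0)
bounds = {"thigh": (3.0, 5.0), "leg": (2.0, 4.0), "foot": (4.0, 6.0)}
print(udr_masses(bounds))

means = {"thigh": 3.9, "leg": 2.7, "foot": 5.1}
print(gdr_masses(means))
```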
ADR is performed on PPO via the ADRCallback, using the discriminator and particles defined in the domain_randomization folder, together with the UDRHopper and GDRHopper.
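As a rough illustration of the particle idea, the sketch below resamples candidate mass vectors in proportion to a discriminator score; the names and the update rule are hypothetical, and the actual discriminator and particle logic live in the domain_randomization folder.

```python
import random

def resample_particles(particles, scores, rng=random):
    """Resample candidate mass vectors proportionally to a discriminator score.

    A higher score means simulated rollouts under those masses look more like
    the target-domain rollouts, so such particles are kept more often.
    """
    total = sum(scores)
    weights = [s / total for s in scores]
    return rng.choices(particles, weights=weights, k=len(particles))

random.seed(1)
particles = [{"thigh": 3.5}, {"thigh": 4.0}, {"thigh": 4.5}]
scores = [0.1, 0.8, 0.1]  # the (hypothetical) discriminator favors the middle particle
new = resample_particles(particles, scores)
print(new)
```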