Building blocks for PEBBLE #625

Open
wants to merge 55 commits into
base: master
Conversation

dan-pandori
Contributor

Description

Creates an entropy reward replay wrapper to support the unsupervised state entropy based pre-training of an agent, as described in the PEBBLE paper.
https://sites.google.com/view/icml21pebble
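
For readers unfamiliar with the idea, here is a minimal sketch of a state-entropy reward of the kind PEBBLE uses for pre-training: each state is rewarded by the log of its distance to its k-th nearest neighbour among previously seen states. The function name `knn_entropy_reward`, its parameters, and the epsilon are assumptions made for illustration; this is not the code added by this PR.

```python
import numpy as np


def knn_entropy_reward(states: np.ndarray, buffer: np.ndarray, k: int = 5) -> np.ndarray:
    """Hypothetical helper: log distance to the k-th nearest neighbour in `buffer`.

    states: array of shape (n_states, obs_dim)
    buffer: array of shape (n_buffer, obs_dim), with n_buffer >= k
    """
    # Pairwise Euclidean distances, shape (n_states, n_buffer).
    diffs = states[:, None, :] - buffer[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    # k-th smallest distance for each state (k is 1-indexed).
    kth = np.partition(dists, k - 1, axis=1)[:, k - 1]
    # The small epsilon (an assumption) avoids log(0) for duplicated states.
    return np.log(kth + 1e-8)
```

States far from everything in the buffer get a larger reward, which pushes the agent towards unvisited regions. Note that if a state is itself stored in the buffer, its zero distance to itself counts as the first neighbour in this sketch.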

Testing

Added unit tests.

Contributor

yawen-d commented Nov 14, 2022

Thanks for the implementations!

mifeet pushed a commit that referenced this pull request Nov 29, 2022
mifeet pushed a commit that referenced this pull request Nov 30, 2022
mifeet pushed a commit that referenced this pull request Dec 1, 2022
@mifeet mifeet force-pushed the dpandori_wellford branch from 61e6cea to 2ab0780 Compare December 1, 2022 22:03
@mifeet mifeet force-pushed the dpandori_wellford branch from d23d98b to 73b1e36 Compare December 2, 2022 23:39
…ardNets can be injected from the outside
@mifeet mifeet force-pushed the dpandori_wellford branch 4 times, most recently from efc5ae0 to a0bacca Compare December 10, 2022 20:46
Contributor

mifeet commented Dec 10, 2022

@AdamGleave: reacting to your comments here together:

> I'd prefer wrapping it with a NormalizedRewardNet, they're conceptually doing very different things, and we might want to use different normalization schemes (RunningNorm often works worse than EMANorm)

Ok, it required a larger refactor, but you can see how it looks in the last couple of commits.

A good thing is that this change also addresses your other comment. It simplifies the entropy reward classes (the entropy reward is now separate from the switch away from the pre-training reward) and allows for more configurability, at the expense of making the wiring a little more complicated (in train_preference_comparison.py).

It also results in two changes internally:

  • Previously, the running mean/var statistics for normalization were updated first, and then normalization was applied. Now the order is swapped: normalization is applied first, then the statistics are updated.
  • Previously, the reward calculation required the conversions numpy -> torch -> numpy; now it internally converts numpy -> torch -> numpy -> torch -> numpy (because that's what the existing code for NormalizedRewardNet does). This applies only to pre-training, though.
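
To make the first change concrete, here is a small sketch of the two orderings. `RunningStats` and both helper functions are hypothetical names written for this illustration; they are not the library's RunningNorm or NormalizedRewardNet, and the fallback variance of 1.0 for an empty tracker is an assumption.

```python
import numpy as np


class RunningStats:
    """Hypothetical running mean/variance tracker (Welford-style updates)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, batch):
        for x in np.asarray(batch, dtype=float).ravel():
            self.count += 1
            delta = x - self.mean
            self.mean += delta / self.count
            self.m2 += delta * (x - self.mean)

    def normalize(self, batch):
        # Before any update, fall back to mean 0 and variance 1 (an assumption).
        var = self.m2 / self.count if self.count > 0 else 1.0
        return (np.asarray(batch, dtype=float) - self.mean) / np.sqrt(var + 1e-8)


def update_then_normalize(stats, batch):
    # Old order: the batch influences its own normalization.
    stats.update(batch)
    return stats.normalize(batch)


def normalize_then_update(stats, batch):
    # New order: the batch is normalized with statistics from earlier batches only.
    out = stats.normalize(batch)
    stats.update(batch)
    return out
```

In this sketch the difference is most visible on the very first batch: with the old order `[1, 3]` comes out as roughly `[-1, 1]`, while with the new order it passes through almost unchanged because no statistics have been collected yet. After the call, both trackers hold the same statistics.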

@mifeet mifeet force-pushed the dpandori_wellford branch 7 times, most recently from 7434ee6 to 4fd0758 Compare December 13, 2022 10:41
4 participants