Adversarial algorithm matching original paper's implementation #770

taufeeque9 · 2023-08-10T21:07:11Z

Description

This PR updates the adversarial algorithm by training the discriminator between collecting the rollouts of the generator and training the generator. This matches the reference implementation provided in Algorithm 1 of the AIRL paper.

The change is implemented in TrainDiscriminatorCallback, whose on_rollout_end() hook trains the discriminator as soon as rollout collection finishes. The callback first stores the latest rollout in the replay buffer, which is then used to train the discriminator. Once the discriminator is trained, the callback relabels the rewards in the generator's rollout/replay buffer using the updated discriminator.
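The ordering described above can be sketched with a toy trainer that only records which phase runs when. The class and method names here (ToyAdversarialTrainer, train_round, and so on) are hypothetical stand-ins for illustration, not the PR's actual code or the library's API.

```python
from typing import List


class ToyAdversarialTrainer:
    """Hypothetical stand-in that only records the order of training phases."""

    def __init__(self) -> None:
        self.events: List[str] = []
        self.replay_buffer: List[str] = []

    def collect_rollouts(self) -> str:
        self.events.append("collect_rollouts")
        return "rollout"

    def on_rollout_end(self, rollout: str) -> None:
        # 1. Store the fresh rollout so the discriminator can sample from it.
        self.replay_buffer.append(rollout)
        # 2. Train the discriminator before the generator sees the data.
        self.events.append("train_discriminator")
        # 3. Relabel the generator's buffer with the updated reward.
        self.events.append("relabel_rewards")

    def train_generator(self) -> None:
        self.events.append("train_generator")

    def train_round(self) -> None:
        rollout = self.collect_rollouts()
        self.on_rollout_end(rollout)
        self.train_generator()


trainer = ToyAdversarialTrainer()
trainer.train_round()
print(trainer.events)
```

The point of the sketch is only the sequencing: the discriminator update and reward relabeling sit between rollout collection and the generator update.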

Note that for on-policy algorithms we must also update the advantages and returns in the rollout buffer after updating the rewards. This is tricky because the value and done flag for the observation that follows the last rollout step are not stored in the rollout buffer. This PR recovers them from the original advantages and rewards. A test that checks the recomputed values are correct still needs to be added.
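One way such a recovery could work, assuming advantages were originally computed with the standard GAE recursion and returns are advantages plus values: at the final step the advantage equals reward plus the discounted bootstrap term minus the value, so the bootstrap term can be solved for and reused with the new rewards. The sketch below is an illustration under that assumption for a single environment; the function names are hypothetical and this is not confirmed to be the PR's code.

```python
from typing import List, Sequence


def gae(
    rewards: Sequence[float],
    values: Sequence[float],
    next_dones: Sequence[float],
    bootstrap: float,
    gamma: float = 0.99,
    gae_lambda: float = 0.95,
) -> List[float]:
    """GAE over one rollout.

    next_dones[t] is 1.0 if the episode ended after step t (the last entry is
    unused). bootstrap is gamma * (1 - done) * V(s_T) for the state after the
    final step.
    """
    last = len(rewards) - 1
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in range(last, -1, -1):
        not_done = 1.0 - next_dones[t]
        next_value = bootstrap if t == last else gamma * not_done * values[t + 1]
        delta = rewards[t] + next_value - values[t]
        running = delta + gamma * gae_lambda * not_done * running
        advantages[t] = running
    return advantages


def recover_bootstrap(
    old_advantages: Sequence[float],
    old_rewards: Sequence[float],
    values: Sequence[float],
) -> float:
    # At the final step: A = r + bootstrap - V, hence bootstrap = A - r + V.
    return old_advantages[-1] - old_rewards[-1] + values[-1]


values = [0.5, 0.4, 0.3]
old_rewards = [1.0, 0.0, 2.0]
next_dones = [0.0, 1.0, 0.0]
true_bootstrap = 0.99 * 0.7  # pretend V(s_T) = 0.7 and the episode continues

old_advantages = gae(old_rewards, values, next_dones, true_bootstrap)
recovered = recover_bootstrap(old_advantages, old_rewards, values)

# Recompute with relabeled rewards, using only what the buffer stores.
new_rewards = [0.1, 0.2, 0.3]
new_advantages = gae(new_rewards, values, next_dones, recovered)
new_returns = [a + v for a, v in zip(new_advantages, values)]
```

A test along these lines, comparing the recomputed advantages against a run where the bootstrap value is known, is one possible shape for the missing check.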

Testing

All the tests for adversarial algorithms run successfully.

The performance of this implementation remains to be compared against the other re-implementation, "Change adversarial algorithms to collect rollouts first" (#731), and against the algorithm on the master branch.

codecov · 2023-08-10T21:16:33Z

Codecov Report

Merging #770 (5c23650) into master (19c7f35) will increase coverage by 0.03%.
Report is 2 commits behind head on master.
The diff coverage is 98.68%.

@@            Coverage Diff             @@
##           master     #770      +/-   ##
==========================================
+ Coverage   96.33%   96.37%   +0.03%     
==========================================
  Files          93       93              
  Lines        8789     8846      +57     
==========================================
+ Hits         8467     8525      +58     
+ Misses        322      321       -1

Files Changed	Coverage Δ
src/imitation/algorithms/adversarial/common.py	`97.68% <98.14%> (+0.85%)`	⬆️
src/imitation/policies/replay_buffer_wrapper.py	`100.00% <100.00%> (ø)`
tests/algorithms/test_adversarial.py	`100.00% <100.00%> (ø)`

ernestum

Just some nits. A thorough review will follow.

ernestum · 2023-08-29T18:17:04Z

src/imitation/policies/replay_buffer_wrapper.py

+    assert buffer.actions is not None
+    obs = buffer.observations
+    next_obs = obs[1:]
+    next_obs = np.concatenate([next_obs, obs[-1:]], axis=0)  # last obs not available


Easier to read if you do:

next_obs = np.concatenate([obs[1:], obs[-1:]], axis=0)

ernestum · 2023-08-29T18:17:35Z

src/imitation/policies/replay_buffer_wrapper.py

+    next_obs = np.concatenate([next_obs, obs[-1:]], axis=0)  # last obs not available
+    actions = buffer.actions
+    dones = buffer.episode_starts
+    dones = np.roll(dones, -1, axis=0)


Same as above: inline buffer.episode_starts into this line.
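For reference, the two suggestions can be tried on toy arrays. The arrays below are made-up stand-ins for the buffer fields (4 steps, 1 environment, 2-dimensional observations), not data from the PR.

```python
import numpy as np

# Toy stand-ins for the rollout buffer fields.
observations = np.arange(8, dtype=np.float32).reshape(4, 1, 2)
episode_starts = np.array([[1.0], [0.0], [1.0], [0.0]], dtype=np.float32)

# Shift observations by one step. The true final next-observation is not
# stored, so the last observation is repeated as a placeholder.
next_obs = np.concatenate([observations[1:], observations[-1:]], axis=0)

# done[t] corresponds to episode_starts[t + 1]. np.roll wraps around, so the
# final entry is episode_starts[0], not a real done flag for the last step.
dones = np.roll(episode_starts, -1, axis=0)

print(next_obs[:, 0, :])
print(dones.ravel())
```

Note the wrap-around in the last entry of dones: it may be worth handling that position explicitly if downstream code relies on it.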

ernestum · 2023-08-29T18:27:08Z

src/imitation/algorithms/adversarial/common.py

@@ -222,16 +279,22 @@ def __init__(

        self.venv_buffering = wrappers.BufferingWrapper(self.venv)

+        self.disc_trainer_callback = TrainDiscriminatorCallback(self)


Why not define the gen_callback here like this:

self.gen_callback: List[callbacks.BaseCallback] = [self.disc_trainer_callback]

and then just append to it down in the else block?

And while you are at it, rename it to the plural, since it holds more than one callback, e.g. self.gen_callbacks.

taufeeque9 and others added 30 commits January 5, 2023 01:49

Merge py file changes from benchmark-algs

b4210c1

Clean parallel script

97bc063

Undo the changes from #653 to the dagger benchmark config files.

9291225

This change just made some error messages go away indicating the missing imitation.algorithms.dagger.ExponentialBetaSchedule but it did not fix the root cause.

Improve readability and interpretability of benchmarking tests.

276d863

Add exponential beta scheduler for dagger

37eb914

Ignore coverage for unknown algorithms.

877383b

Cleanup and extend tests for beta schedules in dagger.

c8e55cb

Merge branch 'master' into benchmark-pr

6b9b306

Fix test cases

8576465

Add optuna to dependencies

d81eb68

Fix test case

27467d3

Merge branch 'master' into benchmark-pr

b59a768

Clean up the scripts

1a3b6b8

Remove reporter(done) since mean_return is reported by the runs

7a438da

Merge branch 'master' into benchmark-pr

5bc5835

Add beta_schedule parameter to dagger script

2e56de8

Merge branch 'master' into benchmark-pr

84e854a

Update config policy kwargs

73d8576

Changes from review

9fdf878

Fix errors with some configs

1c1dbc4

Merge branch 'master' into benchmark-pr

3467af2

Updates based on review

44c4e97

Merge branch 'master' into benchmark-pr

4d493ae

Change metric everywhere

ab01269

Merge branch 'master' into benchmark-pr

f64580e

Separate tuning code from parallel.py

e896d7d

Fix docstring

64c3a8d

Removing resume option as it is getting tricky to correctly implement

8fba0d3

Minor fixes

12ab31c

Updates from review

19b0f2c

taufeeque9 added 14 commits July 17, 2023 09:08

Fix lint error

5ce7658

Updates from the review

a8be331

Fix file name test errors

4ff006d

Add tune_run_kwargs in parallel script

6933afa

Fix test errors

77f9d9b

Fix test

54eb8a6

Fix lint

d50238f

Updates from review

3fe22d4

Simplify few lines of code

c50aa20

Updates from review

000af61

Fix test

8b55134

Revert "Fix test"

f3ba2b5

This reverts commit 8b55134.

Fix test

f8251c7

Convert Dict to Mapping in input argument

664fc37

ernestum self-assigned this Aug 28, 2023

ernestum reviewed Aug 29, 2023

ernestum added 3 commits August 30, 2023 10:47

Ignore coverage in script configurations.

8690e1d

Pin huggingface_sb3 version.

dd9eb6a

Merge branch 'master' into benchmark-pr

b3930f4

# Conflicts: # setup.py

ernestum force-pushed the adversarial-mod-new branch from 5c23650 to b01d51b Compare September 26, 2023 09:22

Update to the newest seals environment versions.

40d87ef

ernestum force-pushed the adversarial-mod-new branch from 9fac235 to 0ac66d4 Compare September 27, 2023 06:11

ernestum and others added 6 commits September 27, 2023 09:49

Push gymnasium dependency to 0.29 to ensure mujoco envs work.

71f6c92

Update adversarial algorithm

53c1212

Fix test errors

47b3874

Fix test errors

9fa8969

Don't enter the generator logging ctx twice.

3edf518

Update common.py to fix test errors

ce8c87d

ernestum force-pushed the adversarial-mod-new branch from 0ac66d4 to ce8c87d Compare September 27, 2023 07:50
