
Commit

Updated README. Finished testing with a clone
vluzko committed Dec 10, 2018
1 parent a4e077f commit 1d16083
Showing 3 changed files with 42 additions and 36 deletions.
35 changes: 21 additions & 14 deletions dac/README.md
# Discriminator-Actor-Critic
Note that all commands must be run from within the `dac` folder.
## Setup
1. Set up a Python 2 environment. (Commands for virtualenv are given below.)

```
pip install virtualenv
virtualenv . -p $PYTHON2_PATH
source bin/activate
```
Note: You can use the command `which python2` to find the PYTHON2_PATH.
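Since DAC targets Python 2, it can help to confirm the interpreter before installing anything. A minimal sketch (the helper `check_python2` is ours, not part of the repo):

```python
import sys

def check_python2(version_info=sys.version_info):
    """Return True when the interpreter is Python 2, as DAC expects."""
    return version_info[0] == 2

print(check_python2())
```

Run this inside the activated virtualenv; it should print `True` there.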

2. Installation (NOTE: requires MuJoCo 1.31. Download the zip from https://www.roboti.us/ and put the unzipped mjpro131 folder in the ~/.mujoco folder used for the MuJoCo license for gym v1):

```
pip install -r requirements.txt
cd dac
pip install -e .
```
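Step 2 assumes MuJoCo 1.31 is unpacked at `~/.mujoco/mjpro131`. A quick check before running `pip install` could look like this (the helper is illustrative, not part of the repo):

```python
import os

def mujoco_131_installed(home=None):
    """Check for the mjpro131 folder where gym v1 looks for MuJoCo."""
    home = home or os.path.expanduser('~')
    return os.path.isdir(os.path.join(home, '.mujoco', 'mjpro131'))

print(mujoco_131_installed())
```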

## Generate expert trajectories
DAC requires the expert trajectories to already exist. We use the OpenAI imitation repo (the original GAIL implementation) to generate trajectories for Hopper, Walker2d, Ant, and HalfCheetah.

```
git clone https://github.com/openai/imitation
cd imitation
python scripts/im_pipeline.py pipelines/im_pipeline.yaml 0_sampletrajs
cp imitation_runs/modern_stochastic/trajs/* ../trajs
```
## Run DAC

```
python dac.py --env_id=$environment_name --expert_path=$path/to/expert/traj
```

Example (Ant-v1):

```
python dac.py --env_id=Ant-v1 --expert_path=trajs/trajs_ant.h5
```

The environment name can be `Hopper-v1`, `HalfCheetah-v1`, `Ant-v1`, or `Walker2d-v1`.

The path to the expert trajectory should be the path to the corresponding file in the `trajs` folder.
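To launch all four runs from one place, a small driver can be sketched. Only `trajs/trajs_ant.h5` is confirmed above; the other file names are assumptions based on that naming pattern:

```python
# Hypothetical driver: build the dac.py command line for each environment.
# Only trajs_ant.h5 is confirmed by the README; the other names are guesses.
TRAJS = {
    'Hopper-v1': 'trajs/trajs_hopper.h5',
    'HalfCheetah-v1': 'trajs/trajs_halfcheetah.h5',
    'Ant-v1': 'trajs/trajs_ant.h5',
    'Walker2d-v1': 'trajs/trajs_walker.h5',
}

def dac_command(env_id):
    """Return the shell command that runs DAC for one environment."""
    return 'python dac.py --env_id={} --expert_path={}'.format(
        env_id, TRAJS[env_id])

for env_id in TRAJS:
    print(dac_command(env_id))
```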

43 changes: 21 additions & 22 deletions plots/generate_plots.py
import fire
import pandas as pd
from pathlib import Path
from matplotlib import pyplot as plt


plots_dir = Path(__file__).parent
environments = ('Hopper-v1', 'HalfCheetah-v1', 'Ant-v1', 'Walker2d-v1')
aggs = ('mean', 'sum')


def figure_3():
    expert = pd.read_csv(str(plots_dir / 'expert.csv'), index_col=0)

    for env in environments:
        random = pd.read_csv(str(plots_dir / 'Random_{}.csv'.format(env)), index_col=0)

        expert_score = expert.loc[env][0]

        fig, ax = plt.subplots()
        ax.set_title('{} Results'.format(env))
        ax.set_ylabel('Normalized Learner Score')
        ax.set_xlabel('Training Step')
        plt.ylim(-0.2, 1.3)

        for agg in aggs:
            results = pd.read_csv(str(plots_dir / 'DAC_{}_{}.csv'.format(env, agg)), index_col=0)
            timesteps = results['timestep']
            avg_reward = results.iloc[:, :-1].mean(axis=1)

            average_random_reward = random.iloc[:, :-1].mean(axis=1)
            average_random_reward.index = random['timestep']

            translated_rewards = avg_reward.values - average_random_reward.loc[timesteps].values
            normalized_rewards = translated_rewards / expert_score
            ax.plot(timesteps, normalized_rewards, label='Batch {}'.format(agg.capitalize()))

        ax.legend()
        plt.savefig(str(plots_dir / '{}_results.png'.format(env)))


if __name__ == '__main__':
    fire.Fire()
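The normalization inside `figure_3` subtracts the random-policy baseline at matching timesteps, then divides by the expert score. On made-up toy numbers (two seed columns, two timesteps; none of these values come from the repo):

```python
import pandas as pd

# Made-up toy data: two seed columns plus the trailing 'timestep' column,
# mirroring the CSV layout figure_3 expects.
results = pd.DataFrame({'seed0': [10.0, 60.0], 'seed1': [20.0, 80.0],
                        'timestep': [0, 1000]})
random_df = pd.DataFrame({'seed0': [10.0, 10.0], 'seed1': [10.0, 10.0],
                          'timestep': [0, 1000]})
expert_score = 100.0

timesteps = results['timestep']
avg_reward = results.iloc[:, :-1].mean(axis=1)   # mean over seeds
avg_random = random_df.iloc[:, :-1].mean(axis=1)
avg_random.index = random_df['timestep']

translated = avg_reward.values - avg_random.loc[timesteps].values
normalized = translated / expert_score
print(normalized.tolist())  # [0.05, 0.6]
```

`iloc[:, :-1]` drops the trailing `timestep` column before averaging, exactly as in the plotting script above.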
Binary file added plots/original_results.png
