
Commit

Updated README. Finished testing with a clone
vluzko committed Dec 10, 2018
1 parent a4e077f commit 1d16083
Showing 3 changed files with 42 additions and 36 deletions.
35 changes: 21 additions & 14 deletions dac/README.md
# Discriminator-Actor-Critic
Note that all commands must be run from within the `dac` folder.
## Setup
1. Set up a Python 2 environment. (Commands for virtualenv are given below.)

```
pip install virtualenv
virtualenv . -p $PYTHON2_PATH
source bin/activate
```
Note: You can use the command `which python2` to find the PYTHON2_PATH.
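Since DAC targets Python 2, it can help to confirm the interpreter before installing anything. A minimal sketch (the helper `check_python2` is ours, not part of the repo):

```python
import sys

def check_python2(version_info=sys.version_info):
    """Return True when the interpreter is Python 2, as DAC expects."""
    return version_info[0] == 2

print(check_python2())
```

Run this inside the activated virtualenv; it should print `True` there.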

2. Installation (NOTE: requires MuJoCo 1.31. Download the zip from https://www.roboti.us/ and put the unzipped mjpro131 folder in the ~/.mujoco folder used for the MuJoCo license for gym v1):

```
pip install -r requirements.txt
cd dac
pip install -e .
```
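Step 2 assumes MuJoCo 1.31 is unpacked at `~/.mujoco/mjpro131`. A quick check before running `pip install` could look like this (the helper is illustrative, not part of the repo):

```python
import os

def mujoco_131_installed(home=None):
    """Check for the mjpro131 folder where gym v1 looks for MuJoCo."""
    home = home or os.path.expanduser('~')
    return os.path.isdir(os.path.join(home, '.mujoco', 'mjpro131'))

print(mujoco_131_installed())
```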

## Generate expert trajectories
DAC requires the expert trajectories to already exist. We use the OpenAI imitation repo (the original GAIL implementation) to generate trajectories for Hopper, Walker2d, Ant, and HalfCheetah.

```
git clone https://github.com/openai/imitation
cd imitation
python scripts/im_pipeline.py pipelines/im_pipeline.yaml 0_sampletrajs
cp imitation_runs/modern_stochastic/trajs/* ../trajs
```
## Run DAC

```
python dac.py --env_id=$environment_name --expert_path=$path/to/expert/traj
```

Example (Ant-v1):

```
python dac.py --env_id=Ant-v1 --expert_path=trajs/trajs_ant.h5
```

The environment name can be `Hopper-v1`, `HalfCheetah-v1`, `Ant-v1`, or `Walker2d-v1`.

The path to the expert trajectory should be the path to the corresponding file in the `trajs` folder.
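To launch all four runs from one place, a small driver can be sketched. Only `trajs/trajs_ant.h5` is confirmed above; the other file names are assumptions based on that naming pattern:

```python
# Hypothetical driver: build the dac.py command line for each environment.
# Only trajs_ant.h5 is confirmed by the README; the other names are guesses.
TRAJS = {
    'Hopper-v1': 'trajs/trajs_hopper.h5',
    'HalfCheetah-v1': 'trajs/trajs_halfcheetah.h5',
    'Ant-v1': 'trajs/trajs_ant.h5',
    'Walker2d-v1': 'trajs/trajs_walker.h5',
}

def dac_command(env_id):
    """Return the shell command that runs DAC for one environment."""
    return 'python dac.py --env_id={} --expert_path={}'.format(
        env_id, TRAJS[env_id])

for env_id in TRAJS:
    print(dac_command(env_id))
```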

43 changes: 21 additions & 22 deletions plots/generate_plots.py
import fire
import pandas as pd
from pathlib import Path
from matplotlib import pyplot as plt


plots_dir = Path(__file__).parent
environments = ('Hopper-v1', 'HalfCheetah-v1', 'Ant-v1', 'Walker2d-v1')
aggs = ('mean', 'sum')


def figure_3():
    expert = pd.read_csv(str(plots_dir / 'expert.csv'), index_col=0)

    for env in environments:
        random = pd.read_csv(str(plots_dir / 'Random_{}.csv'.format(env)), index_col=0)

        expert_score = expert.loc[env][0]

        fig, ax = plt.subplots()
        ax.set_title('{} Results'.format(env))
        ax.set_ylabel('Normalized Learner Score')
        ax.set_xlabel('Training Step')
        plt.ylim(-0.2, 1.3)

        for agg in aggs:
            results = pd.read_csv(str(plots_dir / 'DAC_{}_{}.csv'.format(env, agg)), index_col=0)
            timesteps = results['timestep']
            avg_reward = results.iloc[:, :-1].mean(axis=1)

            average_random_reward = random.iloc[:, :-1].mean(axis=1)
            average_random_reward.index = random['timestep']

            translated_rewards = avg_reward.values - average_random_reward.loc[timesteps].values
            normalized_rewards = translated_rewards / expert_score
            ax.plot(timesteps, normalized_rewards, label='Batch {}'.format(agg.capitalize()))

        ax.legend()
        plt.savefig(str(plots_dir / '{}_results.png'.format(env)))


if __name__ == '__main__':
    fire.Fire()
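The normalization inside `figure_3` subtracts the random-policy baseline at matching timesteps, then divides by the expert score. On made-up toy numbers (two seed columns, two timesteps; none of these values come from the repo):

```python
import pandas as pd

# Made-up toy data: two seed columns plus the trailing 'timestep' column,
# mirroring the CSV layout figure_3 expects.
results = pd.DataFrame({'seed0': [10.0, 60.0], 'seed1': [20.0, 80.0],
                        'timestep': [0, 1000]})
random_df = pd.DataFrame({'seed0': [10.0, 10.0], 'seed1': [10.0, 10.0],
                          'timestep': [0, 1000]})
expert_score = 100.0

timesteps = results['timestep']
avg_reward = results.iloc[:, :-1].mean(axis=1)   # mean over seeds
avg_random = random_df.iloc[:, :-1].mean(axis=1)
avg_random.index = random_df['timestep']

translated = avg_reward.values - avg_random.loc[timesteps].values
normalized = translated / expert_score
print(normalized.tolist())  # [0.05, 0.6]
```

`iloc[:, :-1]` drops the trailing `timestep` column before averaging, exactly as in the plotting script above.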
Binary file added plots/original_results.png
