Appendix

Curriculum


Reach:

  1. Minimize the distance between palm and object (without moving the latter) while encouraging maximum hand aperture.
  2. Minimize the distance between palm and object, with an additional bonus for contact between hand and object (a reward sketch follows this list).
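
A minimal sketch of such a shaped reach reward under stated assumptions: the quantity names (palm_obj_dist, hand_aperture, has_contact) and the weights are illustrative and do not come from the repository's actual implementation.

```python
# Illustrative sketch of the two reach sub-stages; names and weights are assumptions.

def reach_reward_stage1(palm_obj_dist, hand_aperture, w_dist=1.0, w_aperture=0.1):
    """Stage 1: pull the palm toward the object while keeping the hand open."""
    return -w_dist * palm_obj_dist + w_aperture * hand_aperture


def reach_reward_stage2(palm_obj_dist, has_contact, w_dist=1.0, contact_bonus=0.5):
    """Stage 2: same distance term, plus a bonus once hand and object touch."""
    return -w_dist * palm_obj_dist + (contact_bonus if has_contact else 0.0)
```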

Grasp & move:

  1. Minimize the distance between the object and a target position (x, y: initial object position; z: 40 cm) while encouraging contact between fingertips and object.
  2. Same as the previous step, but fixing the z target position relative to the object (x, y: initial object position; z: initial object z + 40 cm).
  3. Same as the previous step, but the training had to be restarted.
  4. We changed the target position from the initial object position to the final goal position, while keeping the z target 40 cm above the z goal. We also changed the hyperparameters of the target box (from phase 1 to phase 2) and modified the reward by giving more weight to palm distance than to fingertip distance and introducing action regularization (see the reward sketch after this list).
  5. Same as the previous step, but we fixed the key_frame id and trained for a longer time.
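
As a rough illustration of the reward used in these later grasp-and-move steps (object-to-target distance, weighted palm and fingertip distances, and action regularization), here is a minimal sketch; all names, weights, and shapes are assumptions about a plausible implementation, not the exact code used in this repository.

```python
import numpy as np

# Illustrative sketch only; names and weights are assumptions.
def grasp_move_reward(obj_pos, target_pos, palm_pos, fingertip_pos, action,
                      w_obj=1.0, w_palm=0.5, w_finger=0.1, w_act=1e-3):
    # Distance between the object and the (lifted) target position.
    obj_to_target = np.linalg.norm(obj_pos - target_pos)
    # Palm distance is weighted more heavily than the mean fingertip distance.
    palm_to_obj = np.linalg.norm(palm_pos - obj_pos)
    finger_to_obj = np.mean(np.linalg.norm(fingertip_pos - obj_pos, axis=-1))
    # Action regularization discourages large, jerky commands.
    act_penalty = np.sum(np.square(action))
    return -(w_obj * obj_to_target + w_palm * palm_to_obj
             + w_finger * finger_to_obj + w_act * act_penalty)


# Step 6's target: the final goal position lifted 40 cm along z (hypothetical goal).
goal_pos = np.array([0.3, 0.1, 0.05])
target_pos = goal_pos + np.array([0.0, 0.0, 0.40])
```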

Insert:

  1. We included the solved component in the reward.
  2. Same as the previous step, but the training had to be restarted.
  3. Enlarged the object hyperparameter space to obtain a more robust policy (see the sketch after this list).
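
To make the "solved" reward component and the enlarged object randomization concrete, here is a minimal sketch; the bonus value, the randomization range, and the function names are illustrative assumptions, not the repository's actual settings.

```python
import numpy as np

# Illustrative sketch only; values and names are assumptions.
SOLVED_BONUS = 10.0

def insert_reward(shaped_reward, solved):
    # Add a sparse "solved" component on top of the existing shaped reward.
    return shaped_reward + (SOLVED_BONUS if solved else 0.0)

def sample_object_scale(rng: np.random.Generator, low=0.9, high=1.1):
    # Enlarging [low, high] widens the object hyperparameter space,
    # which is what makes the final policy more robust.
    return rng.uniform(low, high)
```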

All the trained models, environment configurations, main files, and TensorBoard logs are in the output/trained_agents folder.

Architecture and algorithm

We use RecurrentPPO from Stable Baselines3 (shipped in the sb3-contrib package) as our base algorithm, with the following architecture for both the actor and the critic, nothing being shared between the two:

obs --> 256 LSTM --> 256 Linear --> 256 Linear --> output

All layers use ReLU activation functions; the output is the value estimate for the critic and the 63-dimensional continuous action for the actor.
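
A minimal configuration sketch that reproduces this network shape with sb3-contrib's RecurrentPPO; the environment, training budget, and any hyperparameters not stated above are placeholders (the real ones live in the environment configurations mentioned earlier).

```python
import gymnasium as gym
import torch
from sb3_contrib import RecurrentPPO

# Placeholder continuous-control environment; the actual manipulation
# environment (63-dimensional action space) comes from this repository.
env = gym.make("Pendulum-v1")

policy_kwargs = dict(
    lstm_hidden_size=256,                         # obs --> 256 LSTM
    n_lstm_layers=1,
    shared_lstm=False,                            # nothing shared between actor and critic
    enable_critic_lstm=True,                      # the critic gets its own LSTM
    net_arch=dict(pi=[256, 256], vf=[256, 256]),  # --> 256 Linear --> 256 Linear
    activation_fn=torch.nn.ReLU,
)

model = RecurrentPPO("MlpLstmPolicy", env, policy_kwargs=policy_kwargs, verbose=1)
model.learn(total_timesteps=100_000)  # illustrative training budget
```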