This tutorial is aimed at teaching you how to define tasks in the environment or how to extend the environment itself. We usually inherit from FurnitureEnv, FurnitureBaxterEnv, FurnitureSawyerEnv, or FurnitureCursorEnv and override certain functions to define the new task or environment.
## __init__
In the constructor, we usually define task / environment specific parameters, such as reward function parameters, in the _env_config dictionary to avoid polluting the global configuration dictionary. We recommend either changing the env parameters directly in the constructor or loading the env config from a file.
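As a rough sketch of this pattern, a new task class might look like the following. The class name, import path, and the _env_config keys are all made up for illustration; only the overall structure mirrors the description above.

```python
from env.furniture_baxter import FurnitureBaxterEnv  # import path is an assumption


class FurnitureBaxterReachEnv(FurnitureBaxterEnv):
    """Hypothetical task: move the gripper toward a single part."""

    def __init__(self, config):
        super().__init__(config)

        # Keep task-specific knobs in _env_config so the global config
        # stays clean; these keys are invented for this example.
        self._env_config.update({
            "dist_coef": 1.0,      # weight on a distance-based reward term
            "success_rew": 10.0,   # bonus when the task is solved
        })
```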
## _step
The _step function takes in an action and outputs 4 items:
- next state
- reward for taking the current action
- episode termination status
- environment information
If you look at the step function in FurnitureEnv, it will first calculate the change in state, then compute the reward, then log information, and finally return the 4 items.
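An overridden _step in a subclass can follow the same pattern. This is only a schematic: the exact signatures of the parent _step and of _compute_reward may differ from what is assumed here.

```python
from env.furniture_baxter import FurnitureBaxterEnv  # import path is an assumption


class FurnitureBaxterReachEnv(FurnitureBaxterEnv):  # hypothetical class from above
    def _step(self, action):
        # 1. Let the parent environment advance the simulation with the action.
        ob, _, done, _ = super()._step(action)

        # 2. Compute the task-specific reward for this transition
        #    (assumed to return reward, success flag, and an info dict).
        reward, success, info = self._compute_reward(action)

        # 3. Terminate on success in addition to the parent's criteria.
        done = done or success

        # 4. Return the 4 items listed above.
        return ob, reward, done, info
```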
## _reset
This function resets the robot and furniture to a starting configuration. Usually you will override the _place_objects function to define how the furniture parts are initialized.
## _get_obs
This function returns the observations seen by the agent.
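If your task needs extra observations, you can extend this function. The sketch below assumes the observation is a dict-like object and that the observation getter is named _get_obs; the added key and attribute are invented for the example.

```python
from env.furniture_baxter import FurnitureBaxterEnv  # import path is an assumption


class FurnitureBaxterReachEnv(FurnitureBaxterEnv):  # hypothetical class from above
    def _get_obs(self):
        # Start from the default observation (robot state, part poses, ...).
        ob = super()._get_obs()

        # Add a task-specific entry; the key and attribute are illustrative.
        ob["target_pos"] = self._target_pos.copy()
        return ob
```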
## _place_objects
By default, this function attempts to initialize the furniture pieces in random positions and orientations without collision. You should override it for your own task if you want to control the furniture initialization.
## _compute_reward
This function is called by the _step function to compute the reward at the current state and action. By default it is a sparse reward that depends on the number of connected parts.
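A sparse, connection-based reward of this kind could be overridden roughly as below. The attributes used for tracking connections are assumptions; the real environment keeps its own bookkeeping.

```python
from env.furniture_baxter import FurnitureBaxterEnv  # import path is an assumption


class FurnitureBaxterReachEnv(FurnitureBaxterEnv):  # hypothetical class from above
    def _compute_reward(self, action):
        # Sparse reward: +1 for every part pair newly connected this step.
        # _num_connected / _prev_num_connected are illustrative attributes.
        reward = float(self._num_connected - self._prev_num_connected)
        self._prev_num_connected = self._num_connected

        # The episode succeeds once every part is connected.
        success = self._num_connected == self._total_num_connections
        info = {"num_connected": self._num_connected}
        return reward, success, info
```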
# Case Study: furniture_baxter_block.py
We will look at furniture/env/furniture_baxter_block.py as a case study. In this file, we want to teach the Baxter agent how to pick up a block and move it towards a target. We extend FurnitureBaxterEnv to add block picking logic.
## __init__
Here, we define all of the dense reward parameters in the _env_config dictionary.
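In spirit, that constructor looks like the sketch below; the coefficient names and values are placeholders rather than the ones actually used in furniture_baxter_block.py.

```python
from env.furniture_baxter import FurnitureBaxterEnv  # import path is an assumption


class FurnitureBaxterBlockEnv(FurnitureBaxterEnv):  # the class name in the file may differ
    def __init__(self, config):
        super().__init__(config)

        # Dense reward coefficients for each phase of the pick-and-move task.
        # Key names and values are placeholders, not the repository's.
        self._env_config.update({
            "above_block_rew": 0.5,  # hover the gripper above the block
            "lower_rew": 0.5,        # descend toward / around the block
            "hold_rew": 1.0,         # close the fingers on the block
            "pick_rew": 5.0,         # lift the block off the table
            "move_rew": 10.0,        # carry the block toward the target
            "success_rew": 100.0,
        })
```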
## _step
The _step function is quite standard: it computes the reward for the given action by calling _compute_reward, and logs info. We zero out the left arm to make the task easier.
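The left-arm masking might look roughly like this; the slice used for the left arm is a placeholder, since the real action layout depends on the robot and control mode.

```python
import numpy as np

from env.furniture_baxter import FurnitureBaxterEnv  # import path is an assumption

LEFT_ARM = slice(7, 14)  # placeholder: where the left-arm commands sit in the action


class FurnitureBaxterBlockEnv(FurnitureBaxterEnv):  # class name may differ
    def _step(self, a):
        # Freeze the left arm so only the right arm (and its gripper) moves.
        a = np.array(a, dtype=np.float64)
        a[LEFT_ARM] = 0.0
        return super()._step(a)
```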
## _reset
This function resets the robot and furniture to a starting configuration. It calls the _place_objects function inside the super()._reset call, which we override.
## _place_objects
This function overrides the FurnitureEnv implementation by fixing the initial poses of the furniture parts. The original logic attempts to find a random configuration of poses for the parts, which can make RL very slow to learn.
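A fixed-pose override could look like this. The return convention (a list of positions and a list of quaternions) and the exact values are assumptions; check FurnitureEnv._place_objects for the real format.

```python
from env.furniture_baxter import FurnitureBaxterEnv  # import path is an assumption


class FurnitureBaxterBlockEnv(FurnitureBaxterEnv):  # class name may differ
    def _place_objects(self):
        # Return fixed initial poses instead of sampling random,
        # collision-free ones, so every episode starts identically.
        pos_init = [[-0.3, 0.0, 0.05]]        # one block on the table (x, y, z)
        quat_init = [[1.0, 0.0, 0.0, 0.0]]    # identity orientation (w, x, y, z)
        return pos_init, quat_init
```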
## _compute_reward
The dense reward for picking up is structured in the following phases (a simplified sketch of such a phased reward follows the list):
- Put the arm above the block
- Lower the arm slightly
- Lower the arm so that the block is between the fingers
- Lower the arm more
- Hold the block
- Pick up the block
- Move the block towards the target position
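To make the structure concrete, here is a toy, self-contained version of such a phased reward. It collapses some of the phases above and uses invented names and thresholds; the actual reward in furniture_baxter_block.py is more detailed.

```python
import numpy as np


def phased_pick_reward(phase, hand_pos, block_pos, target_pos, grasped, cfg):
    """Toy phased reward; names, thresholds, and phases are illustrative only."""
    xy_dist = np.linalg.norm(hand_pos[:2] - block_pos[:2])
    z_gap = hand_pos[2] - block_pos[2]

    if phase == "above_block":
        # Reward hovering directly above the block.
        rew = -cfg["dist_coef"] * xy_dist
        if xy_dist < 0.02:
            phase, rew = "lower_arm", rew + cfg["phase_bonus"]
    elif phase == "lower_arm":
        # Reward descending until the block sits between the fingers.
        rew = -cfg["dist_coef"] * abs(z_gap)
        if abs(z_gap) < 0.01:
            phase, rew = "hold", rew + cfg["phase_bonus"]
    elif phase == "hold":
        # Reward closing the gripper on the block.
        rew = cfg["hold_rew"] if grasped else 0.0
        if grasped:
            phase = "move"
    else:  # "move": pick the block up and carry it toward the target
        rew = -cfg["dist_coef"] * float(np.linalg.norm(block_pos - target_pos))

    return phase, rew
```

In the environment itself, the current phase would typically be stored on the instance, advanced inside _compute_reward, and reset back to the first phase in _reset.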