Guanxing Lu*, Tengbo Yu*, Haoyuan Deng, Season Si Chen, Ziwei Wang, Yansong Tang†
[Project Page] | [Paper]
Performing general language-conditioned bimanual manipulation tasks is of great importance for applications ranging from household service to industrial assembly. However, collecting bimanual manipulation data is expensive due to the high-dimensional action space, which makes it difficult for conventional methods to handle general bimanual manipulation tasks. In contrast, unimanual policies have recently demonstrated impressive generalizability across a wide range of tasks thanks to scaled model parameters and training data, and can provide sharable manipulation knowledge for bimanual systems. To this end, we propose a plug-and-play method named AnyBimanual, which transfers a pretrained unimanual policy to a general bimanual manipulation policy with few bimanual demonstrations. Specifically, we first introduce a skill manager that dynamically schedules the skill representations discovered from the pretrained unimanual policy for bimanual manipulation tasks, linearly combining skill primitives with task-oriented compensation to represent the bimanual manipulation instruction. To mitigate the observation discrepancy between unimanual and bimanual systems, we present a visual aligner that generates soft masks for the visual embedding of the workspace, aligning the visual input of each arm's unimanual policy model with the inputs seen during the pretraining stage. AnyBimanual shows superiority on 12 simulated tasks from RLBench2 with a sizable 12.67% improvement in success rate over previous methods. Experiments on 9 real-world tasks further verify its practicality with an average success rate of 84.62%.
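For intuition, the two components above can be viewed as small modules placed on top of a frozen unimanual policy. The snippet below is a minimal sketch, not the repository's actual implementation; the module names, tensor shapes, and softmax-based scheduling are our own assumptions.

```python
# Minimal sketch of the two core ideas from the abstract (names and shapes are
# assumptions, not the repository's actual modules).
import torch
import torch.nn as nn


class SkillManager(nn.Module):
    """Linearly combines a bank of skill primitives with a task-oriented
    compensation term to produce a per-arm instruction embedding."""

    def __init__(self, num_skills: int, dim: int):
        super().__init__()
        self.skill_bank = nn.Parameter(torch.randn(num_skills, dim))  # skill primitives
        self.weight_head = nn.Linear(dim, num_skills)                 # schedules the skills
        self.compensation_head = nn.Linear(dim, dim)                  # task-oriented residual

    def forward(self, lang_embed: torch.Tensor) -> torch.Tensor:
        weights = torch.softmax(self.weight_head(lang_embed), dim=-1)  # (B, num_skills)
        combined = weights @ self.skill_bank                           # (B, dim)
        return combined + self.compensation_head(lang_embed)


class VisualAligner(nn.Module):
    """Predicts a soft mask over the visual embedding so each arm's unimanual
    policy sees an input closer to its single-arm pretraining distribution."""

    def __init__(self, dim: int):
        super().__init__()
        self.mask_head = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())

    def forward(self, visual_embed: torch.Tensor) -> torch.Tensor:
        return visual_embed * self.mask_head(visual_embed)  # soft-masked embedding
```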
- Release pretrained checkpoints.
NOTE: AnyBimanual is mainly built upon the Perceiver-Actor^2 repo by Markus Grotz et al.
See INSTALL.md for installation instructions.
See ERROR_CATCH.md for troubleshooting common errors.
The following steps should be run in order.
Please check out the website for pre-generated RLBench demonstrations. If you use these datasets directly, you don't need to run tools/bimanual_data_generator.py from RLBench. Using the pre-generated datasets also helps reproducibility, since each scene is randomly sampled in data_generator_bimanual.py.
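If you download the pre-generated demonstrations, a quick sanity check such as the one below can confirm that they were extracted correctly. The directory layout assumed here (task folders containing all_variations/episodes/episode<N>) is a guess; adjust the path and glob pattern to match the downloaded archive.

```python
# Hypothetical sanity check for downloaded demonstrations; the assumed layout is
# <data_root>/<task>/all_variations/episodes/episode<N>. Adjust to the real layout.
from pathlib import Path

data_root = Path("data/train")  # assumed location of the extracted demonstrations

for task_dir in sorted(data_root.iterdir()):
    if not task_dir.is_dir():
        continue
    episodes = list(task_dir.glob("all_variations/episodes/episode*"))
    print(f"{task_dir.name}: {len(episodes)} episodes")
```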
We use wandb to log curves and visualizations. Log in to wandb before running the scripts:
wandb login
To train our PerAct + AnyBimanual, run:
bash scripts/train.sh BIMANUAL_PERACT 0,1 12345 ${exp_name}
where exp_name can be specified as you like.
To train our PerAct-LF + AnyBimanual, run:
bash scripts/train.sh PERACT_BC 0,1 12345 ${exp_name}
To train our RVT-LF + AnyBimanual, run:
bash scripts/train.sh RVT 0,1 12345 ${exp_name}
Set augmentation_type in scripts/train.sh to choose between the augmentation methods described in our paper and the original SE(3) augmentation.
To evaluate a checkpoint in the simulator, run:
bash scripts/eval.sh BIMANUAL_PERACT 0 ${exp_name}
Collect demonstrations by teleoperation.
Convert the data into RLBench2 format:
python3 anybimanual_real_supply/data/preprocess_ntu_dualarm.py
Select keyframes:
python3 anybimanual_real_supply/data/auto_keyframe_mani.py
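auto_keyframe_mani.py performs the keyframe selection. As a reference point, a PerAct-style heuristic (not necessarily the exact logic in the script) marks a frame as a keyframe when the gripper state flips or the arm comes to rest; a minimal single-arm sketch with assumed field names is shown below, and for bimanual recordings it would be applied once per arm.

```python
# Minimal sketch of a PerAct-style keyframe heuristic (not the repository's
# exact implementation): a frame is a keyframe when the gripper state flips or
# the arm comes to rest (near-zero joint velocities).
from typing import List
import numpy as np


def select_keyframes(gripper_open: np.ndarray,      # (T,) bool, gripper state per frame
                     joint_velocities: np.ndarray,  # (T, DoF) joint velocities per frame
                     stop_thresh: float = 1e-3) -> List[int]:
    keyframes = []
    for t in range(1, len(gripper_open)):
        gripper_changed = gripper_open[t] != gripper_open[t - 1]
        arm_stopped = np.allclose(joint_velocities[t], 0.0, atol=stop_thresh)
        if gripper_changed or arm_stopped:
            keyframes.append(t)
    return keyframes
```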
Train on the converted real-world data:
bash scripts/train_real.sh BIMANUAL_PERACT 0,1 12345 ${exp_name}
Run the model inference script to receive real-world observations and generate actions. We provide an example of the Agent class in:
python3 anybimanual_real_supply/eval_agent_on_robot.py
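The Agent class wraps the trained policy behind a simple observe-then-act interface. The sketch below only illustrates the expected shape; the class name, method signature, and observation/action keys are assumptions rather than the script's actual API.

```python
# Hypothetical shape of an Agent used for real-robot inference; names and keys
# are assumptions, not the script's actual API.
from typing import Dict
import numpy as np


class Agent:
    def __init__(self, checkpoint_path: str):
        self.checkpoint_path = checkpoint_path
        # Load the trained policy weights here.

    def act(self, observation: Dict[str, np.ndarray]) -> Dict[str, np.ndarray]:
        """Map a real-world observation (RGB-D images, proprioception, language
        goal) to a bimanual action (one end-effector pose and gripper command
        per arm)."""
        # In the real script this runs the trained policy's forward pass; here a
        # placeholder action illustrates the expected output structure.
        return {
            "left_pose": np.zeros(7),       # xyz + quaternion
            "left_gripper": np.array([1.0]),
            "right_pose": np.zeros(7),
            "right_gripper": np.array([1.0]),
        }
```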
After receiving the action generated by the model, refer to Bimanual_ur5e_action_control_for_IL to drive the dual UR5e arms to execute the action.
This repository is released under the MIT license.
Our code is built upon Perceiver-Actor^2, SkillDiffuser, PerAct, RLBench, and CLIP. We thank all these authors for their nicely open-sourced code and their great contributions to the community.
If you find this repository helpful, please consider citing:
@article{lu2024anybimanual,
  author = {Lu, Guanxing and Yu, Tengbo and Deng, Haoyuan and Chen, Season Si and Wang, Ziwei and Tang, Yansong},
  title  = {AnyBimanual: Transferring Single-arm Policy for General Bimanual Manipulation},
  year   = {2024},
}