Official Code Repository for FLIP: Flow-Centric Generative Planning for General-Purpose Manipulation Tasks.
Chongkai Gao¹, Haozhuo Zhang², Zhixuan Xu¹, Zhehao Cai¹, Lin Shao¹
¹National University of Singapore, ²Peking University
In this paper, we present FLIP, a model-based planning algorithm in visual space that features three key modules: (1) a multi-modal flow generation model as the general-purpose action proposal module; (2) a flow-conditioned video generation model as the dynamics module; and (3) a vision-language representation learning model as the value module. Given an initial image and a language instruction as the goal, FLIP progressively searches for long-horizon flow and video plans that maximize the discounted return to accomplish the task. FLIP can synthesize long-horizon plans across objects, robots, and tasks with image flows as the general action representation, and the dense flow information also provides rich guidance for long-horizon video generation. In addition, the synthesized flow and video plans can guide the training of low-level control policies for robot execution.
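At a high level, the search alternates between proposing flow actions, rolling them out with the video dynamics model, and scoring the rollouts with the value model. The sketch below only illustrates this loop; the module interfaces (`propose_flows`, `rollout_video`, `score`) are hypothetical placeholders, not the actual classes in this repository.

```python
# Minimal sketch of FLIP-style model-based planning in visual space.
# All module interfaces here (propose_flows, rollout_video, score) are
# hypothetical placeholders, not the actual classes in this repo.

def plan(image, instruction, action_model, dynamics_model, value_model,
         horizon=10, num_candidates=8, gamma=0.99):
    """Greedily grow a flow/video plan that maximizes the discounted return."""
    plan_flows, plan_frames = [], [image]
    total_return = 0.0
    for t in range(horizon):
        best = None
        for _ in range(num_candidates):
            # 1) Action proposal: sample a candidate image flow.
            flow = action_model.propose_flows(plan_frames[-1], instruction)
            # 2) Dynamics: generate the next video segment conditioned on the flow.
            frames = dynamics_model.rollout_video(plan_frames[-1], flow)
            # 3) Value: score the segment with the vision-language value model.
            reward = value_model.score(frames, instruction)
            if best is None or reward > best[0]:
                best = (reward, flow, frames)
        reward, flow, frames = best
        plan_flows.append(flow)
        plan_frames.extend(frames)
        total_return += (gamma ** t) * reward
    return plan_flows, plan_frames, total_return
```

The actual planner in `scripts/hill_climbing.py` performs a hill-climbing search over candidate plans; the greedy loop above is only meant to convey the structure.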
conda create -n flip python==3.8
conda activate flip
pip install -r requirements.txt
cd flip/co_tracker
wget https://huggingface.co/facebook/cotracker/resolve/main/cotracker2.pth
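To verify the download, you can optionally inspect the checkpoint with PyTorch. This is only a sanity check (it assumes the file is a regular `torch.save` checkpoint) and is not how the repo's scripts load it.

```python
import torch

# Optional sanity check of the downloaded CoTracker2 weights
# (assumes a regular torch.save checkpoint; not used by the repo's scripts).
ckpt = torch.load("flip/co_tracker/cotracker2.pth", map_location="cpu")
state_dict = ckpt.get("model", ckpt) if isinstance(ckpt, dict) else ckpt
tensors = {k: v for k, v in state_dict.items() if isinstance(v, torch.Tensor)}
n_params = sum(v.numel() for v in tensors.values())
print(f"{len(tensors)} tensors, {n_params / 1e6:.1f}M parameters")
```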
- Get download access from https://huggingface.co/meta-llama/Llama-3.1-8B.
- Put the downloaded folder at `./`. You should have a file structure like this:
...
- liv
- llama_models
  - Meta-Llama-3.1-8B
    - consolidated.00.pth
    - params.json
    - tokenizer.model
- scripts
...
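A quick way to confirm the files landed where the tree above expects them (a plain file-existence check, nothing FLIP-specific):

```python
from pathlib import Path

# Check that the Meta-Llama-3.1-8B files sit where the file tree above expects them.
llama_dir = Path("llama_models/Meta-Llama-3.1-8B")
for name in ["consolidated.00.pth", "params.json", "tokenizer.model"]:
    f = llama_dir / name
    print(f, "->", f"{f.stat().st_size / 1e6:.1f} MB" if f.exists() else "MISSING")
```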
- Download the `model.pt` and `config.yaml` according to https://github.com/penn-pal-lab/LIV/blob/main/liv/__init__.py#L33.
- `mkdir liv/resnet50`.
- Put the `model.pt` and `config.yaml` under `liv/resnet50`. You should have a file structure like this:
...
- liv
  - cfgs
  - dataset
  - examples
  - models
  - resnet50
    - config.yaml
    - model.pt
  - utils
  - __init__.py
  - train_liv.py
  - trainer.py
- llama_models
...
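You can peek at the downloaded LIV config to make sure it parses. This assumes `config.yaml` is plain YAML (as in the upstream LIV repo) and is independent of how `liv/__init__.py` actually loads the checkpoint.

```python
from pathlib import Path
import yaml

# Peek at the LIV ResNet-50 files (assumes config.yaml is plain YAML).
resnet_dir = Path("liv/resnet50")
print("model.pt exists:", (resnet_dir / "model.pt").exists())
with open(resnet_dir / "config.yaml") as f:
    cfg = yaml.safe_load(f)
print("top-level config keys:", list(cfg.keys()))
```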
- `wget https://utexas.box.com/shared/static/cv73j8zschq8auh9npzt876fdc1akvmk.zip`
- `mkdir data/libero_10`
- Unzip and put the 10 LIBERO-10 HDF5 files into `data/libero_10`.
python scripts/replay_libero_data_from_hdf5.py
By default, the resolution is 128
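If you want to look at the raw demonstrations, a small h5py walk is enough. The `data/demo_*` layout below is the robomimic-style structure LIBERO uses, but treat it as an assumption and adjust to whatever keys you actually see.

```python
import glob
import h5py

# Inspect the first LIBERO-10 file found; the "data/demo_*" layout is an
# assumption (robomimic-style) -- print the keys to confirm for your files.
path = sorted(glob.glob("data/libero_10/*.hdf5"))[0]
with h5py.File(path, "r") as f:
    demos = sorted(f["data"].keys())
    print(path, "->", len(demos), "demos")
    f["data"][demos[0]].visit(print)  # list every group/dataset in the first demo
```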
`python scripts/video_tracking.py`
By default, we only track the agentview demos. You may change `eye_in_hand` to `true` in `config/libero_10/tracking.yaml` to track the eye-in-hand demos.
`python scripts/preprocess_data_to_hdf5.py`
By default, we only preprocess the agentview demos. You may change `eye_in_hand` to `true` in `config/libero_10/preprocess.yaml` to preprocess the eye-in-hand demos.
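If you prefer to flip these flags programmatically rather than editing the YAML files by hand, something like the snippet below works. It assumes `eye_in_hand` is a top-level key in both files, and note that re-dumping the YAML will drop any comments.

```python
import yaml

# Set eye_in_hand: true in both configs
# (assumes a top-level key; re-dumping the YAML drops comments).
for path in ["config/libero_10/tracking.yaml", "config/libero_10/preprocess.yaml"]:
    with open(path) as f:
        cfg = yaml.safe_load(f)
    cfg["eye_in_hand"] = True
    with open(path, "w") as f:
        yaml.safe_dump(cfg, f, sort_keys=False)
    print(path, "-> eye_in_hand:", cfg["eye_in_hand"])
```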
torchrun --nnodes=1 --nproc_per_node=2 scripts/train_cvae.py
You can change `config/libero_10/cvae.yaml` for custom training. The current config is for A100 40GB GPUs.
torchrun --nnodes=1 --nproc_per_node=2 scripts/train_dynamics.py
You can change `config/libero_10/dynamics.yaml` for custom training. The current config is for A100 40GB GPUs.
python scripts/finetune_liv.py
This script will first build a LIV dataset and then train on it.
You may change the configs in `config/libero_10/finetune_liv.yaml`, `liv/cfgs/dataset/libero_10.yaml`, and `liv/cfgs/training/finetune.yaml` according to your own tasks.
torchrun --nnodes=1 --nproc_per_node=8 scripts/finetune_vae.py
You can change `config/libero_10/finetune_vae.yaml` for custom training.
- `mkdir models/libero_10`
- Put all the trained models (`agentview_dynamics.pt`, `cvae.pt`, `finetuned_vae.pt`, `reward.pt`) under `models/libero_10`.
- `torchrun scripts/hill_climbing.py`
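Before launching the search, it is worth confirming that the four checkpoints load. This is only a convenience check and says nothing about how `scripts/hill_climbing.py` itself loads them.

```python
import torch

# Optional: verify the four checkpoints load before running the planner.
# Assumes they are plain torch.save checkpoints; if one was saved as a full
# pickled module you may need the repo on your PYTHONPATH (and, on newer
# PyTorch, weights_only=False) to unpickle it.
for name in ["agentview_dynamics.pt", "cvae.pt", "finetuned_vae.pt", "reward.pt"]:
    ckpt = torch.load(f"models/libero_10/{name}", map_location="cpu")
    size = len(ckpt) if isinstance(ckpt, dict) else "n/a"
    print(f"{name}: {type(ckpt).__name__}, {size} top-level entries")
```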
- Action Module: `python scripts/eval_cvae.py`.
- Dynamics Module: `scripts/train_dynamics.py`.
If you find our code or models useful in your work, please cite our paper:
TODO
If you have any questions, feel free to contact me through email ([email protected])!