This repo contains code and data for running HELPER.
(1) Start by cloning the repository:
git clone https://github.com/Gabesarch/HELPER.git
(1a) (optional) If you are using conda, create an environment:
conda create -n helper python=3.8
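Then activate the environment before running the install steps below:
conda activate helper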
(2) Install PyTorch with the CUDA version you have. For example, run the following for CUDA 11.1:
pip install torch==1.10.0+cu111 torchvision==0.11.0+cu111 torchaudio==0.10.0 -f https://download.pytorch.org/whl/torch_stable.html
(3) Install additional requirements:
pip install -r requirements.txt
(4) Install Detectron2 (needed for SOLQ detector) with correct PyTorch and CUDA version. E.g. for PyTorch 1.10 & CUDA 11.1:
python -m pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu111/torch1.10/index.html
(5) Install teach:
pip install -e teach
(6) Build SOLQ deformable attention:
cd ./SOLQ/models/ops && sh make.sh && cd ../../..
(7) Clone the ZoeDepth repo and check out the pinned commit:
git clone https://github.com/isl-org/ZoeDepth.git
cd ZoeDepth
git checkout edb6daf45458569e24f50250ef1ed08c015f17a7
cd ..
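(Optional) As a quick sanity check of the installation (not part of the original steps; it assumes the installs above completed without errors), verify that PyTorch sees your GPU and that the main dependencies import:
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
python -c "import detectron2, teach; print('detectron2 and teach imported OK')"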
- Download the TEACh dataset following the instructions in the TEACh repo
teach_download
To run our model on the TEACh dataset, you'll first need the GPT embeddings for example retrieval:
- Download the GPT embeddings for example retrieval: here. Unzip it to get the gpt_embeddings folder in the ./data folder (or in any folder you like, then set the --gpt_embedding_dir argument accordingly). Alternatively, you can download the file with gdown (pip install gdown):
cd data
gdown 1kqZZXdglNICjDlDKygd19JyyBzkkk-UL
unzip gpt_embeddings.zip
rm gpt_embeddings.zip
To run our model with estimated depth and segmentation, download the SOLQ and ZoeDepth checkpoints:
- Download the SOLQ checkpoint: here. Place it in the ./checkpoints folder (or anywhere you want, and specify the path with --solq_checkpoint). Alternatively, you can download the file with gdown (pip install gdown):
cd checkpoints
gdown 1hTCtTuygPCJnhAkGeVPzWGHiY3PHNE2j
- Download the ZoeDepth checkpoint: here. Place it in the ./checkpoints folder (or anywhere you want, and specify the path with --zoedepth_checkpoint). (Also make sure you have cloned the ZoeDepth repo: git clone https://github.com/isl-org/ZoeDepth.git) Alternatively, you can download the file with gdown (pip install gdown):
cd checkpoints
gdown 1gMe8_5PzaNKWLT5OP-9KKEYhbNxRjk9F
- (If required) Start an X server if one is not already running on your machine. First, open a screen on the desired node, then run the following to start an X server on that node:
python startx.py 0
Specify the server port number with the --server_port argument (default 0).
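For example, if display 0 is already in use, you might start the server on display 1 and pass the matching port (illustrative numbers):
python startx.py 1
Then add --server_port 1 to the main.py command below.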
- Set OpenAI keys. If using Azure, set the Azure keys:
export AZURE_OPENAI_KEY={KEY}
export AZURE_OPENAI_ENDPOINT={ENDPOINT}
If not using Azure: important! When using the OpenAI API directly, append --use_openai to the arguments, then set your OpenAI key:
export OPENAI_API_KEY={KEY}
- Run the agent. To run the agent with all modules and estimated perception on TfD validation unseen, run the following:
python main.py \
--mode teach_eval_tfd \
--split valid_unseen \
--gpt_embedding_dir ./data/gpt_embeddings \
--teach_data_dir PATH_TO_TEACH_DATASET \
--server_port X_SERVER_PORT_HERE \
--episode_in_try_except \
--use_llm_search \
--use_constraint_check \
--run_error_correction_llm \
--zoedepth_checkpoint ./checkpoints/ZOEDEPTH-model-00015000.pth \
--solq_checkpoint ./checkpoints/SOLQ-model-00023000.pth \
--set_name HELPER_teach_tfd_validunseen
Change the split to --split valid_seen to evaluate on the validation seen set.
All metrics will be saved to ./output/metrics/{set_name}. Metrics and videos will also automatically be logged to wandb.
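If Weights & Biases is not yet set up on your machine, you will likely need to authenticate once before logging works (standard wandb setup):
wandb login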
To create movies of the agent, append --create_movie to the arguments. By default this will create a movie for every episode, rendered to ./output/movies. To change the episode frequency of logging, alter --log_every (e.g., --log_every 10 to render videos every 10 episodes). To remove the map visualization, append --remove_map_vis to the arguments. This can speed up episodes, since rendering the map visual slows them down.
The following arguments can be added or removed to run the ablations (see the example after this list):
- Remove memory-augmented prompting: add --ablate_example_retrieval.
- Remove LLM search (locator; random search only): remove --use_llm_search.
- Remove constraint check (inspector): remove --use_constraint_check.
- Remove error correction (rectifier): remove --run_error_correction_llm.
- Change the OpenAI model type: change the --openai_model argument (e.g., --openai_model gpt-3.5-turbo).
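For example, here is a sketch of a single ablation run with example retrieval ablated and LLM search removed; all other arguments match the full run command above, and the set_name is just an illustrative label:
python main.py \
    --mode teach_eval_tfd \
    --split valid_unseen \
    --gpt_embedding_dir ./data/gpt_embeddings \
    --teach_data_dir PATH_TO_TEACH_DATASET \
    --server_port X_SERVER_PORT_HERE \
    --episode_in_try_except \
    --use_constraint_check \
    --run_error_correction_llm \
    --ablate_example_retrieval \
    --zoedepth_checkpoint ./checkpoints/ZOEDEPTH-model-00015000.pth \
    --solq_checkpoint ./checkpoints/SOLQ-model-00023000.pth \
    --set_name HELPER_teach_tfd_validunseen_ablation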
The following arguments can be added to run with ground truth (see the example after this list):
- GT depth: --use_gt_depth. Recommended to also add --increased_explore when using estimated segmentation, for best performance.
- GT segmentation: --use_gt_seg.
- GT action success: --use_gt_success_checker.
- GT error feedback: --use_GT_error_feedback.
- GT constraint check using controller metadata: --use_GT_constraint_checks.
- Increase max API fails: --max_api_fails {MAX_FAILS}.
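For example, a sketch of a run with ground-truth depth and segmentation added on top of the full run command above (the set_name is illustrative):
python main.py \
    --mode teach_eval_tfd \
    --split valid_unseen \
    --gpt_embedding_dir ./data/gpt_embeddings \
    --teach_data_dir PATH_TO_TEACH_DATASET \
    --server_port X_SERVER_PORT_HERE \
    --episode_in_try_except \
    --use_llm_search \
    --use_constraint_check \
    --run_error_correction_llm \
    --use_gt_depth \
    --use_gt_seg \
    --zoedepth_checkpoint ./checkpoints/ZOEDEPTH-model-00015000.pth \
    --solq_checkpoint ./checkpoints/SOLQ-model-00023000.pth \
    --set_name HELPER_teach_tfd_validunseen_gt_perception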
To run with user feedback, add --use_progress_check. Two additional metric files (for feedback queries 1 & 2) will be saved to ./output/metrics/{set_name}.
See the teach_edh branch for how to run the TEACh EDH evaluation.
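For example, assuming the branch exists on the same remote (standard git usage):
git fetch origin
git checkout teach_edh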
If you like this paper, please cite us:
@inproceedings{sarch2023helper,
    title = "Open-Ended Instructable Embodied Agents with Memory-Augmented Large Language Models",
    author = "Sarch, Gabriel and
      Wu, Yue and
      Tarr, Michael and
      Fragkiadaki, Katerina",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2023",
    month = dec,
    year = "2023",
    publisher = "Association for Computational Linguistics",
}