🔥 Jul 10, 2024: StateFlow is accepted at COLM 2024! Our paper can be found here: https://arxiv.org/abs/2403.11322.
🔥 Feb 29, 2024: StateFlow is implemented and integrated into AutoGen; check out the BlogPost and the Notebook!
- InterCode: InterCode is an interactive coding environment designed to evaluate language agents that can code. We evaluate StateFlow on two of its datasets (see the interaction-loop sketch after this list):
  - (1) SQL: InterCode-SQL adapts the Spider dataset to MySQL and contains 1034 task instances. For each task, a MySQL interpreter with all relevant tables is set up inside a Docker container.
  - (2) Bash: InterCode-Bash contains 200 task instances curated from the NL2Bash dataset.
- ALFWorld: ALFWorld contains interactive TextWorld environments that parallel embodied worlds in the ALFRED dataset. The aligned environments allow agents to reason and learn high-level policies in an abstract space before solving embodied tasks through low-level actuation.
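Both benchmarks are interactive: at each turn the agent issues an action (a SQL query, a Bash command, or a text command) and receives an execution result or textual observation back. The sketch below illustrates only that loop; `ToyEnv`, `StepResult`, and `run_episode` are made-up names for illustration, not the actual InterCode or ALFWorld APIs (see their repositories for the real interfaces).

```python
# Toy sketch of the observe-act loop that interactive benchmarks such as
# InterCode and ALFWorld expose. All names here are placeholders.
from dataclasses import dataclass

@dataclass
class StepResult:
    observation: str  # e.g., SQL/Bash output or TextWorld feedback
    reward: float
    done: bool

class ToyEnv:
    """Stands in for an InterCode- or ALFWorld-style environment."""
    def __init__(self, task: str, gold_action: str):
        self.task, self.gold_action = task, gold_action

    def reset(self) -> str:
        return self.task  # initial observation: the task description

    def step(self, action: str) -> StepResult:
        if action.strip() == self.gold_action:
            return StepResult("task completed", reward=1.0, done=True)
        return StepResult(f"executed {action!r}, task not solved yet", 0.0, False)

def run_episode(env, propose_action, max_turns: int = 10) -> bool:
    """Roll out one task: the agent proposes actions until done or the turn budget is spent."""
    observation = env.reset()
    for _ in range(max_turns):
        action = propose_action(observation)  # in StateFlow, an LLM call selected by the current state
        result = env.step(action)
        observation = result.observation
        if result.done:
            return result.reward > 0
    return False

if __name__ == "__main__":
    env = ToyEnv("List all files in the current directory.", "ls")
    print(run_episode(env, propose_action=lambda obs: "ls"))  # -> True
```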
We recommend creating separate environments for InterCode and ALFWorld.
Both benchmarks require installing AutoGen:

```bash
pip install pyautogen
```
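As a quick sanity check, you can import the package and print the installed version (the repo does not pin a specific version, so this only confirms the install):

```python
# Confirm that AutoGen (pyautogen) is importable and show its version.
import autogen
print(autogen.__version__)
```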
Then, create a "OAI_CONFIG_LIST" file and add your key, this will be used to access the LLM models:
[
{
"model": "gpt-35-turbo-1106",
"api_key": "Your openai key here",
},
{
"model": "gpt-35-turbo-1106",
"api_key": "Your azure key",
"api_type": "azure",
"base_url": "Your base url here",
"api_version": "Your api version here",
}
]
When running the experiments, make sure to change the path to the `OAI_CONFIG_LIST` file in the corresponding Python files (e.g., `ALFWorld/stateflow.py`, `InterCode/flow_bash.py`, `InterCode/flow_sql.py`):

```python
config_list = autogen.config_list_from_json(
    "Your path to OAI_CONFIG_LIST file here",
    filter_dict={"model": model},
)
```
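For reference, the loaded `config_list` is passed to AutoGen agents via `llm_config`. The snippet below is a generic smoke test, not taken from this repo's scripts; the agent names, filter value, and prompt are illustrative:

```python
import autogen

# Load endpoints from the config file, optionally filtering by model name.
config_list = autogen.config_list_from_json(
    "OAI_CONFIG_LIST",  # or the full path to your config file
    filter_dict={"model": "gpt-35-turbo-1106"},
)

# Minimal assistant / user-proxy pair to verify the configuration works.
assistant = autogen.AssistantAgent(
    name="assistant",
    llm_config={"config_list": config_list},
)
user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=0,  # stop after the assistant's first reply
    code_execution_config=False,
)
user_proxy.initiate_chat(assistant, message="Say hello in one short sentence.")
```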
- Please follow the instructions in the InterCode repository to download InterCode, using the build-from-source instructions:

  ```bash
  git clone https://github.com/princeton-nlp/intercode.git
  cd intercode
  conda env create -f environment.yml
  conda activate intercode
  ```
- Once you are in the `intercode` folder, copy the files from the `InterCode` folder to the `intercode` folder:

  ```bash
  bash ../InterCode/copy_files.sh
  ```

  We made some modifications to `setup.sh` and the Docker files:
  - Changed the SQL dockerfile path to `ic_spider_dbs.sql`.
  - Created 4 different Docker images for the 4 different Bash tasks.
- Run `setup.sh` to create the Docker images for the InterCode Bash and SQL environments:

  ```bash
  bash setup.sh
  ```
- Run StateFlow for InterCode SQL:

  ```bash
  bash scripts/stateflow.sh
  ```
- Please follow the instructions in the ALFWorld repository to install the ALFWorld environment.
- Change the relevant path in `stateflow.py` (see the optional note after these steps):

  ```python
  os.environ["ALFWORLD_DATA"] = "Your path to ALFWorld data here."
  ```
- Run StateFlow for ALFWorld:

  ```bash
  python stateflow.py
  ```
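A note on the `ALFWORLD_DATA` line above: if you prefer not to hard-code the path in `stateflow.py`, one option (an illustrative tweak, not part of the original script) is to fall back to a value already exported in your shell:

```python
import os

# Respect an ALFWORLD_DATA value already set in the environment;
# otherwise fall back to a hard-coded path (replace with yours).
os.environ.setdefault("ALFWORLD_DATA", "Your path to ALFWorld data here.")
print("Using ALFWORLD_DATA =", os.environ["ALFWORLD_DATA"])
```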
If you find this repo helpful, please cite our publication:

```bibtex
@article{wu2024stateflow,
    title={StateFlow: Enhancing LLM Task-Solving through State-Driven Workflows},
    author={Wu, Yiran and Yue, Tianwei and Zhang, Shaokun and Wang, Chi and Wu, Qingyun},
    journal={arXiv preprint arXiv:2403.11322},
    year={2024}
}
```
Results (see the figures in the repository):
- InterCode SQL
- InterCode Bash
- ALFWorld
- Ablation of states on the InterCode SQL dataset with GPT-3.5-Turbo
- StateFlow + Reflexion on ALFWorld (with 6 iterations)