
KIOS --- Knowledge-based Intelligent Operation System

This is a project for robot task planning and execution based on LangChain agents, LLMs, behavior trees, and robot expert skills.

⭐ A conference paper based on this project has been submitted to ICRA 2025. A preprint is available here.

⭐ A workshop paper based on this project was accepted at ICRA 2024. Check it here.

🎥 The video of human-in-the-loop behavior tree generation has been uploaded. Check it here.

👀 The human-in-the-loop workflow tutorial has been updated. Please check here for more information.

  • About the old version: See the project kios_ros2. The old version was developed on ROS2 and has been refactored into this project for several technical reasons.

  • About the new version: The ROS-free version, which aims at simplifying the system structure, is now under active development. The Python package for this project is kios_bt_planning.

  • About the robot interface: This project is developed to cooperate with the robot interface mios, the skill base developed by the KI Fabrik team. Mios provides a public docker image, which however does not include the modifications this project needs (for example, the object grounding process has been made optional for this project, and skills and kinematic settings for tool-based manipulation have been newly developed). A mios docker image for this project will be packed up and published in the future. For now, to run the demo, please uncomment the simulation-related code in the script to allow dummy execution (check here).

You may also deploy your own methods for generating robot commands in mios_task_factory.py and for executing them in robot_command.py, or define your own command class. Please search for MiosCall and KiosCall globally in the project for more details.

Intro


KIOS is an LLM- and behavior-tree-based robot task planning system developed by BlackBird for his master's thesis. The system is written in Python. The idea is to integrate LLMs into the robot task planning system for automatic behavior tree generation and modification.

The LLM generates the task plan in the form of behavior trees based on the provided domain knowledge (via prompt engineering or RAG). The APIs for generating, modifying, and executing the behavior trees are exposed to the LLM agent. With feedback from the robot (as well as natural-language feedback from the user), the LLM agent can modify the behavior tree and generate new plans dynamically to finish robotic assembly tasks.

The use cases are from the Siemens Robot Assembly Challenge and FurnitureBench.

Contents

What is KIOS?

KIOS is an intelligent robot task planning system developed by BlackBird for his master's thesis. Some of its key features are:

  • natural language understanding and interaction
  • assembly task planning based on state and knowledge
  • behavior tree generation, modification and execution

Getting Started

Requirements

For the client (robot) side:

  • Ubuntu 20.04 LTS
  • conan 1.59.0 (conan 2 is not compatible with the project mios)
  • a Linux real-time kernel. This is required by the robot control interface (1000 Hz control loop). For a walkthrough please check here.

For the server side (for local LLMs, not deployed yet):

  • Ubuntu 20.04 LTS
  • CUDA 12.1 or higher
  • RAM 32GB or higher
  • GPU 24GB or higher

Install

It is highly recommended to use a virtual environment for the project.

conda create -n kios python=3.10
conda activate kios
  1. Install dependency packages.
pip3 install -r requirements.txt
sudo apt-get install graphviz
# install the package kios_bt_planning
cd kios_bt_planning
pip3 install -e .
# this is for testing the project
conda install ipython
  2. (Skip this if you do not need world state visualization) Install Neo4j.

The application can be downloaded from here.

After setting up the Neo4j server, please change the authentication information in kios_bt_planning/kios_world/neo4j_interface.py.

  3. Set up mios (branch = kios) and the Franka robot.

BB: For MIRMI users, check the project mios for more information. The docker image is named "mirmi/mios" but is not compatible with this project. The skills necessary for robot manipulation in KIOS are still under active development. A new docker image will be released as soon as possible.

  4. (Skip this for now) Install llama.cpp according to the docs. Please be aware that you need to enable the CUDA backend.
# in the virtual environment
CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python
  5. Set up your OpenAI API key.

If you want to use the OpenAI GPT models as your LLM, please set up your OpenAI API key globally according to this link.

Alternatively, if you do not mind entering it each time, use getpass to input the API key every time you run the project.

BB: Protect your API key carefully and prevent any secret leakage.
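If you prefer not to store the key globally, a minimal helper along these lines prompts for it once per session. This is a convenience sketch, not project code: ensure_openai_key is a hypothetical name, while OPENAI_API_KEY is the standard environment variable the official openai client reads.

```python
import getpass
import os

def ensure_openai_key() -> str:
    # Hypothetical helper, not part of KIOS: read OPENAI_API_KEY from
    # the environment and prompt once if it is not set.
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        # getpass hides the input, so the key is not echoed to the terminal
        key = getpass.getpass("OpenAI API key: ")
        os.environ["OPENAI_API_KEY"] = key
    return key
```

Call ensure_openai_key() at the top of your script; the key then lives only in the process environment rather than in a shell profile.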

  6. Set up LangSmith.

If you want to use LangSmith to monitor the LLM queries, take a look at this link to set the API key.

You can also use something else, such as Langfuse, to monitor the LLM queries.

  7. Set up Hugging Face (skip this for now).

  8. Set up MongoDB.

Please check this link to install MongoDB. You should also start the MongoDB service after the installation!

Packages

  • data:
    • prompt: the prompt engineering files for the LLMs.
    • world_definition: the definitions of the actions in pddl style.
    • router_utterances: the utterances for the semantic router.
  • experiments: the experiment files of different problems.
    • chair...
    • gearset...
      • scene
      • domain
      • problem
    • gearset1... (for development and testing)
      • scene
      • domain
      • problem
    • demo (for demo)
      • human_in_the_loop_sync.py
      • iterative_generation_sync.py
      • one_step_generation_sync.py
      • recursive_generation_sync.py
      • world_state.json
      • scene.json
      • ...
  • kios_bt_planning
    • kios_agent: the agents for task planning and behavior tree generation
      • kios_llm_bt: prompt engineering files for end-to-end behavior tree generation
    • kios_bt: modules for basic behavior tree functionality.
      • Behavior nodes (actions and conditions)
      • The factory class for generating behavior trees
      • Behavior tree json interface
      • Mios asynchronization module
      • ...
    • kios_domain: domain knowledge written in pddl with unified-planning
      • pddl python interfaces.
      • domain knowledge definitions.
      • ...
    • kios_planner: discarded for now
    • kios_robot: robot modules for real-world robot manipulation
      • kios_vision
      • robot_interface: interface methods to execute the actions in behavior trees.
      • robot_proprioceptor: class for interacting with the robot (get/set states).
      • mios_task_factory: task factory for miosskill, mioscall and kioscall.
      • mios_async: asynchronization module for robot_command
      • robot_command: the command (list of calls/skills) for the robot.
    • kios_scene: modules for the task scene model
      • mongodb_interface
      • scene_factory
      • (scene-world_linker)
    • kios_world: modules for the world model
      • world_interface: interfaces for querying/updating the world state.
      • graph_interface: interfaces for interacting with the inner world graph.
      • neo4j_interface: interfaces for the Neo4j database.
    • kios_utils: utility modules
    • tests: test files for the modules above.

System Structure

The Concept

World State

The world state in the framework is modeled with a dictionary-like structure and organized as a JSON object. Using JSON as the world representation leverages the rich JSON-formatted data LLMs have seen in pre-training.

(figures: the world state in JSON and its visualization)

The key values in the world state are explained below:

  • Objects

    A list of the objects in the world, including their names and the properties they have.

  • Properties

    Properties are typically unary state variables that indicate object affordances and availability. These properties can change during task execution. For example, is_available(tool1) indicates that the tool is available for task execution, and this status may change when the tool is occupied.

  • Constraints

    A list of constraints in the world that the user defines, including the constraint name and the two objects affected by the constraint. A constraint can be either a geometry constraint between two objects (e.g., a cylinder can be inserted into a round hole) or a form of user knowledge (e.g., a clamp-gripper can be used to manipulate a large-sized gear). Constraints are pre-defined knowledge and cannot be changed during the task process.

  • Relations

    A list of relations in the world, including the relation name and the two objects involved. Most relations are geometry (e.g., a peg is inserted into a hole), while others are semantic (e.g., the hand is holding a clamp-gripper). The task target can be defined as relations that are changeable during the plan execution.
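As a concrete illustration, a minimal world state in this style might look like the following Python sketch. The object, property, constraint, and relation names, and the exact key layout, are hypothetical, not taken from the shipped demo files.

```python
import json

# Hypothetical world state following the structure described above:
# objects with unary properties, immutable constraints, mutable relations.
world_state = {
    "objects": [
        {"name": "tool1", "properties": ["is_available"]},
        {"name": "gear1", "properties": []},
        {"name": "shaft1", "properties": []},
    ],
    "constraints": [
        # pre-defined knowledge; never changes during the task
        {"source": "gear1", "name": "can_insert_to", "target": "shaft1"},
    ],
    "relations": [
        # changeable during execution; task targets are expressed here
        {"source": "hand", "name": "hold", "target": "tool1"},
    ],
}

# The serialized JSON object is what actually appears in the LLM context.
print(json.dumps(world_state, indent=2))
```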

Behavior Tree

The BTs generated and utilized in the system are in JSON format.

(figures: a behavior tree in JSON and its visualization)

In the JSON file of a BT, each node has a summary, which provides a brief description, and a name, which reflects the node type and follows the naming form defined in the domain knowledge. There are several node types: selector, sequence, condition (further classified into target and precondition), and action.

Selector and sequence nodes control the tick flow of the BT and contain a list of subsequent nodes called children. Condition nodes labeled as targets are typically children of selectors, while those categorized as preconditions are found as children of sequences. It is crucial that all nodes align with their corresponding actions or predicates, as defined within the domain knowledge. Control flow nodes in the BTs have no memory, which means each tick starts at the root and traverses all nodes anew, disregarding previous states of the control flow nodes.

The basic structure of a unit subtree is a root selector node with a target condition node as its first child to verify target satisfaction, followed by a sequence node aimed at fulfilling the target condition. The sequence node starts with several precondition nodes that validate the conditions necessary for executing an action, and concludes with an action node. This action node is designed to achieve effects that satisfy the target node in the upper-level selector, ensuring the subtree's functional coherence and goal-directed behavior.
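A hypothetical unit subtree in this JSON style, following the selector -> (target condition, sequence -> (preconditions, action)) pattern described above, might look like the sketch below. The node names and the "summary"/"name"/"children" keys are illustrative, not copied from the project's domain definitions.

```python
import json

# Hypothetical unit subtree: a selector whose first child checks the
# target and whose second child is a sequence that achieves it.
unit_subtree = {
    "summary": "selector to load tool1",
    "name": "selector: load_tool(left_hand, tool1)",
    "children": [
        {
            # first child: verify the target before doing any work
            "summary": "the target is that the hand holds tool1",
            "name": "target: hold(left_hand, tool1)",
        },
        {
            # second child: a sequence that fulfills the target
            "summary": "sequence to load tool1",
            "name": "sequence: load_tool(left_hand, tool1)",
            "children": [
                {
                    "summary": "a precondition is that tool1 is available",
                    "name": "precondition: is_available(tool1)",
                },
                {
                    # the action's effects satisfy the target above
                    "summary": "load tool1 onto the hand",
                    "name": "action: load_tool(left_hand, tool1)",
                },
            ],
        },
    ],
}
print(json.dumps(unit_subtree, indent=2))
```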

Prompt

Here is an overview of the prompt structure used in the project:

(figure: overview of the prompt structure)

Something to try

0. Enable the dummy execution

The docker image of mios is currently not available. You can enable dummy execution by uncommenting the code in the demo script, which simulates the execution and applies the effects of the actions to the world state afterwards.

The code for dummy execution is:

return behavior_tree_simulation_step(state)

Uncommenting this line calls the simulation node of the langgraph to simulate the execution of the behavior tree, skipping the interaction with the robot interface.

1. Runtime script for robot commands (For MIRMI users)

The runtime_script.py scripts (just search for them in the project) are live scripts for modifying mios memory, teaching mios objects (check the mios documentation to understand what objects are in mios), quick environment setup, and robot command testing.

Import one with ipython so you can call its functions at runtime:

# in the virtual environment
# change to its dir (gearset1 for example)
cd experiments/gearset1

# you need ipython installed in conda
ipython -i runtime_script.py

# run the commands...

Please check the script for more information about the functions.

2. Human-in-the-loop behavior tree generation (For all users)

Human-in-the-loop behavior tree generation is a process for generating behavior trees iteratively with the help of human feedback. User input is first passed to the assembly planner, which makes a high-level assembly plan consisting of several product-centered assembly steps. The first step is then passed to the sequential planner to generate an action sequence in natural language, which guides the behavior tree generator in producing the corresponding behavior tree. The behavior tree is a mid-level plan of robot actions and condition checks.

The user is asked to provide feedback in natural language to help improve or correct the behavior tree. The feedback is then used to modify the behavior tree and generate a new plan (tree). This is repeated until the user is satisfied with the behavior tree. The behavior tree is then executed by the robot, which calls the robot interface to run low-level motion primitives or skills. The execution stops when the tree reports feedback, and the user is asked to provide feedback again. After the task is finished successfully, the plan updater updates the plan in the assembly planner, and the process is repeated for the next step until the whole assembly task is finished.
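The cycle described above can be sketched compactly as follows. All class and method names (plan_assembly, generate_sequence, generate_tree, modify_tree, execute_tree, update_plan) are placeholders for illustration, not the project's actual API.

```python
# Sketch of the human-in-the-loop cycle; get_feedback returns None
# once the user is satisfied with the current tree.
def human_in_the_loop(instruction, get_feedback, planner, executor):
    steps = planner.plan_assembly(instruction)          # high-level plan
    for step in steps:
        sequence = planner.generate_sequence(step)      # NL action sequence
        tree = planner.generate_tree(sequence)          # behavior tree
        while True:
            feedback = get_feedback(tree)               # ask the user
            if feedback is None:                        # user is satisfied
                break
            tree = planner.modify_tree(tree, feedback)  # refine the tree
        executor.execute_tree(tree)                     # run on the robot
        planner.update_plan(step)                       # plan updater
```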

Following is the workflow for human-in-the-loop behavior tree generation:

(figure: the human-in-the-loop generation workflow)

User input: assembly instructions (natural language).

User_feedback: suggestions for the behavior tree (natural language).

# in the virtual environment
cd experiments/demo
python human_in_the_loop_sync.py

It is strongly recommended to watch the video here to understand the workflow.

Testing

For module testing please check the test folder in kios_bt_planning.

To use ipython for debugging, install ipython in your environment (otherwise the system-wide interpreter will be used):

conda install ipython

Development Log

...

Contribute

You are welcome to contribute to the project by opening an issue or making a pull request.

Citation

@inproceedings{Ao2024workshop-LLM-BT,
 author = {Ao, Jicong and Wu, Yansong and Wu, Fan and Haddadin, Sami},
 booktitle = {ICRA 2024 Workshop Exploring Role Allocation in Human-Robot Co-Manipulation},
 title = {Behavior Tree Generation using Large Language Models for Sequential
Manipulation Planning with Human Instructions and Feedback},
 year = {2024}
}

License

MIT License
