Life2Scenario: Can LLMs modify AV scenarios by following prompts?

Overview

In Automated Vehicle (AV) simulations, XML scenario files are used to replicate real-world environments. Keeping these files up to date usually means hand-editing scenario descriptions or relying on deterministic automation tools. Our project streamlines these modifications by letting users describe changes in text prompts, bypassing direct XML manipulation and making the process more efficient and user-friendly.

Project Structure

  • model/: Contains the scripts to fine-tune the StarCoder model on the Life2Scenario task
  • dataset_generation/: Contains the scripts to generate the dataset for the Life2Scenario task

Dataset Generation

The CARLA simulator (Unreal Engine) is used to generate the dataset: the simulator is run while scenario descriptions and the corresponding XML files are recorded, and the result is then preprocessed into training data for the model. The original scenario executor is the scenario_runner package from the CARLA community: CARLA #ScenarioRunner

There are three main components of the dataset generation module (a simplified end-to-end sketch follows this list):

  1. dataset_generator.py: Manages the CARLA API and simulator together with the dataset generation modules.
    1. Main entry point.
    2. Determines which scenario is currently running and which scenario should be created next.
    3. Triggers the scenario_extender and scene_manipulator classes to generate the dataset with respect to the current scenario.
  2. scene_manipulator.py: Manipulates the scenario within the context of the CARLA simulator.
    1. Maintains the requested actions and executes them in the simulator.
  3. scenario_extender.py: Extends the scenario description with the new actions.
    1. Maintains the scenario description and extends it with the new actions.
    2. Uses the actions from the scene_manipulator to extend the scenario description files (.xosc).
    3. Saves the extended scenario description files (.xosc) together with the prompts, forming the dataset.
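
The sketch below illustrates this flow for a single "add pedestrian close to hero" action: a CARLA Python API call spawns the walker in the simulator, and the .xosc file is extended to mirror the change. The function names and the simplified OpenSCENARIO structure are illustrative assumptions, not the actual implementation; the carla calls (Client, spawn_actor, Transform) are standard CARLA API.

import random
import xml.etree.ElementTree as ET

import carla

def spawn_pedestrian_near(world: carla.World, hero: carla.Actor) -> carla.Actor:
    # Spawn a walker a few meters ahead of the hero vehicle (illustrative action).
    blueprint = random.choice(world.get_blueprint_library().filter("walker.pedestrian.*"))
    hero_loc = hero.get_transform().location
    spawn_tf = carla.Transform(carla.Location(x=hero_loc.x + 5.0, y=hero_loc.y, z=hero_loc.z + 1.0))
    return world.spawn_actor(blueprint, spawn_tf)

def add_pedestrian_to_xosc(ref_xosc: str, target_xosc: str, name: str) -> None:
    # Mirror the simulator change in the OpenSCENARIO file (structure heavily simplified).
    tree = ET.parse(ref_xosc)
    entities = tree.getroot().find("Entities")
    ET.SubElement(entities, "ScenarioObject", {"name": name})
    # A complete entry would also need a Pedestrian definition and an Init/Teleport action.
    tree.write(target_xosc, encoding="UTF-8", xml_declaration=True)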

Main Libraries Used

  • carla: Python API for CARLA simulator
  • scenario_runner: Scenario runner package from CARLA community

Dataset Generation Steps

  1. Install the CARLA simulator:
    1. Version 0.9.15 is used for dataset generation.
      1. Refer to Carla #Package Installation
    2. Alternatively, you can install the binary from the releases:
      1. Note that we used the precompiled CARLA binaries for Linux.
      2. Carla #Releases 0.9.15
  2. Install the Python API for CARLA (a quick connectivity check is sketched after this list):
    1. Refer to Carla #Install client library
    2. pip3 install carla
  3. Find a base scenario, for example, LaneChangeSimple
    1. Run the scenario_runner package with the base scenario:
      1. python3 ${SCENARIO_RUNNER_ROOT}/scenario_runner.py --openscenario ${SCENARIO_RUNNER_ROOT}/srunner/examples/LaneChangeSimple.xosc
  4. Run the dataset_generator to generate the dataset:
    1. python3 dataset_generator.py
  5. The output dataset will be saved in the dataset_generation/dataset directory:
    1. dataset_generation/dataset/:
      1. prompts/: Contains the prompts, .txt
      2. ref_scenarios/: Contains the reference scenario descriptions, .xosc
      3. target_scenarios/: Contains the target scenario descriptions, .xosc
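
As referenced in step 2, a quick sanity check that the client library can reach the running simulator (assumes the CARLA server is already up on the default port 2000):

import carla

client = carla.Client("localhost", 2000)  # default host/port of the CARLA server
client.set_timeout(5.0)
print("Server version:", client.get_server_version())
print("Loaded map:", client.get_world().get_map().name)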

Life2Scenario Dataset Creation Pipeline Overview

Evaluation Metric

We introduce a new metric, EntityCount, which is calculated as $$\text{EntityCount} = \neg\,(I_{\text{gt}} \oplus I_{\text{pred}})$$ where:

  • $I_{\text{gt}}$ is 1 if the entity count in the ground truth matches the expected count, and 0 otherwise.
  • $I_{\text{pred}}$ is 1 if the entity count in the prediction matches the expected count, and 0 otherwise.
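
A minimal sketch of the metric as defined above; counting ScenarioObject elements in the .xosc files is an assumption about how the entity counts are obtained:

import xml.etree.ElementTree as ET

def indicator(xosc_text: str, expected: int, tag: str = "ScenarioObject") -> int:
    # 1 if the number of <tag> elements matches the expected count, else 0.
    count = sum(1 for _ in ET.fromstring(xosc_text).iter(tag))
    return int(count == expected)

def entity_count_metric(gt_xosc: str, pred_xosc: str, expected: int) -> int:
    i_gt = indicator(gt_xosc, expected)
    i_pred = indicator(pred_xosc, expected)
    return int(not (i_gt ^ i_pred))  # XNOR: 1 when the two indicators agree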

Data Processing

  1. preprocess.py: Responsible for preprocessing XML data. It removes specified XML elements (GlobalAction, Story, StopTrigger) from the input data. Additionally, it cleans up the XML string formatting by removing extra spaces before self-closing tags and prepends an XML declaration to the output.
  2. postprocess.py: Responsible for post-processing XML data by integrating elements from an input XML into a predicted XML structure. It extracts the GlobalAction, Story, and StopTrigger elements from the input data's specified parent tags and reinserts them into the predicted XML structure at designated locations. After reinserting the elements, it cleans the XML string by removing unnecessary spaces before self-closing tags and adds an XML declaration at the beginning.
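
A minimal sketch of the preprocessing step described above, assuming xml.etree.ElementTree is sufficient for the element removal; the real preprocess.py may differ in details:

import re
import xml.etree.ElementTree as ET

STRIPPED_TAGS = ("GlobalAction", "Story", "StopTrigger")

def preprocess(xml_text: str) -> str:
    root = ET.fromstring(xml_text)
    # Remove every GlobalAction, Story, and StopTrigger element, wherever it appears.
    for parent in root.iter():
        for child in [c for c in parent if c.tag in STRIPPED_TAGS]:
            parent.remove(child)
    out = ET.tostring(root, encoding="unicode")
    out = re.sub(r"\s+/>", "/>", out)  # drop extra spaces before self-closing tags
    return '<?xml version="1.0"?>\n' + out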

Fine-tuning the Model

The bigcode/starcoderbase-1b model is fine-tuned on the Life2Scenario-minimal dataset with the transformers library.
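
A hedged sketch of such a fine-tuning run with transformers and peft (LoRA); the hyperparameters, LoRA settings, and dataset file name are assumptions rather than values taken from the repository, and the bigcode checkpoint requires Hugging Face access:

import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer, DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

model_name = "bigcode/starcoderbase-1b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

# LoRA on the attention projection of the GPTBigCode architecture.
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                                         target_modules=["c_attn"], task_type="CAUSAL_LM"))

# life2scenario_minimal.jsonl with a "text" column is a hypothetical dataset layout.
dataset = load_dataset("json", data_files="life2scenario_minimal.jsonl")["train"]
dataset = dataset.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=2048),
                      remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="starcoderbase_1b_life2scenario_minimal_210ep",
                           num_train_epochs=210, per_device_train_batch_size=1,
                           gradient_accumulation_steps=8, learning_rate=2e-4,
                           bf16=True, logging_steps=10),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()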

Training Report

  • Training/Loss (loss curve figure)
  • Evaluation/Loss (loss curve figure)

Model Registry

| Model Name | Model Link | Dataset Name | Dataset Link |
| --- | --- | --- | --- |
| starcoderbase_3b_life2scenario_medium_60ep | starcoderbase_3b_life2scenario_medium_60ep | Life2Scenario-medium | Life2Scenario-medium |
| starcoderbase_1b_life2scenario_minimal_210ep | starcoderbase_1b_life2scenario_minimal_210ep | Life2Scenario-minimal | Life2Scenario-minimal |
| starcoderbase_1b_life2scenario_medium_300ep | starcoderbase_1b_life2scenario_medium_300ep | Life2Scenario-medium | Life2Scenario-medium |
| codellama_13b_life2scenario_medium_300ep | CodeLlama-13b-Instruct-hf-merged | Life2Scenario-medium | Life2Scenario-medium |

Qualitative Results

We have used the following GenerationConfig to generate the results:

from transformers import GenerationConfig

# tokenizer is the tokenizer of the evaluated model.
generation_config = GenerationConfig(
    temperature=0.9,
    top_k=50,
    top_p=0.80,
    repetition_penalty=0.9,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
    max_length=8000,
)
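
For reference, the config can be applied as follows; prompt_text is a hypothetical placeholder for the question plus the reference .xosc content:

inputs = tokenizer(prompt_text, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, generation_config=generation_config)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))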

Example 1

  1. Model: starcoderbase_1b_life2scenario_minimal_210ep
  2. Prompt: Question: would you add pedestrian close to hero?
  3. Adding the object to the Storyboard (figure)
  4. Adding the location to the Actions (figure)

Example 2

  1. Model: starcoderbase_3b_life2scenario_medium_60ep
  2. Prompt: Question: i would like you to remove pedestrian actor named pedestrian_w_transform_15729?
  3. Fails to remove the requested object pedestrian_w_transform_15729 from the Storyboard; it only changes the id of another pedestrian object (figure)

Example 3

  1. Model: codellama_13b_life2scenario_medium_300ep
  2. Prompt: Question: i would like to instruct you to remove pedestrian close to hero?
  3. Able to remove the object pedestrian_close_to_hero_16871 from the Storyboard, while also changing the id of another pedestrian object (figure)

Ablation Studies

In-Context Learning, Program Generation and Execution (with Interpreter) (figure)

Main Libraries Used

  • transformers: Huggingface library used to fine-tune the model
  • torch: PyTorch, the underlying deep learning framework
  • datasets: Huggingface library for loading and handling the dataset
  • peft: Huggingface library for Parameter-Efficient Fine-Tuning (PEFT)
  • bitsandbytes: lightweight Python wrapper around CUDA custom functions (8-bit optimizers and quantization)
  • accelerate: Huggingface library for distributed training, built on top of torch.distributed

Project Members

  • Onur Can Yucedag
  • Mk Bashar
  • Samia Islam
