In Automated Vehicle (AV) simulations, XML files are used to replicate real-world environments. Keeping these files current typically forces developers to update scenario descriptions by hand or to rely on deterministic automation tools. Our project streamlines modifications to these files by accepting changes as text prompts, bypassing direct XML manipulation and making the process more efficient and user-friendly.
- `model/`: Contains the scripts to fine-tune the `starcoder` model on the `Life2Scenario` task
- `dataset_generation/`: Contains the scripts to generate the dataset for the `Life2Scenario` task
The CARLA simulator (Unreal Engine) is used to generate the dataset: the simulator is run while scenario descriptions and the corresponding XML files are recorded, and the result is then preprocessed into training data for the model. The original scenario executor is the `scenario_runner` package from the CARLA community: CARLA #ScenarioRunner
There are three main components of the dataset generation module:
- `dataset_generator.py`: Responsible for managing the CARLA API and simulator together with the dataset generation modules.
  - Main entry point
  - Responsible for finding out which scenario is currently running and which scenario should be created next.
  - Responsible for triggering the `scenario_extender` and `scene_manipulator` classes to generate the dataset with respect to the current scenario.
- `scene_manipulator.py`: Responsible for manipulating the scenario within the context of the CARLA simulator.
  - Responsible for maintaining requested actions and executing them in the simulator
- `scenario_extender.py`: Responsible for extending the scenario description with the new actions.
  - Responsible for maintaining the scenario description and extending it with the new actions
  - Uses the actions from the `scene_manipulator` to extend the scenario description files (`.xosc`).
  - Responsible for saving the extended scenario description files (`.xosc`) together with `prompts`, forming the dataset
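The interplay of the three components above can be sketched as a plain-Python skeleton. All class and method names below are illustrative stand-ins for the repository's actual API, and the real code drives the CARLA simulator rather than Python lists:

```python
# Hypothetical skeleton of the dataset generation loop. The names
# SceneManipulator, ScenarioExtender, request, and extend are illustrative
# assumptions, not the repository's actual classes or methods.

class SceneManipulator:
    """Tracks requested actions; the real code executes them in CARLA."""
    def __init__(self):
        self.actions = []

    def request(self, action):
        self.actions.append(action)
        return action


class ScenarioExtender:
    """Extends the scenario description with new actions and saves
    each (prompt, reference, target) triple as a dataset entry."""
    def __init__(self):
        self.dataset = []

    def extend(self, scenario, action, prompt):
        target = scenario + [action]
        self.dataset.append({"prompt": prompt, "ref": scenario, "target": target})
        return target


manipulator = SceneManipulator()
extender = ScenarioExtender()
scenario = ["hero_vehicle"]

# dataset_generator.py decides which scenario runs now and what comes next.
for i in range(3):
    action = manipulator.request(f"add_pedestrian_{i}")
    scenario = extender.extend(scenario, action, f"would you add pedestrian {i}?")

print(len(extender.dataset))  # 3
```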
- `carla`: Python API for the CARLA simulator
- `scenario_runner`: Scenario runner package from the CARLA community
- Install the CARLA simulator:
  - Version 0.9.15 is used for the dataset generation.
  - Refer to Carla #Package Installation
  - Alternatively, you can install the binary from the releases:
    - Note that we used precompiled binaries of the CARLA simulator for Linux
    - Carla #Releases 0.9.15
- Install the Python API for CARLA:
  - Refer to Carla #Install client library

  ```shell
  pip3 install carla
  ```
- Find a base scenario, for example, `LaneChangeSimple`
- Run the `scenario_runner` package with the base scenario:

  ```shell
  python3 ${SCENARIO_RUNNER_ROOT}/scenario_runner.py --openscenario ${SCENARIO_RUNNER_ROOT}/srunner/examples/LaneChangeSimple.xosc
  ```

- Run the `dataset_generator` to generate the dataset:

  ```shell
  python3 dataset_generator.py
  ```
- Output dataset will be saved in the `dataset_generation/dataset` directory:
  - `prompts/`: Contains the prompts (`.txt`)
  - `ref_scenarios/`: Contains the reference scenario descriptions (`.xosc`)
  - `target_scenarios/`: Contains the target scenario descriptions (`.xosc`)
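Assuming matching file stems across the three subdirectories (a pairing convention assumed here, not confirmed by the repository), the triples can be loaded into training examples like this:

```python
import tempfile
from pathlib import Path

def load_examples(dataset_root):
    """Pair each prompt with its reference and target scenario by shared
    file stem. The stem-matching convention is an assumption about the
    dataset layout described above."""
    root = Path(dataset_root)
    examples = []
    for prompt_file in sorted((root / "prompts").glob("*.txt")):
        ref = root / "ref_scenarios" / f"{prompt_file.stem}.xosc"
        target = root / "target_scenarios" / f"{prompt_file.stem}.xosc"
        if ref.exists() and target.exists():
            examples.append({
                "prompt": prompt_file.read_text(),
                "ref_scenario": ref.read_text(),
                "target_scenario": target.read_text(),
            })
    return examples

# Build a tiny fake dataset on disk to demonstrate the pairing.
with tempfile.TemporaryDirectory() as d:
    base = Path(d)
    for sub in ("prompts", "ref_scenarios", "target_scenarios"):
        (base / sub).mkdir()
    (base / "prompts" / "0001.txt").write_text("add pedestrian close to hero")
    (base / "ref_scenarios" / "0001.xosc").write_text("<OpenSCENARIO/>")
    (base / "target_scenarios" / "0001.xosc").write_text("<OpenSCENARIO/>")
    examples = load_examples(base)

print(len(examples))  # 1
```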
We introduce a new metric, `EntityCount`, which is calculated from two indicator terms:

- $I_{\text{gt}}$ is 1 if the count from the ground truth matches the expected count, and 0 otherwise.
- $I_{\text{pred}}$ is 1 if the count from the prediction matches the expected count, and 0 otherwise.
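The aggregation of these terms into the final `EntityCount` score is not reproduced here. As a hedged sketch of the two indicators only, assuming "count" means the number of `ScenarioObject` elements in an OpenSCENARIO document (the repository may count other element types):

```python
import xml.etree.ElementTree as ET

def entity_count(xosc_string, tag="ScenarioObject"):
    """Count entity elements in an OpenSCENARIO document.
    Counting ScenarioObject elements is an assumption for illustration."""
    root = ET.fromstring(xosc_string)
    return len(root.findall(f".//{tag}"))

def indicator(count, expected):
    """I = 1 if the observed count matches the expected count, else 0."""
    return 1 if count == expected else 0

gt = ("<OpenSCENARIO><Entities>"
      "<ScenarioObject name='hero'/><ScenarioObject name='ped_1'/>"
      "</Entities></OpenSCENARIO>")
pred = ("<OpenSCENARIO><Entities>"
        "<ScenarioObject name='hero'/>"
        "</Entities></OpenSCENARIO>")

expected = 2  # e.g. the prompt asked for two entities in the scene
I_gt = indicator(entity_count(gt), expected)      # 1: ground truth has 2
I_pred = indicator(entity_count(pred), expected)  # 0: prediction has 1
print(I_gt, I_pred)  # 1 0
```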
- `preprocess.py`: Responsible for preprocessing XML data. It removes specified XML elements (`GlobalAction`, `Story`, `StopTrigger`) from the input data. Additionally, it cleans up the XML string formatting by removing extra spaces before self-closing tags and prepends an XML declaration to the output.
- `postprocess.py`: Responsible for post-processing XML data by integrating elements from an input XML into a predicted XML structure. It specifically extracts and removes `GlobalAction`, `Story`, and `StopTrigger` elements from the input data's specified parent tags and reinserts them into the predicted XML structure at designated locations. After reinserting the elements, it cleans the XML string by removing unnecessary spaces before self-closing tags and adds an XML declaration at the beginning.
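A minimal sketch of the preprocessing step described above, using only the standard library; the repository's `preprocess.py` may differ in which parent tags it searches and how it formats the output:

```python
import re
import xml.etree.ElementTree as ET

# Elements stripped during preprocessing, per the description above.
STRIP_TAGS = ("GlobalAction", "Story", "StopTrigger")

def preprocess(xml_string):
    """Remove GlobalAction/Story/StopTrigger elements, tidy self-closing
    tags, and prepend an XML declaration. A hedged sketch, not the
    repository's actual implementation."""
    root = ET.fromstring(xml_string)
    # ElementTree can only remove a child via its parent, so walk every
    # element and drop matching children.
    for parent in root.iter():
        for child in list(parent):
            if child.tag in STRIP_TAGS:
                parent.remove(child)
    out = ET.tostring(root, encoding="unicode")
    out = re.sub(r"\s+/>", "/>", out)  # drop spaces before self-closing tags
    return '<?xml version="1.0"?>\n' + out

doc = ("<OpenSCENARIO><Storyboard><Story name='s'/>"
       "<StopTrigger /></Storyboard></OpenSCENARIO>")
cleaned = preprocess(doc)
print(cleaned)
```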
The `bigcode/starcoderbase-1b` model is fine-tuned on the `Life2Scenario-minimal` dataset with the `transformers` library.
- Codebase forked for starcoder: starcoder
- Original models:
- Model Link: bigcode/starcoderbase-1b
- Model Link: bigcode/starcoderbase-3b
- Model Link: codellama/CodeLlama-13b-Instruct-hf
| Model Name | Model Link | Dataset Name | Dataset Link |
|---|---|---|---|
| starcoderbase_3b_life2scenario_medium_60ep | starcoderbase_3b_life2scenario_medium_60ep | Life2Scenario-medium | Life2Scenario-medium |
| starcoderbase_1b_life2scenario_minimal_210ep | starcoderbase_1b_life2scenario_minimal_210ep | Life2Scenario-minimal | Life2Scenario-minimal |
| starcoderbase_1b_life2scenario_medium_300ep | starcoderbase_1b_life2scenario_medium_300ep | Life2Scenario-medium | Life2Scenario-medium |
| codellama_13b_life2scenario_medium_300ep | CodeLlama-13b-Instruct-hf-merged | Life2Scenario-medium | Life2Scenario-medium |
We have used the following `GenerationConfig` to generate the results:
```python
from transformers import GenerationConfig

generation_config = GenerationConfig(
    temperature=0.9,
    top_k=50,
    top_p=0.80,
    repetition_penalty=0.9,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
    max_length=8000
)
```
- Model: `starcoderbase_1b_life2scenario_minimal_210ep`
- Prompt: `Question: would you add pedestrian close to hero?`
- Adding the object to `Storyboard`
- Adding the location to the `Actions`
- Model: `starcoderbase_3b_life2scenario_medium_60ep`
- Prompt: `Question: i would like you to remove pedestrian actor named pedestrian_w_transform_15729?`
- Failing to remove the right object `pedestrian_w_transform_15729` from the Storyboard, only changing the id of another pedestrian object
- Model: `codellama_13b_life2scenario_medium_300ep`
- Prompt: `Question: i would like to instruct you to remove pedestrian close to hero?`
- Able to remove the object `pedestrian_close_to_hero_16871` from the Storyboard, though it also changes the id of another pedestrian object
- `transformers`: Huggingface library for fine-tuning the model
- `torch`: PyTorch library for fine-tuning the model
- `datasets`: Huggingface library for handling the dataset
- `peft`: Parameter-Efficient Fine-Tuning (PEFT)
- `bitsandbytes`: Lightweight Python wrapper around CUDA custom functions
- `accelerate`: PyTorch library for distributed training, built on top of `torch.distributed`
- Onur Can Yucedag
- Mk Bashar
- Samia Islam