In Automated Vehicle (AV) simulations, XML files are used to replicate real-world environments. Keeping these files current typically forces developers to update scenario descriptions by hand or to rely on deterministic automation tools. Our project streamlines modifications to these files by accepting changes as text prompts, bypassing direct XML manipulation and making the process more efficient and user-friendly.
- `model/`: Contains the scripts to fine-tune the `starcoder` model on the `Life2Scenario` task
- `dataset_generation/`: Contains the scripts to generate the dataset for the `Life2Scenario` task
The CARLA simulator (Unreal Engine) is used to generate the dataset: the simulator is run while scenario descriptions and the corresponding XML files are recorded, and the result is then preprocessed into training data for the model. The original scenario executor is the `scenario_runner` package from the CARLA community: CARLA #ScenarioRunner
There are three main components of the dataset generation module:
- `dataset_generator.py`: Responsible for managing the CARLA API and simulator together with the dataset generation modules.
  - Main entry point
  - Responsible for finding out which scenario is currently running and which scenario should be created next.
  - Responsible for triggering the `scenario_extender` and `scene_manipulator` classes to generate the dataset with respect to the current scenario.
- `scene_manipulator.py`: Responsible for manipulating the scenario within the context of the CARLA simulator.
  - Responsible for maintaining requested actions and executing them in the simulator
- `scenario_extender.py`: Responsible for extending the scenario description with the new actions.
  - Responsible for maintaining the scenario description and extending it with the new actions
  - Uses the actions from the `scene_manipulator` to extend the scenario description files (`.xosc`).
  - Responsible for saving the extended scenario description files (`.xosc`) together with `prompts`, forming the dataset
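The interplay of the three components above can be sketched as a plain-Python skeleton. All class and method names below are illustrative stand-ins for the repository's actual API, and the real code drives the CARLA simulator rather than Python lists:

```python
# Hypothetical skeleton of the dataset generation loop. The names
# SceneManipulator, ScenarioExtender, request, and extend are illustrative
# assumptions, not the repository's actual classes or methods.

class SceneManipulator:
    """Tracks requested actions; the real code executes them in CARLA."""
    def __init__(self):
        self.actions = []

    def request(self, action):
        self.actions.append(action)
        return action


class ScenarioExtender:
    """Extends the scenario description with new actions and saves
    each (prompt, reference, target) triple as a dataset entry."""
    def __init__(self):
        self.dataset = []

    def extend(self, scenario, action, prompt):
        target = scenario + [action]
        self.dataset.append({"prompt": prompt, "ref": scenario, "target": target})
        return target


manipulator = SceneManipulator()
extender = ScenarioExtender()
scenario = ["hero_vehicle"]

# dataset_generator.py decides which scenario runs now and what comes next.
for i in range(3):
    action = manipulator.request(f"add_pedestrian_{i}")
    scenario = extender.extend(scenario, action, f"would you add pedestrian {i}?")

print(len(extender.dataset))  # 3
```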
- `carla`: Python API for the CARLA simulator
- `scenario_runner`: Scenario runner package from the CARLA community
- Install the CARLA simulator:
  - Version 0.9.15 is used for the dataset generation.
  - Refer to Carla #Package Installation
  - Alternatively, you can install the binary from the releases:
    - Note that we used precompiled binaries of the CARLA simulator for Linux
    - Carla #Releases 0.9.15
- Install the Python API for CARLA:
  - Refer to Carla #Install client library

  ```shell
  pip3 install carla
  ```
- Find a base scenario, for example, `LaneChangeSimple`
- Run the `scenario_runner` package with the base scenario:

  ```shell
  python3 ${SCENARIO_RUNNER_ROOT}/scenario_runner.py --openscenario ${SCENARIO_RUNNER_ROOT}/srunner/examples/LaneChangeSimple.xosc
  ```

- Run the `dataset_generator` to generate the dataset:

  ```shell
  python3 dataset_generator.py
  ```
- Output dataset will be saved in the `dataset_generation/dataset` directory:
  - `prompts/`: Contains the prompts (`.txt`)
  - `ref_scenarios/`: Contains the reference scenario descriptions (`.xosc`)
  - `target_scenarios/`: Contains the target scenario descriptions (`.xosc`)
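Assuming matching file stems across the three subdirectories (a pairing convention assumed here, not confirmed by the repository), the triples can be loaded into training examples like this:

```python
import tempfile
from pathlib import Path

def load_examples(dataset_root):
    """Pair each prompt with its reference and target scenario by shared
    file stem. The stem-matching convention is an assumption about the
    dataset layout described above."""
    root = Path(dataset_root)
    examples = []
    for prompt_file in sorted((root / "prompts").glob("*.txt")):
        ref = root / "ref_scenarios" / f"{prompt_file.stem}.xosc"
        target = root / "target_scenarios" / f"{prompt_file.stem}.xosc"
        if ref.exists() and target.exists():
            examples.append({
                "prompt": prompt_file.read_text(),
                "ref_scenario": ref.read_text(),
                "target_scenario": target.read_text(),
            })
    return examples

# Build a tiny fake dataset on disk to demonstrate the pairing.
with tempfile.TemporaryDirectory() as d:
    base = Path(d)
    for sub in ("prompts", "ref_scenarios", "target_scenarios"):
        (base / sub).mkdir()
    (base / "prompts" / "0001.txt").write_text("add pedestrian close to hero")
    (base / "ref_scenarios" / "0001.xosc").write_text("<OpenSCENARIO/>")
    (base / "target_scenarios" / "0001.xosc").write_text("<OpenSCENARIO/>")
    examples = load_examples(base)

print(len(examples))  # 1
```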
We introduce a new metric, `EntityCount`, which is calculated from two indicator terms:

- $I_{\text{gt}}$ is 1 if the count from the ground truth matches the expected count, and 0 otherwise.
- $I_{\text{pred}}$ is 1 if the count from the prediction matches the expected count, and 0 otherwise.
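The aggregation of these terms into the final `EntityCount` score is not reproduced here. As a hedged sketch of the two indicators only, assuming "count" means the number of `ScenarioObject` elements in an OpenSCENARIO document (the repository may count other element types):

```python
import xml.etree.ElementTree as ET

def entity_count(xosc_string, tag="ScenarioObject"):
    """Count entity elements in an OpenSCENARIO document.
    Counting ScenarioObject elements is an assumption for illustration."""
    root = ET.fromstring(xosc_string)
    return len(root.findall(f".//{tag}"))

def indicator(count, expected):
    """I = 1 if the observed count matches the expected count, else 0."""
    return 1 if count == expected else 0

gt = ("<OpenSCENARIO><Entities>"
      "<ScenarioObject name='hero'/><ScenarioObject name='ped_1'/>"
      "</Entities></OpenSCENARIO>")
pred = ("<OpenSCENARIO><Entities>"
        "<ScenarioObject name='hero'/>"
        "</Entities></OpenSCENARIO>")

expected = 2  # e.g. the prompt asked for two entities in the scene
I_gt = indicator(entity_count(gt), expected)      # 1: ground truth has 2
I_pred = indicator(entity_count(pred), expected)  # 0: prediction has 1
print(I_gt, I_pred)  # 1 0
```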
- `preprocess.py`: Responsible for preprocessing XML data. It removes specified XML elements (`GlobalAction`, `Story`, `StopTrigger`) from the input data. Additionally, it cleans up the XML string formatting by removing extra spaces before self-closing tags and prepends an XML declaration to the output.
- `postprocess.py`: Responsible for post-processing XML data by integrating elements from an input XML into a predicted XML structure. It specifically extracts and removes `GlobalAction`, `Story`, and `StopTrigger` elements from the input data's specified parent tags and reinserts them into the predicted XML structure at designated locations. After reinserting the elements, it cleans the XML string by removing unnecessary spaces before self-closing tags and adds an XML declaration at the beginning.
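A minimal sketch of the preprocessing step described above, using only the standard library; the repository's `preprocess.py` may differ in which parent tags it searches and how it formats the output:

```python
import re
import xml.etree.ElementTree as ET

# Elements stripped during preprocessing, per the description above.
STRIP_TAGS = ("GlobalAction", "Story", "StopTrigger")

def preprocess(xml_string):
    """Remove GlobalAction/Story/StopTrigger elements, tidy self-closing
    tags, and prepend an XML declaration. A hedged sketch, not the
    repository's actual implementation."""
    root = ET.fromstring(xml_string)
    # ElementTree can only remove a child via its parent, so walk every
    # element and drop matching children.
    for parent in root.iter():
        for child in list(parent):
            if child.tag in STRIP_TAGS:
                parent.remove(child)
    out = ET.tostring(root, encoding="unicode")
    out = re.sub(r"\s+/>", "/>", out)  # drop spaces before self-closing tags
    return '<?xml version="1.0"?>\n' + out

doc = ("<OpenSCENARIO><Storyboard><Story name='s'/>"
       "<StopTrigger /></Storyboard></OpenSCENARIO>")
cleaned = preprocess(doc)
print(cleaned)
```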
The `bigcode/starcoderbase-1b` model is fine-tuned on the `Life2Scenario-minimal` dataset with the `transformers` library.
- Codebase forked for starcoder: starcoder
- Original models:
- Model Link: bigcode/starcoderbase-1b
- Model Link: bigcode/starcoderbase-3b
- Model Link: codellama/CodeLlama-13b-Instruct-hf
| Model Name | Model Link | Dataset Name | Dataset Link |
|---|---|---|---|
| starcoderbase_3b_life2scenario_medium_60ep | starcoderbase_3b_life2scenario_medium_60ep | Life2Scenario-medium | Life2Scenario-medium |
| starcoderbase_1b_life2scenario_minimal_210ep | starcoderbase_1b_life2scenario_minimal_210ep | Life2Scenario-minimal | Life2Scenario-minimal |
| starcoderbase_1b_life2scenario_medium_300ep | starcoderbase_1b_life2scenario_medium_300ep | Life2Scenario-medium | Life2Scenario-medium |
| codellama_13b_life2scenario_medium_300ep | CodeLlama-13b-Instruct-hf-merged | Life2Scenario-medium | Life2Scenario-medium |
We have used the following `GenerationConfig` to generate the results:
```python
from transformers import GenerationConfig

generation_config = GenerationConfig(
    temperature=0.9,
    top_k=50,
    top_p=0.80,
    repetition_penalty=0.9,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
    max_length=8000
)
```
- Model: `starcoderbase_1b_life2scenario_minimal_210ep`
- Prompt: `Question: would you add pedestrian close to hero?`
- Adding the object to `Storyboard`
- Adding the location to the `Actions`
- Model: `starcoderbase_3b_life2scenario_medium_60ep`
- Prompt: `Question: i would like you to remove pedestrian actor named pedestrian_w_transform_15729?`
- Failing to remove the right object `pedestrian_w_transform_15729` from the Storyboard, only changing the id of another pedestrian object
- Model: `codellama_13b_life2scenario_medium_300ep`
- Prompt: `Question: i would like to instruct you to remove pedestrian close to hero?`
- Able to remove the object `pedestrian_close_to_hero_16871` from the Storyboard, though it also changes the id of another pedestrian object
- `transformers`: Huggingface library for fine-tuning the model
- `torch`: PyTorch library for fine-tuning the model
- `datasets`: Huggingface library for handling the dataset
- `peft`: Parameter-Efficient Fine-Tuning (PEFT)
- `bitsandbytes`: Lightweight Python wrapper around CUDA custom functions
- `accelerate`: PyTorch library for distributed training, built on top of `torch.distributed`
- Onur Can Yucedag
- Mk Bashar
- Samia Islam