
# SEVD: Synthetic Event-based Vision Dataset for Ego and Fixed Traffic Perception


In recent years, there has been an increasing focus on neuromorphic or event-based vision due to its ability to excel under high dynamic range conditions, offer high temporal resolution, and consume less power than conventional frame-based vision sensors such as RGB cameras. Event cameras, also known as dynamic vision sensors (DVS), mimic the behavior of biological retinas by continuously sampling incoming light and generating signals only when there is a change in light intensity. This results in an event data stream represented as a sequence of ⟨x, y, p, t⟩ tuples, where (x, y) denotes the pixel position, t the timestamp, and p the polarity (positive or negative contrast).
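For illustration, a minimal sketch of how such an ⟨x, y, p, t⟩ stream can be held in NumPy is shown below; the field names and dtypes are illustrative only and do not reflect SEVD's on-disk layout.

```python
# Sketch: an event stream as a NumPy structured array of <x, y, p, t> tuples.
# Field names and dtypes are illustrative, not the dataset's actual format.
import numpy as np

event_dtype = np.dtype([
    ("x", np.uint16),   # pixel column
    ("y", np.uint16),   # pixel row
    ("p", np.int8),     # polarity: +1 (brightness increase) or -1 (decrease)
    ("t", np.int64),    # timestamp, e.g. in microseconds
])

events = np.array([(120, 64, 1, 1000), (121, 64, -1, 1012)], dtype=event_dtype)
print(events["t"])  # timestamps are non-decreasing within a stream
```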

While event-based sensing is a novel area, research efforts to fully utilize the capabilities of event-based cameras for perception tasks have been limited in recent years. Notably, researchers have predominantly used event-based cameras such as iniVation's DAVIS346 and Prophesee's IMX636 / EVK 4 HD to construct automotive datasets. Additionally, researchers have employed frame-to-event simulators such as ESIM and v2e to generate synthetic event-based data; however, such data is limited to converting existing RGB frames, for example the outdoor scenes of MVSEC. This highlights the significant scarcity of readily available synthetic event-based datasets in the field. To bridge this gap and leverage the potential of synthetic data to generate diverse and high-quality vision data tailored for traffic monitoring, we present SEVD, a Synthetic Event-based Vision Dataset designed for autonomous driving and traffic monitoring tasks.

## Download

The dataset can be downloaded using this link.

## Dataset Overview

SEVD provides a multi-view (360°) dataset comprising 27 hr of fixed and 31 hr of ego perception data, with over 9M bounding boxes, recorded across diverse conditions and varying parameters. The event cameras are complemented by five other camera types (RGB, depth, optical flow, semantic segmentation, and instance segmentation), along with GNSS and IMU sensors, resulting in a diverse array of data.

## Folder Structure

```
SEVD
├── LICENSE
├── images/
├── rvt/
├── ultralytics/
├── README.md
└── carla/            # data generation pipeline
```

## Baseline

### Converting .npz (xytp) Event Stream Files for RVT and RED Training

This guide provides instructions on how to convert .npz (xytp) event stream files for training RVT (Recurrent Vision Transformers) and RED (Recurrent Event-camera Detector) models.

#### Step 1: Compile .npz Files

Compile the individual .npz files into a single file.
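A minimal sketch of this step, assuming each per-chunk .npz file stores separate x, y, t, and p arrays (the key names are an assumption, not SEVD's documented layout):

```python
# Sketch: merge per-chunk .npz event files into one compiled file.
# Assumes each file stores 'x', 'y', 't', 'p' arrays (key names assumed).
import glob
import numpy as np

def compile_npz(pattern, out_path):
    data = {k: [] for k in ("x", "y", "t", "p")}
    for path in sorted(glob.glob(pattern)):
        with np.load(path) as f:
            for k in data:
                data[k].append(f[k])
    merged = {k: np.concatenate(v) for k, v in data.items()}
    # Sort by timestamp so the stream stays monotonic after merging.
    order = np.argsort(merged["t"], kind="stable")
    np.savez(out_path, **{k: v[order] for k, v in merged.items()})

compile_npz("events_chunk_*.npz", "events_compiled.npz")
```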

#### Step 2: Convert to .hdf5 for RVT Training

Convert the compiled .npz file to an .h5 file for further preprocessing to tensors, as detailed in the [Metavision SDK](https://docs.prophesee.ai/stable/index.html) documentation.
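A minimal sketch of writing the compiled events to an .h5 file with h5py; the group and dataset names are an assumed layout and should be adapted to what the RVT preprocessing expects:

```python
# Sketch: store the compiled event arrays in an HDF5 file.
# The 'events/<field>' layout is an assumption, not a documented schema.
import h5py
import numpy as np

with np.load("events_compiled.npz") as f:
    x, y, t, p = f["x"], f["y"], f["t"], f["p"]

with h5py.File("events_compiled.h5", "w") as h5:
    grp = h5.create_group("events")
    for name, arr in (("x", x), ("y", y), ("t", t), ("p", p)):
        grp.create_dataset(name, data=arr, compression="gzip")
```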

#### Step 3: Convert to .csv Files

Convert the .npz file to .csv files for further processing.
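A minimal sketch of the CSV export, assuming integer fields written in (x, y, p, t) order; check the Metavision SDK documentation for the exact column order and timestamp unit it expects:

```python
# Sketch: dump events to CSV. Column order (x, y, p, t) and integer
# timestamps are assumptions; verify against the Metavision SDK docs.
import numpy as np

with np.load("events_compiled.npz") as f:
    x, y, t, p = f["x"], f["y"], f["t"], f["p"]

events = np.column_stack((x, y, p, t))
np.savetxt("events_compiled.csv", events, fmt="%d", delimiter=",")
```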

#### Step 4: Convert to Metavision HDF5 Format

Use the Metavision SDK (Software Development Kit) to convert the .csv files to Metavision's proprietary .raw format and then preprocess them into tensors. Refer to the Metavision SDK documentation for detailed instructions.

#### Step 5: Train RED

Utilize the .hdf5 tensor files generated in the previous step to train the RED model.

Following these steps will prepare your event stream data for training RVT and RED models effectively.

Baselines: RVT | YOLO

## License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

## Citation

```bibtex
@article{aliminati2024sevd,
  title={SEVD: Synthetic Event-based Vision Dataset for Ego and Fixed Traffic Perception},
  author={Aliminati, Manideep Reddy and Chakravarthi, Bharatesh and Verma, Aayush Atul and Vaghela, Arpitsinh and Wei, Hua and Zhou, Xuesong and Yang, Yezhou},
  journal={arXiv preprint arXiv:2404.10540},
  year={2024}
}
```