Preprocessing arbitrarily structured data for AI with Awkward Array

Processing heterogeneous multimodal data is challenging. Such datasets have complex, irregular structures, arising from nested or variable-sized outputs of different sensors or from missing values. The data are typically of mixed types, which complicates the preprocessing required before they can be fed into algorithms such as multimodal representation models. AI practitioners must manage these complexities effectively. Awkward Array is a Python library designed to process arbitrarily structured data. Following the array-programming paradigm, it lets users manipulate data with NumPy-like syntax. Awkward Array also includes GPU-accelerated kernels, so complex data can be preprocessed directly on modern hardware accelerators, which can speed up training and reduce the latency of transferring data to the device. We introduce the Awkward Array library and provide examples that demonstrate its usage, highlighting its potential as an AI preprocessor.

Contribution at the AISSAI - Heterogeneous Data and Large Representation Models in Science workshop on the 3rd of October 2024 in Toulouse, France. Contribution details: https://indico.in2p3.fr/event/33412/contributions/143423/.

The material is based on the "Thinking In Arrays" tutorial, presented at the SciPy 2024 conference.
