
Performance Bottleneck in Signal Extraction from Large MDF Files #1108

Open
mbrettsc opened this issue Dec 5, 2024 · 4 comments


mbrettsc commented Dec 5, 2024

Python version

('python=3.12.7 | packaged by Anaconda, Inc. | (main, Oct 4 2024, 13:17:27) '
'[MSC v.1929 64 bit (AMD64)]')
'os=Windows-11-10.0.22631-SP0'
'numpy=1.26.4'
'asammdf=8.0.1'

Code

MDF version

4.10

Code snippet

with MDF(file_path, load_measured_data=False) as mdf_file:
    mdf_file_info = mdf_file.info()
    groups = list(mdf_file.filter(signals_to_extract).iter_groups())

Description

I work with large MDF files and need to extract several signals from them. As the number of signals increases, the execution time grows rapidly. This performance bottleneck becomes problematic with large-scale data processing.

What are the best practices for handling such scenarios efficiently, considering the growing size of the files and the increasing number of signals?

Best regards,
Martin

@danielhrisca
Owner

Hello Martin,

load_measured_data=False has no effect here.

regarding the processing speed:

  • I pushed a commit yesterday that slightly improves the reading speed (~10%)
  • make sure that you have the library isal installed (https://pypi.org/project/isal/) as it provides faster zlib compression/decompression

I would also like to have better performance but at the moment this is the best I could do :(
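The isal recommendation works because the library exposes a drop-in, SIMD-accelerated replacement for the stdlib `zlib` module. As a minimal illustration (this is the generic fallback pattern, not asammdf's actual internals), a consumer can prefer isal and transparently fall back to stdlib zlib:

```python
import zlib

# Prefer isal's accelerated zlib-compatible implementation when installed
# (pip install isal); otherwise fall back to the stdlib module.
try:
    from isal import isal_zlib as fast_zlib
except ImportError:
    fast_zlib = zlib

payload = b"measurement data " * 1000
compressed = fast_zlib.compress(payload)
# isal_zlib produces/consumes the standard zlib format, so round-trips work
# regardless of which implementation was picked.
assert fast_zlib.decompress(compressed) == payload
```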


mbrettsc commented Dec 6, 2024

Hi Daniel,

Thank you for your prompt response.

Is the commit you mentioned in the dev branch?

In addition to the package-based recommendation, do you think there’s potential to implement multiprocessing or a similar approach in the extraction process? For instance, extracting a single signal is relatively fast, but could we extract multiple signals concurrently to enhance performance?
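For illustration, the fan-out idea could be sketched roughly like this (an untested sketch, not asammdf's API: `read_chunk` is a hypothetical caller-supplied callable, e.g. one that opens the file and extracts its chunk of signal names; whether threads actually help depends on how much of the extraction releases the GIL, and a process pool would be the heavier alternative):

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, n):
    """Split `items` into at most `n` roughly equal contiguous chunks."""
    k, r = divmod(len(items), n)
    out, start = [], 0
    for i in range(n):
        end = start + k + (1 if i < r else 0)
        if start < end:
            out.append(items[start:end])
        start = end
    return out


def extract_concurrently(read_chunk, signals, workers=4):
    """Fan a signal-name list out to `workers` threads.

    `read_chunk` takes a list of signal names and returns a dict of
    {signal_name: data}; results from all workers are merged.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(read_chunk, chunked(signals, workers)):
            results.update(partial)
    return results
```

For example, `extract_concurrently(my_reader, signals_to_extract, workers=4)` would hand each worker a quarter of the signal list; `my_reader` would be whatever per-chunk extraction routine is used.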

Also I wanted to mention, your package is absolutely amazing :)

@danielhrisca
Owner

@mbrettsc please try out these wheels and see if you notice any performance improvements
https://github.com/danielhrisca/asammdf/actions/runs/12254274879

@danielhrisca
Owner

@mbrettsc please try the new wheels found here https://github.com/danielhrisca/asammdf/actions/runs/12299840252

On my PC, processing a large file went from ~50 MB/s to ~130 MB/s.
