
Performance Bottleneck in Signal Extraction from Large MDF Files #1108

Open
mbrettsc opened this issue Dec 5, 2024 · 4 comments


mbrettsc commented Dec 5, 2024

Python version

('python=3.12.7 | packaged by Anaconda, Inc. | (main, Oct 4 2024, 13:17:27) '
'[MSC v.1929 64 bit (AMD64)]')
'os=Windows-11-10.0.22631-SP0'
'numpy=1.26.4'
'asammdf=8.0.1'

Code

MDF version

4.10

Code snippet

with MDF(file_path, load_measured_data=False) as mdf_file:
    mdf_file_info = mdf_file.info()
    groups = list(mdf_file.filter(signals_to_extract).iter_groups())

Description

I work with large MDF files and need to extract several signals from them. As the number of signals increases, the execution time grows rapidly. This performance bottleneck becomes problematic with large-scale data processing.

What are the best practices for handling such scenarios efficiently, considering the growing size of the files and the increasing number of signals?

Best regards,
Martin

@danielhrisca
Owner

Hello Martin,

load_measured_data=False has no effect here.

regarding the processing speed:

  • I pushed a commit yesterday that slightly improves the reading speed (~10%)
  • make sure that you have the library isal installed (https://pypi.org/project/isal/) as it provides faster zlib compression/decompression

I would also like to have better performance but at the moment this is the best I could do :(
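The isal recommendation works because the library exposes a drop-in, SIMD-accelerated replacement for the stdlib `zlib` module. As a minimal illustration (this is the generic fallback pattern, not asammdf's actual internals), a consumer can prefer isal and transparently fall back to stdlib zlib:

```python
import zlib

# Prefer isal's accelerated zlib-compatible implementation when installed
# (pip install isal); otherwise fall back to the stdlib module.
try:
    from isal import isal_zlib as fast_zlib
except ImportError:
    fast_zlib = zlib

payload = b"measurement data " * 1000
compressed = fast_zlib.compress(payload)
# isal_zlib produces/consumes the standard zlib format, so round-trips work
# regardless of which implementation was picked.
assert fast_zlib.decompress(compressed) == payload
```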


mbrettsc commented Dec 6, 2024

Hi Daniel,

Thank you for your prompt response.

Is the commit you mentioned in the dev branch?

In addition to the package-based recommendation, do you think there’s potential to implement multiprocessing or a similar approach in the extraction process? For instance, extracting a single signal is relatively fast, but could we extract multiple signals concurrently to enhance performance?
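For illustration, the fan-out idea could be sketched roughly like this (an untested sketch, not asammdf's API: `read_chunk` is a hypothetical caller-supplied callable, e.g. one that opens the file and extracts its chunk of signal names; whether threads actually help depends on how much of the extraction releases the GIL, and a process pool would be the heavier alternative):

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, n):
    """Split `items` into at most `n` roughly equal contiguous chunks."""
    k, r = divmod(len(items), n)
    out, start = [], 0
    for i in range(n):
        end = start + k + (1 if i < r else 0)
        if start < end:
            out.append(items[start:end])
        start = end
    return out


def extract_concurrently(read_chunk, signals, workers=4):
    """Fan a signal-name list out to `workers` threads.

    `read_chunk` takes a list of signal names and returns a dict of
    {signal_name: data}; results from all workers are merged.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(read_chunk, chunked(signals, workers)):
            results.update(partial)
    return results
```

For example, `extract_concurrently(my_reader, signals_to_extract, workers=4)` would hand each worker a quarter of the signal list; `my_reader` would be whatever per-chunk extraction routine is used.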

Also I wanted to mention, your package is absolutely amazing :)

@danielhrisca
Owner

@mbrettsc please try out these wheels and see if you notice any performance improvements
https://github.com/danielhrisca/asammdf/actions/runs/12254274879

@danielhrisca
Owner

@mbrettsc please try the new wheels found here https://github.com/danielhrisca/asammdf/actions/runs/12299840252

On my PC, processing a large file went from ~50 MB/s to ~130 MB/s.
