Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Streaming add remfile #1761

Merged
merged 25 commits into from
Jan 13, 2024
Merged

Streaming add remfile #1761

merged 25 commits into from
Jan 13, 2024

Conversation

bendichter
Copy link
Contributor

@bendichter bendichter commented Aug 17, 2023

Motivation

remfile is a new package that has some advantages for streaming NWB HDF5 files from S3, so I want to document it as an option.

Depends on hdmf-dev/hdmf#946

Checklist

  • Did you update CHANGELOG.md with your changes?
  • Have you checked our Contributing document?
  • Have you ensured the PR clearly describes the problem and the solution?
  • Is your contribution compliant with our coding style? This can be checked running flake8 from the source directory.
  • Have you checked to ensure that there aren't other open Pull Requests for the same change?
  • Have you included the relevant issue number using "Fix #XXX" notation where XXX is the issue number? By including "Fix #XXX" you allow GitHub to close issue #XXX when the PR is merged.

@codecov
Copy link

codecov bot commented Aug 17, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (1529028) 91.99% compared to head (226f41d) 83.68%.

Additional details and impacted files
@@            Coverage Diff             @@
##              dev    #1761      +/-   ##
==========================================
- Coverage   91.99%   83.68%   -8.32%     
==========================================
  Files          27       27              
  Lines        2623     2623              
  Branches      685      685              
==========================================
- Hits         2413     2195     -218     
- Misses        138      344     +206     
- Partials       72       84      +12     
Flag Coverage Δ
integration ?
unit 83.68% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@CodyCBakerPhD CodyCBakerPhD linked an issue Aug 30, 2023 that may be closed by this pull request
3 tasks
rly
rly previously approved these changes Oct 22, 2023
@rly rly added this to the Next Release milestone Oct 22, 2023
@rly
Copy link
Contributor

rly commented Nov 27, 2023

@bendichter can we move this out of draft mode?

src/pynwb/__init__.py Outdated Show resolved Hide resolved
* rmv addition of RemFile as an allowed type for NWBHDF5IO
CHANGELOG.md Outdated Show resolved Hide resolved
src/pynwb/__init__.py Outdated Show resolved Hide resolved
@bendichter bendichter marked this pull request as ready for review November 27, 2023 15:37
@bendichter
Copy link
Contributor Author

@rly ok, ready for review

@bendichter bendichter requested a review from rly November 27, 2023 15:38
@rly
Copy link
Contributor

rly commented Nov 27, 2023

Thanks. Could we make remfile Option 2, or even Option 1 on the streaming docs? See some related discussion #1791

environment-ros3.yml Outdated Show resolved Hide resolved
@rly rly enabled auto-merge (squash) January 13, 2024 08:26
@rly rly merged commit 51a113f into dev Jan 13, 2024
22 of 23 checks passed
@rly rly deleted the streaming_add_remfile2 branch January 13, 2024 08:30
@magland
Copy link
Contributor

magland commented Jan 13, 2024

Thanks @rly !

Here's a very rough benchmark I put together to compare timings of fsspec and remfile

https://github.com/scratchrealm/pynwb_streaming_benchmark

Pasting a snapshot of the readme:

The script main.py was run repeatedly in two settings: On dandihub and on a laptop in a home network.
The results are in results_dandihub.txt and results_home_network.txt, respectively.

Below is a summary of the average timings. My assessment is that remfile appears to be faster for initial load time while fsspec method appears to be faster for reading a 30 second sample of ephys data. On the home network, the fsspec method appears to be substantially slower than the remfile method for the initial load time.

Some limitations:

  • Only applied to a single NWB file
  • Only ran a limited number of trials
  • Only ran on a single home network
  • Didn't test ros3 method yet

On dandihub

Average Initial Load Time:

  • fsspec: 7.74 seconds
  • remfile: 6.14 seconds

Average 30sec Sample Read Time:

  • fsspec: 7.90 seconds
  • remfile: 8.20 seconds ​

On Jeremy's Home Network

Average Initial Load Time:

  • fsspec: 37.23 seconds
  • remfile: 9.42 seconds

Average 30sec Sample Read Time:

  • fsspec: 47.61 seconds
  • remfile: 56.95 seconds ​

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Documentation]: RemFile
3 participants