Streaming add remfile #1761

Merged
merged 25 commits into from
Jan 13, 2024
6739cf3
refactor streaming tutorial to expose Python code
bendichter Aug 17, 2023
12ad86f
add fsspec to docs requirements
bendichter Aug 17, 2023
64e05d6
move fsspec to requirements-dev.txt
bendichter Aug 17, 2023
50ffa58
move fsspec to environment-ros3.yml
bendichter Aug 17, 2023
793526d
flake8
bendichter Aug 17, 2023
d824431
add required dependencies
bendichter Aug 17, 2023
4cad986
add remfile to streaming.py tutorial
bendichter Aug 17, 2023
5045815
adjust language
bendichter Aug 17, 2023
614375a
Merge branch 'dev' into streaming_add_remfile2
bendichter Aug 17, 2023
bab2f9e
move remfile to pip install
bendichter Aug 18, 2023
d5d0439
Merge branch 'dev' into streaming_add_remfile2
rly Oct 22, 2023
65c34aa
Update CHANGELOG.md
rly Oct 22, 2023
a942988
Merge branch 'dev' into streaming_add_remfile2
rly Nov 27, 2023
31c5f5f
* change tutorial to use correct h5py.File object
bendichter Nov 27, 2023
3a03c49
update CHANGELOG.md
bendichter Nov 27, 2023
77331a2
Update src/pynwb/__init__.py
bendichter Nov 27, 2023
ea72283
Update environment-ros3.yml
bendichter Nov 27, 2023
59fe561
Update docs/gallery/advanced_io/streaming.py
bendichter Nov 27, 2023
d34a526
Update docs/gallery/advanced_io/streaming.py
bendichter Nov 27, 2023
2a70ca9
Merge branch 'dev' into streaming_add_remfile2
bendichter Nov 27, 2023
b3591ed
Merge branch 'dev' into streaming_add_remfile2
bendichter Nov 29, 2023
bae8d30
Merge branch 'dev' into streaming_add_remfile2
rly Dec 2, 2023
7974499
Merge branch 'dev' into streaming_add_remfile2
rly Jan 13, 2024
c16736b
Update streaming.py
rly Jan 13, 2024
226f41d
Update streaming.py
rly Jan 13, 2024
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -16,6 +16,9 @@
- Fix bug where namespaces were loaded in "w-" mode. @h-mayorquin [#1795](https://github.com/NeurodataWithoutBorders/pynwb/pull/1795)
- Fix bug where pynwb version was reported as "unknown" to readthedocs @stephprince [#1810](https://github.com/NeurodataWithoutBorders/pynwb/pull/1810)

### Documentation and tutorial enhancements
- Add RemFile to streaming tutorial @bendichter [#1761](https://github.com/NeurodataWithoutBorders/pynwb/pull/1761)

## PyNWB 2.5.0 (August 18, 2023)

### Enhancements and minor changes
35 changes: 33 additions & 2 deletions docs/gallery/advanced_io/streaming.py
@@ -90,6 +90,9 @@
# `fsspec documentation on known implementations <https://filesystem-spec.readthedocs.io/en/latest/api.html?highlight=S3#other-known-implementations>`_
# for a full updated list of supported store formats.
#
# One downside of this fsspec method is that fsspec is not optimized for reading HDF5 files, so streaming data
# using this method can be slow. A faster alternative is ``remfile``, described below.
#
# Streaming Method 2: ROS3
# ------------------------
# ROS3 stands for "read only S3" and is a driver created by the HDF5 Group that allows HDF5 to read HDF5 files stored
@@ -120,15 +123,43 @@
#
#    pip uninstall h5py
#    conda install -c conda-forge "h5py>=3.2"
#
# Besides the extra burden of installing h5py from a non-PyPI source, one downside of the ROS3 method is that
# it does not support automatic retries if the connection fails.
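
The missing automatic retries can be approximated in user code. The following is a minimal sketch of a generic retry wrapper with exponential backoff; it is not a feature of ROS3, h5py, or PyNWB. The names `call_with_retries` and `flaky_read` are hypothetical, and the flaky function simulates a dropped connection so the example runs without network access.

```python
import time


def call_with_retries(func, attempts=3, base_delay=0.5, sleep=time.sleep,
                      retry_on=(OSError,)):
    """Call func() until it succeeds or the attempts run out.

    Waits base_delay * 2**n seconds after the n-th failure (exponential backoff).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)


# Hypothetical stand-in for a flaky remote read: fails twice, then succeeds.
calls = {"n": 0}


def flaky_read():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated dropped connection")
    return b"data"


delays = []  # record the requested waits instead of actually sleeping
result = call_with_retries(flaky_read, attempts=5, sleep=delays.append)
print(result, calls["n"], delays)
```

In practice, `func` could be a small closure that opens the file and reads the slice of interest, so that a failed connection restarts the whole open-and-read step. Whether that is appropriate depends on the size of the read.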


##################################################
# Streaming Method 3: remfile
# ---------------------------
# ``remfile`` is another library that enables indexing and streaming of files in S3. ``remfile`` is simple,
# fast, and allows for caching of data in the local filesystem. The caveats of ``remfile`` are that it is a very
# new project that has not been tested in a variety of use cases, and that its caching options are limited
# compared to ``fsspec``.
# You can install ``remfile`` with pip:
#
# .. code-block:: bash
#
#    pip install remfile
#

import h5py
from pynwb import NWBHDF5IO
import remfile

# s3_url is assumed to be defined earlier in the tutorial
rem_file = remfile.File(s3_url)

# h5py reads from the remfile object as it would from a local file object
with h5py.File(rem_file, "r") as h5py_file:
    with NWBHDF5IO(file=h5py_file, load_namespaces=True) as io:
        nwbfile = io.read()
        print(nwbfile.acquisition["lick_times"].time_series["lick_left_times"].data[:])
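
To illustrate the general idea behind passing a file-like object to `h5py.File`, here is a minimal, standard-library-only sketch of a seekable reader that fetches fixed-size byte ranges on demand and caches them. It is not how `remfile` is implemented. `CachedRangeReader` and `fake_fetch` are hypothetical names, and the "remote" file is an in-memory byte string so the example runs offline.

```python
import io


class CachedRangeReader(io.RawIOBase):
    """Read-only, seekable file object that fetches fixed-size chunks on demand."""

    def __init__(self, fetch_range, size, chunk_size=1024):
        super().__init__()
        self._fetch_range = fetch_range  # callable(start, end) -> bytes for [start, end)
        self._size = size
        self._chunk_size = chunk_size
        self._pos = 0
        self._cache = {}  # chunk index -> bytes already fetched

    def readable(self):
        return True

    def seekable(self):
        return True

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            new_pos = offset
        elif whence == io.SEEK_CUR:
            new_pos = self._pos + offset
        elif whence == io.SEEK_END:
            new_pos = self._size + offset
        else:
            raise ValueError(f"invalid whence: {whence}")
        if new_pos < 0:
            raise ValueError("negative seek position")
        self._pos = new_pos
        return self._pos

    def _chunk(self, index):
        # Fetch a chunk only the first time it is needed.
        if index not in self._cache:
            start = index * self._chunk_size
            end = min(start + self._chunk_size, self._size)
            self._cache[index] = self._fetch_range(start, end)
        return self._cache[index]

    def readinto(self, buffer):
        wanted = max(0, min(len(buffer), self._size - self._pos))
        out = bytearray()
        pos = self._pos
        while len(out) < wanted:
            index, offset = divmod(pos, self._chunk_size)
            piece = self._chunk(index)[offset:offset + wanted - len(out)]
            if not piece:  # fetch returned less than expected: stop early
                break
            out += piece
            pos += len(piece)
        buffer[:len(out)] = out
        self._pos = pos
        return len(out)


payload = bytes(range(256)) * 16  # 4096 bytes standing in for a remote file
requests_made = []


def fake_fetch(start, end):
    requests_made.append((start, end))
    return payload[start:end]


reader = CachedRangeReader(fake_fetch, size=len(payload), chunk_size=1024)
reader.seek(1000)
first = reader.read(100)   # spans chunks 0 and 1, so two fetches
reader.seek(1000)
second = reader.read(100)  # same region again: served from the cache
print(first == second, requests_made)
```

A real implementation would issue HTTP range requests in place of `fake_fetch` and would need to handle errors, cache eviction, and larger reads efficiently.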

##################################################
# Which streaming method to choose?
# ---------------------------------
#
# From a user perspective, once opened, the :py:class:`~pynwb.file.NWBFile` works the same with
-# both fsspec and ros3. However, in general, we currently recommend using fsspec for streaming
-# NWB files because it is more performant and reliable than ros3. In particular fsspec:
+# fsspec, ros3, or remfile. However, in general, we currently recommend using fsspec for streaming
+# NWB files because it is more performant and reliable than ros3 and more widely tested than remfile.
+# In particular, fsspec:
#
# 1. supports caching, which will dramatically speed up repeated requests for the
# same region of data,
3 changes: 3 additions & 0 deletions environment-ros3.yml
@@ -16,3 +16,6 @@ dependencies:
  - fsspec==2023.6.0
  - requests==2.28.1
  - aiohttp==3.8.3
  - pip
  - pip:
    - remfile==0.1.9
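
Exact pins like these can be checked against an installed environment. The sketch below is one possible approach and is not part of this PR. The helper names `parse_pins`, `lookup_installed`, and `find_mismatches` are hypothetical, and the installed versions in the demo are made up and injected so the example does not depend on the local environment.

```python
from importlib import metadata


def parse_pins(lines):
    """Map package name -> pinned version for entries like '- name==1.2.3'."""
    pins = {}
    for line in lines:
        entry = line.strip().lstrip("-").strip()
        if "==" in entry:
            name, version = entry.split("==", 1)
            pins[name.strip().lower()] = version.strip()
    return pins


def lookup_installed(name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=lookup_installed):
    """Return {name: (pinned, found)} for every pin that is not satisfied."""
    mismatches = {}
    for name, pinned in pins.items():
        found = lookup(name)
        if found != pinned:
            mismatches[name] = (pinned, found)
    return mismatches


env_lines = [
    "- fsspec==2023.6.0",
    "- requests==2.28.1",
    "- aiohttp==3.8.3",
    "- pip",
    "- pip:",
    "- remfile==0.1.9",
]
pins = parse_pins(env_lines)

# Hypothetical installed versions, used here in place of the real environment.
fake_installed = {
    "fsspec": "2023.6.0",
    "requests": "2.28.1",
    "aiohttp": "3.8.3",
    "remfile": "0.1.8",
}
mismatches = find_mismatches(pins, lookup=fake_installed.get)
print(mismatches)
```

Calling `find_mismatches(pins)` without the `lookup` argument queries the current Python environment through `importlib.metadata`.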