Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

open groups lazily #157

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Conversation

bicycle1885
Copy link
Member

Currently, ZGroup opens a group immediately and recursively, which causes performance issues, especially when the group contains a large number of subarrays or subgroups. For example, in my benchmark, a single zopen call to open a group with 100k subarrays took 13 seconds, while a group with 1m subarrays took 266 seconds. This is a cost we want to avoid when we are only interested in a subset of those subarrays.

Zarr.jl/src/ZGroup.jl

Lines 17 to 33 in e3ccc9d

function ZGroup(s::T,mode="r",path="";fill_as_missing=false) where T <: AbstractStore
arrays = Dict{String, ZArray}()
groups = Dict{String, ZGroup}()
for d in subdirs(s,path)
dshort = split(d,'/')[end]
m = zopen_noerr(s,mode,path=_concatpath(path,dshort),fill_as_missing=fill_as_missing)
if isa(m, ZArray)
arrays[dshort] = m
elseif isa(m, ZGroup)
groups[dshort] = m
end
end
attrs = getattrs(s,path)
startswith(path,"/") && error("Paths should never start with a leading '/'")
ZGroup(s, path, arrays, groups, attrs,mode=="w")
end

This PR introduces lazy group opening, meaning zopen now opens only the target group without recursively opening subarrays or subgroups. Subarrays and subgroups are opened on demand through getindex/setindex!. This change significantly improves performance in the scenarios mentioned above.

@coveralls
Copy link

coveralls commented Aug 22, 2024

Pull Request Test Coverage Report for Build 10508663377

Details

  • 9 of 13 (69.23%) changed or added relevant lines in 1 file are covered.
  • 9 unchanged lines in 2 files lost coverage.
  • Overall coverage decreased (-0.8%) to 88.149%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/ZGroup.jl 9 13 69.23%
Files with Coverage Reduction New Missed Lines %
src/Storage/gcstore.jl 4 64.79%
src/Storage/consolidated.jl 5 73.08%
Totals Coverage Status
Change from base Build 10492654928: -0.8%
Covered Lines: 781
Relevant Lines: 886

💛 - Coveralls

@meggart
Copy link
Collaborator

meggart commented Aug 22, 2024

Ok, having so many arrays in a Group is indeed problematic when eagerly opening the whole structure recursively. However, there should still be a way for users in a session to interactively explore the arrays and groups contained in a Zarr Group, as I see it there would be no way to achieve this for users given the current PR.

Also, I am a bit concerned about performance implications for the use cases with only a few arrays in the hierarchy. With this PR, this means that for every getindex on a group the metadata would need to be parsed again. I think this should not be a big issue in many cases, but would like to check a few of my downstream applications if they are affected in some way.

@bicycle1885
Copy link
Member Author

Regarding the first point, I can enhance the show method to allow users to view subarrays and subgroups within the current group.

For the second point, I believe the issue won’t be problematic since the user can reuse the object returned by the initial getindex call. While this might require some adjustments to the user’s code, it is manageable on the user’s side. On the other hand, the slowness of zopen that I mentioned here cannot be easily circumvented by the user.

@bicycle1885
Copy link
Member Author

I added a commit to show subarrays/groups up to 10.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants