I've tested this with a real pedigree of ~55,000 individuals on a 4-core laptop in WSL2. I can calculate the pedigree matrix using chunks of 5000 samples and save the chunked matrix to a Zarr store in a total of < 25s using ~2.5GB of RAM. The full matrix would be ~22.5GB which exceeds the memory of this machine.
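A minimal sketch of the chunked pattern described above, using `np.memmap` as a stand-in for the Zarr store and a dummy block in place of the real pedigree computation (the function and file names here are hypothetical, not the PR's actual API):

```python
import os
import tempfile
import numpy as np

def fill_chunked(n, chunk, path):
    # Compute an n-by-n matrix one block of rows at a time, writing each
    # block straight to disk so the full matrix never has to sit in RAM.
    out = np.memmap(path, dtype=np.float64, mode="w+", shape=(n, n))
    for start in range(0, n, chunk):
        stop = min(start + chunk, n)
        # Stand-in for the real per-chunk relationship computation.
        block = np.ones((stop - start, n))
        out[start:stop, :] = block
    out.flush()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "matrix.dat")
    fill_chunked(12, 5, path)
    total = np.memmap(path, dtype=np.float64, mode="r", shape=(12, 12)).sum()
    print(total)  # 144.0
```

The same loop works with a chunked Zarr array as the output target; only the `np.memmap` call changes.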
I've used the `@jitclass` experimental feature from Numba for a simple triangular matrix class. Using a triangular matrix halves the RAM needed for the intermediate matrices. It's not strictly necessary to use `@jitclass` for this, but it allows for greater code reuse via custom `__setitem__`/`__getitem__`. If this is an issue it could be reworked to avoid `@jitclass`.
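The triangular-storage idea can be sketched in plain Python (class and method names here are hypothetical; the real version would carry the `@jitclass` decorator with an explicit Numba type spec):

```python
class TriangularMatrix:
    """Symmetric matrix packed into its lower triangle."""

    def __init__(self, n):
        self.n = n
        # Store only the lower triangle: n*(n+1)//2 entries instead of n*n,
        # which halves the memory needed for a symmetric matrix.
        self.data = [0.0] * (n * (n + 1) // 2)

    def _index(self, i, j):
        # Map an upper-triangle access (j > i) onto the stored lower triangle.
        if j > i:
            i, j = j, i
        return i * (i + 1) // 2 + j

    def __getitem__(self, key):
        i, j = key
        return self.data[self._index(i, j)]

    def __setitem__(self, key, value):
        i, j = key
        self.data[self._index(i, j)] = value

m = TriangularMatrix(4)
m[2, 1] = 0.5
print(m[1, 2])  # 0.5 -- symmetric access reads the same stored entry
```

Custom `__setitem__`/`__getitem__` keep the packed indexing behind the usual `m[i, j]` syntax, which is what makes the code reuse possible.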
I've also removed the test runs with `NUMBA_DISABLE_JIT: 1`, because this introduces a dependency on `@guvectorize` and `@jitclass` in `pedigree.py`.