Simple chunked pedigree kinship #1282

timothymillar · 2025-01-07T01:43:05Z

Fixes Chunked pedigree kinship #1280
Fixes MemoryError: Allocation failed (probably too large) with ped.compute() #1213
Tests added
User visible changes (including notable bug fixes) are documented in changelog.rst

I've tested this with a real pedigree of ~55,000 individuals on a 4-core laptop in WSL2. Using chunks of 5000 samples, I can calculate the pedigree kinship matrix and save the chunked matrix to a Zarr store in under 25 s in total, using ~2.5GB of RAM. The full matrix would be ~22.5GB, which exceeds the memory of this machine.
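As a rough check on the figures above, the size of a dense kinship matrix is n² entries times the bytes per entry. The sketch below assumes float64 storage (8 bytes per entry). Under that assumption the ~22.5 figure corresponds to binary gigabytes (GiB), and a single 5000 × 5000 chunk is under 0.2 GiB. The helper name `dense_matrix_gib` is hypothetical and not part of sgkit.

```python
# Back-of-the-envelope memory estimate for a dense kinship matrix.
# Assumption: entries are float64 (8 bytes each); sizes reported in GiB (2**30 bytes).
BYTES_PER_FLOAT64 = 8
GIB = 2**30


def dense_matrix_gib(n_rows: int, n_cols: int) -> float:
    """Size in GiB of an n_rows x n_cols float64 array (hypothetical helper)."""
    return n_rows * n_cols * BYTES_PER_FLOAT64 / GIB


n_samples = 55_000
chunk = 5_000

full = dense_matrix_gib(n_samples, n_samples)   # whole matrix held in memory at once
block = dense_matrix_gib(chunk, chunk)          # one square chunk of the matrix
blocks_per_axis = (n_samples + chunk - 1) // chunk  # ceiling division

print(f"full matrix: {full:.1f} GiB")
print(f"one chunk:   {block:.3f} GiB")
print(f"chunks:      {blocks_per_axis} x {blocks_per_axis}")
```

This only accounts for the final matrix entries; the actual peak memory of a chunked computation also depends on intermediate arrays and on how many chunks are in flight at once.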

I've used Numba's experimental `@jitclass` feature for a simple triangular matrix class. Using a triangular matrix halves the RAM needed for the intermediate matrices. It's not strictly necessary to use `@jitclass` for this, but it allows greater code reuse via custom `__setitem__`/`__getitem__` methods. If this is an issue, it could be reworked to avoid `@jitclass`.
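To illustrate the idea of a triangular matrix with custom indexing, here is a minimal plain-Python/NumPy sketch. It is not the `@jitclass` implementation from pedigree.py; the class name `PackedSymmetricMatrix` and its layout (packed lower triangle, row-major) are assumptions chosen for illustration. Because kinship is symmetric, `[i, j]` and `[j, i]` can share one storage slot, so only n(n + 1)/2 values are stored instead of n².

```python
import numpy as np


class PackedSymmetricMatrix:
    """Hypothetical symmetric n x n matrix stored as a packed lower triangle.

    Only n * (n + 1) // 2 values are kept, roughly half of a dense n x n array.
    """

    def __init__(self, n: int, dtype=np.float64):
        self.n = n
        self.data = np.zeros(n * (n + 1) // 2, dtype=dtype)

    def _index(self, i: int, j: int) -> int:
        # Map (i, j) into the lower triangle so [i, j] and [j, i] share storage.
        if i < j:
            i, j = j, i
        if not (0 <= j <= i < self.n):
            raise IndexError((i, j))
        # Rows 0..i-1 of the lower triangle hold i * (i + 1) // 2 values,
        # so element (i, j) sits j positions after the start of row i.
        return i * (i + 1) // 2 + j

    def __getitem__(self, key):
        i, j = key
        return self.data[self._index(i, j)]

    def __setitem__(self, key, value):
        i, j = key
        self.data[self._index(i, j)] = value


m = PackedSymmetricMatrix(4)
m[2, 1] = 0.25        # writes the single shared slot for (2, 1) and (1, 2)
m[3, 3] = 0.5         # diagonal entry, stored in the last slot
print(m[1, 2], m.data.size)
```

Routing all reads and writes through `__getitem__`/`__setitem__` is what lets the rest of the code treat the packed storage like an ordinary 2-D matrix, which is the code-reuse benefit mentioned above.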

I've also removed the CI test runs with `NUMBA_DISABLE_JIT: 1`, because pedigree.py now depends on `@guvectorize` and `@jitclass`.

jeromekelleher

LGTM

timothymillar · 2025-01-12T19:48:02Z

Thanks @jeromekelleher, I assume we're not worried about the Cubed and Zarr 3 test runs failing for now?

jeromekelleher · 2025-01-13T09:39:34Z

I didn't see those @tomwhite, thoughts here?

tomwhite · 2025-01-13T09:46:25Z

> I didn't see those @tomwhite, thoughts here?

They are not related to this PR, so OK to merge this if it's ready. I'll be looking at the Zarr 3 changes today.

jeromekelleher · 2025-01-13T09:48:03Z

Happy to merge when you are @timothymillar

timothymillar added 3 commits January 7, 2025 14:06

Allow chunking in pedigree_kinship sgkit-dev#1280

3320e12

Remove CI tests of pedigree functions without JIT

4e10560

Update changelog

dfacca8

jeromekelleher approved these changes Jan 9, 2025


timothymillar added the `auto-merge` label (Auto merge label for mergify test flight) Jan 13, 2025

mergify bot merged commit 5b96476 into sgkit-dev:main Jan 13, 2025
10 of 12 checks passed
