Simple chunked pedigree kinship #1282

Merged
merged 3 commits into from
Jan 13, 2025

Collaborator

@timothymillar timothymillar commented Jan 7, 2025

I've tested this with a real pedigree of ~55,000 individuals on a 4-core laptop in WSL2. Using chunks of 5,000 samples, I can calculate the pedigree matrix and save the chunked matrix to a Zarr store in under 25 s total, using ~2.5 GB of RAM. The full matrix would be ~22.5 GB, which exceeds the memory of this machine.
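As a rough sanity check of the sizes quoted above, the arithmetic can be sketched as follows. This assumes 8-byte (float64) entries, reads "GB" as GiB, and assumes a chunk is a block of rows spanning all columns; none of these details are stated in the comment, and the variable names are illustrative only.

```python
# Back-of-the-envelope memory estimate for a dense square matrix.
n_samples = 55_000   # approximate pedigree size quoted in the comment
chunk_rows = 5_000   # chunk size quoted in the comment
itemsize = 8         # assumption: float64 entries

GIB = 2**30

# Full dense n x n matrix.
full_gib = n_samples**2 * itemsize / GIB

# Assumption: one chunk is a block of rows spanning every column.
chunk_gib = chunk_rows * n_samples * itemsize / GIB

# Number of row blocks needed to cover the matrix (ceiling division).
n_chunks = -(-n_samples // chunk_rows)

print(f"full matrix: {full_gib:.1f} GiB")
print(f"one row-block chunk: {chunk_gib:.1f} GiB")
print(f"number of chunks: {n_chunks}")
```

Under these assumptions the full matrix comes to about 22.5 GiB, which matches the figure in the comment, and a single row block is roughly 2 GiB, in line with the reported peak of ~2.5 GB.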

I've used the experimental @jitclass feature from Numba for a simple triangular matrix class. Using a triangular matrix halves the RAM needed for the intermediate matrices. It's not strictly necessary to use @jitclass for this, but it allows for greater code reuse via custom __setitem__/__getitem__. If this is an issue, it could be reworked to avoid @jitclass.
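To illustrate the idea of a triangular matrix with custom `__getitem__`/`__setitem__`, here is a minimal plain-Python sketch. It is not the class from this PR: the PR's class is compiled with Numba's @jitclass, whereas this version uses only the standard library, and the name `TriangularMatrix`, its packed layout, and its helper `_offset` are hypothetical.

```python
from array import array


class TriangularMatrix:
    """Symmetric n x n matrix that stores only the lower triangle.

    Hypothetical illustration; not the implementation used in the PR.
    """

    def __init__(self, n):
        self.n = n
        # n * (n + 1) / 2 entries instead of n * n: roughly half the memory.
        self.data = array("d", [0.0]) * (n * (n + 1) // 2)

    def _offset(self, key):
        i, j = key
        # Map (i, j) and (j, i) to the same packed slot.
        if j > i:
            i, j = j, i
        # Row i of the lower triangle starts after 1 + 2 + ... + i entries.
        return i * (i + 1) // 2 + j

    def __getitem__(self, key):
        return self.data[self._offset(key)]

    def __setitem__(self, key, value):
        self.data[self._offset(key)] = value


m = TriangularMatrix(4)
m[3, 1] = 0.25          # write into the lower triangle
upper = m[1, 3]         # read the mirrored upper-triangle position
stored = len(m.data)    # 10 packed entries rather than 16
```

Because the index mapping lives in `__getitem__`/`__setitem__`, calling code can use ordinary `m[i, j]` syntax regardless of which triangle it addresses, which is the kind of code reuse the comment describes.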

I've also removed the test runs with NUMBA_DISABLE_JIT: 1 because this PR introduces a dependency on @guvectorize and @jitclass in pedigree.py.

Collaborator

@jeromekelleher jeromekelleher left a comment

LGTM

@timothymillar
Collaborator Author

Thanks @jeromekelleher, I assume we're not worried about the Cubed and Zarr 3 test runs failing for now?

@jeromekelleher
Collaborator

I didn't see those @tomwhite, thoughts here?

@tomwhite
Collaborator

> I didn't see those @tomwhite, thoughts here?

They are not related to this PR, so OK to merge this if it's ready. I'll be looking at the Zarr 3 changes today.

@jeromekelleher
Collaborator

Happy to merge when you are @timothymillar

@timothymillar timothymillar added the auto-merge label (Auto merge label for mergify test flight) Jan 13, 2025
@mergify mergify bot merged commit 5b96476 into sgkit-dev:main Jan 13, 2025
10 of 12 checks passed
Development

Successfully merging this pull request may close these issues.

Chunked pedigree kinship
MemoryError: Allocation failed (probably too large) with ped.compute()
3 participants