Parallelize var_inf #46

Closed
joselotl opened this issue Sep 7, 2023 · 2 comments · Fixed by #77

@joselotl
Contributor

joselotl commented Sep 7, 2023

This should be the last summarization algorithm to be parallelized as it needs all of the p(z) to be loaded in memory at once.

@joselotl joselotl self-assigned this Sep 7, 2023
@sschmidt23
Collaborator

We should ask Markus Rau whether there are any sampling shortcuts that work while loading only subsamples of the data into memory. He may know of some statistical tricks that make that possible (I hope so, as I don't see how we can load hundreds of millions to billions of galaxies at once).

@joselotl
Contributor Author

joselotl commented Sep 7, 2023

The p(z) don't all need to be loaded on the same node. My idea is to make sure that we have enough nodes to hold all the p(z) between them. My guess is that the full catalog will take around 10 TB in total, which would mean using around 20 CPU nodes on Perlmutter.
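
For reference, here is a rough sketch of the arithmetic behind that node count. The 512 GB of memory per CPU node is an assumption on my part (it is not stated in this thread), `nodes_needed` is a hypothetical helper, and the 10 TB figure is only the guess above, so treat the result as a back-of-the-envelope estimate.

```python
import math


def nodes_needed(total_bytes, node_mem_bytes, usable_fraction=1.0):
    """Smallest number of nodes whose combined usable memory holds total_bytes."""
    return math.ceil(total_bytes / (node_mem_bytes * usable_fraction))


TB = 1e12
GB = 1e9

total_pz = 10 * TB    # guessed size of all p(z) for the full catalog
node_mem = 512 * GB   # ASSUMPTION: memory per CPU node, not confirmed in this thread

# Every byte of node memory available for p(z).
print(nodes_needed(total_pz, node_mem))

# Hypothetical 20% headroom left for the OS and working arrays.
print(nodes_needed(total_pz, node_mem, usable_fraction=0.8))
```

With no headroom this gives 20 nodes, matching the estimate above. Reserving 20% of each node for everything else pushes it to 25, so the real allocation may need to be somewhat larger.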

@joselotl joselotl linked a pull request Nov 16, 2023 that will close this issue
4 tasks