Replace bigfile mpi #71

Open: wants to merge 16 commits into master

@qezlou (Collaborator) commented Nov 1, 2024

Not ready yet!

I have written a bunch of tests in tests/test_mpi_io.py (they shouldn't pass on GitHub yet because I don't have a test snapshot here). The tests pass when run on one of the ASTRID snapshots, but the spectra I get do not match what I get with bigfile. I noticed that even when I set MPI_SIZE=1 we don't get the same result, so either:

  • There is a stupid bug I haven't spotted
  • The snapshot data is not stored the way we think

[Screenshot 2024-11-01 at 10:04:14 AM]

TODO:

  • I have compared the data returned by AbstractSnapshot.get_data() from this code and from bigfile: the output is ~1% larger when using bigfile. So I'm going to work on this a bit more to see what's going on.
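One way to narrow down the ~1% size difference is to find where the two readouts first disagree. A minimal sketch, assuming both readers return flat (1-D) sequences for the same block (multi-column blocks such as positions would be flattened first, e.g. with NumPy's `.ravel()`); `first_divergence` and `relative_size_excess` are hypothetical helper names, not part of this PR.

```python
from typing import Optional, Sequence


def first_divergence(a: Sequence, b: Sequence) -> Optional[int]:
    """Index of the first element where two flat readouts disagree.

    Returns None if the two sequences are identical. If one is a strict
    prefix of the other, returns the length of the shorter one, i.e. the
    first index that is present in only one readout.
    """
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


def relative_size_excess(reference: Sequence, other: Sequence) -> float:
    """Fractional excess of len(other) over len(reference), e.g. 0.01 for 1%."""
    return (len(other) - len(reference)) / len(reference)
```

Roughly: an early divergence index would point towards an offset error, while a shorter readout that is a clean prefix of the longer one would suggest that some segments or blobs are not being read at all.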

qezlou and others added 16 commits September 23, 2024 14:58
- Reads the data with the mpi4py API, without using the bigfile API at all.
- All custom code is integrated into `abstractsnapshot.py`, both as new functions and within `get_data()`
- Each rank reads the data in segments using `mpi_file_handler.Read_at()`
- So each bigfile blob should not be accessed by more than one rank as long as `comm.Get_size()` is smaller than the number of blob files (~1300 for ASTRID).
- Test modules are added in the `test` directory; they run on a copy of a bigfile `header` in `example_bigfile`
- As to why we use `COMM_SELF` rather than `COMM_WORLD` in `MPI.File.Open()`:
  since it is Bcast communication, it is better to load the needed block headers first. Otherwise, ranks != root would run into a deadlock.
- Preload all the needed block headers
- Some tests to check the continuity of particle loading across segments and ranks
- This test can still only be run on a real snapshot, not on the example uploaded here
- Use a non-MPI file handle if `mpi4py` is not installed
- Even in non-MPI mode, we can't match the data read by bigfile. So either:
  - There is a stupid bug I haven't spotted
  - The snapshot data is not stored the way we think
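Reading in segments with `Read_at()` comes down to mapping each rank's share of the particles onto byte ranges inside individual blob files. A minimal sketch, assuming each rank takes an even, contiguous slice of the particles and that the per-blob particle counts are already known from the block header; `itemsize` is the number of bytes per particle (dtype size times number of members), and all function names here are hypothetical. `check_continuity` mirrors the idea behind the continuity tests: the segments of consecutive ranks should tile the whole block with no gaps or overlaps.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple


def rank_range(n_total: int, rank: int, size: int) -> Tuple[int, int]:
    """Half-open [start, end) slice of n_total particles owned by `rank`."""
    return n_total * rank // size, n_total * (rank + 1) // size


def segments_for_range(file_sizes: Sequence[int], start: int, end: int,
                       itemsize: int) -> List[Tuple[int, int, int]]:
    """Map the global particle range [start, end) onto per-blob reads.

    file_sizes[i] is the number of particles stored in blob file i.
    Returns (file_index, byte_offset, nbytes) triples, i.e. the arguments
    an offset-based read such as MPI.File.Read_at would need.
    """
    segments = []
    file_start = 0
    for i, n in enumerate(file_sizes):
        file_end = file_start + n
        lo, hi = max(start, file_start), min(end, file_end)
        if lo < hi:  # this blob overlaps the requested range
            segments.append((i, (lo - file_start) * itemsize, (hi - lo) * itemsize))
        file_start = file_end
    return segments


def check_continuity(file_sizes: Sequence[int], size: int, itemsize: int) -> bool:
    """True if the segments of ranks 0..size-1 cover every particle exactly once, in order."""
    prefix = [0] + list(accumulate(file_sizes))  # global index of each blob's first particle
    n_total = prefix[-1]
    cursor = 0  # next global particle index we expect to see
    for rank in range(size):
        start, end = rank_range(n_total, rank, size)
        for i, off, nbytes in segments_for_range(file_sizes, start, end, itemsize):
            lo = prefix[i] + off // itemsize
            if lo != cursor:
                return False  # gap or overlap between consecutive segments
            cursor = lo + nbytes // itemsize
    return cursor == n_total
```

With `file_sizes = [3, 4, 5]` and 3 ranks, rank 0 owns particles [0, 4) and rank 1 owns [4, 8), so blob 1 is split between them. In this sketch, an even split by particle count can share a boundary blob between two neighbouring ranks; assigning whole blobs to ranks would be needed for a strict one-rank-per-blob guarantee.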
@qezlou requested a review from sbird on November 1, 2024 15:07
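The `COMM_SELF` open, the header preload and the non-MPI fallback listed in the commits could fit together roughly as follows. A minimal sketch: `open_blob`, `read_at`, `close_blob` and `preload_headers` are hypothetical names, not the functions in `abstractsnapshot.py`, and `read_header` stands in for whatever parses a bigfile block header. The mpi4py calls (`MPI.File.Open`, `File.Read_at`, `Status.Get_count`, `comm.bcast`) are real API.

```python
try:
    from mpi4py import MPI  # optional dependency
except ImportError:
    MPI = None


def open_blob(path, use_mpi=None):
    """Open one blob file read-only.

    With mpi4py, the file is opened on COMM_SELF: the open is collective
    over this rank only, so no other rank has to join the call.
    Without mpi4py (or with use_mpi=False) a plain binary file is returned.
    """
    if use_mpi is None:
        use_mpi = MPI is not None
    if use_mpi:
        return MPI.File.Open(MPI.COMM_SELF, path, MPI.MODE_RDONLY)
    return open(path, "rb")


def read_at(handle, offset, nbytes):
    """Read up to nbytes starting at byte `offset` from either kind of handle."""
    if MPI is not None and isinstance(handle, MPI.File):
        buf = bytearray(nbytes)
        status = MPI.Status()
        handle.Read_at(offset, [buf, MPI.BYTE], status)
        return bytes(buf[:status.Get_count(MPI.BYTE)])  # trim a short read at EOF
    handle.seek(offset)
    return handle.read(nbytes)


def close_blob(handle):
    if MPI is not None and isinstance(handle, MPI.File):
        handle.Close()
    else:
        handle.close()


def preload_headers(block_names, read_header, comm=None, root=0):
    """Parse every needed block header once on `root`, then broadcast.

    Every rank must call this before any per-rank reads, so the collective
    bcast is entered by all ranks instead of some ranks blocking in it.
    """
    if comm is None:
        return {name: read_header(name) for name in block_names}
    headers = None
    if comm.Get_rank() == root:
        headers = {name: read_header(name) for name in block_names}
    return comm.bcast(headers, root=root)
```

Because the only collective step happens up front while all ranks are in lockstep, the per-blob opens and reads afterwards can proceed independently on each rank.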