
Print gauges in a critical section, using a single unit number #536

Merged: 3 commits merged into clawpack:master on May 31, 2022

Conversation

rjleveque (Member):

I changed the print_gauges_and_reset_nextLoc function to print all gauge data using the same output file unit number OUTGAUGEUNIT, rather than creating a unit number from this base value plus the OMP thread number. I also put this code in an OMP critical block so that only one thread can be writing gauges at a time.

The other approach allows writing in parallel, in principle, but recently I've been getting seg faults when using lots of threads that seem related to this. It also seems like potentially bad practice, since we don't know in advance which unit numbers will be used for writing gauges, and we use other unit numbers for other purposes, so there are potential conflicts.

I don't think this will introduce any bottlenecks.
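
For concreteness, here is a minimal sketch of the pattern described above. OUTGAUGEUNIT = 89 matches the module parameter, but the subroutine name, arguments, and write format are simplified placeholders rather than the actual gauges_module.f90 code:

    ! Sketch only, not the actual gauges_module.f90 code: all threads share
    ! the single unit OUTGAUGEUNIT, and the open/write/close is wrapped in
    ! an OMP critical block so only one thread does gauge I/O at a time.
    subroutine print_gauge_buffer(fname, nvals, buffer)
        implicit none
        character(len=*), intent(in) :: fname
        integer, intent(in) :: nvals
        real(kind=8), intent(in) :: buffer(nvals)
        integer, parameter :: OUTGAUGEUNIT = 89   ! single shared unit number
        integer :: j

    !$omp critical (gauge_io)
        ! No other thread can be inside this block, so reusing the same
        ! unit number on every thread cannot cause conflicts.
        open(unit=OUTGAUGEUNIT, file=fname, status='unknown', position='append')
        do j = 1, nvals
            write(OUTGAUGEUNIT, '(e26.16)') buffer(j)
        enddo
        close(OUTGAUGEUNIT)
    !$omp end critical (gauge_io)
    end subroutine print_gauge_buffer

Because gauge output is buffered and only flushed occasionally, the hope is that the time spent inside this critical section is small.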

mandli (Member) commented Apr 24, 2022:

There's a simple fix for this that does not require us to serialize this process: use a fixed set of unit numbers equal to the number of threads and reuse them. That would give us a pool of units that are available as required but never exceed the number of threads.
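
One way such a pool could look, keyed off the OpenMP thread number (a hypothetical sketch, not existing GeoClaw code; the module and function names are illustrative):

    ! Hypothetical sketch of a bounded unit pool: thread i always reuses
    ! unit OUTGAUGEUNIT + i, so at most omp_get_max_threads() units are
    ! ever taken for gauge output.
    module gauge_unit_pool
        use omp_lib, only: omp_get_thread_num
        implicit none
        integer, parameter :: OUTGAUGEUNIT = 89   ! base unit for gauges
    contains
        integer function get_gauge_unit()
            ! Each thread gets a fixed unit from the pool, so no two
            ! threads ever hold the same unit at the same time.
            get_gauge_unit = OUTGAUGEUNIT + omp_get_thread_num()
        end function get_gauge_unit
    end module gauge_unit_pool

As the next comment notes, this thread-indexed scheme is essentially what the code did before this change.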

rjleveque (Member, Author):

@mandli: Thanks for reviewing. I think what you are suggesting is the way it was before I changed it: there was a unit number for each thread defined by

   myunit = OUTGAUGEUNIT + mythread

But I was still running into segmentation faults for some runs with 20 threads that seemed to be caused by this, and some people use many more threads, so we have to declare all unit numbers above 89 off limits for other purposes (since OUTGAUGEUNIT = 89).

Do you think this could be a bottleneck if it's serialized?

mandli (Member) commented Apr 26, 2022:

...and it still causes you issues. I do recall you mentioning this. It definitely would not be great to serialize this if we have a large number of gauges, but we do buffer the output, which mitigates the problem. Is this reproducible, by chance?

In the end it only serializes the output, so maybe that's not so bad, as long as the buffers keep multiple gauges from writing at the same time, though I would expect those writes to still overlap, for instance when a bunch of gauges are in the same area. It would be better to understand why there is a seg fault than anything else.

rjleveque (Member, Author):

OK, we can hold off on merging this and I'll try to look into the seg fault issue some more.

rjleveque added a commit to rjleveque/geoclaw that referenced this pull request May 24, 2022
This part is still WIP, see discussion at clawpack#536
mandli merged commit 4de31f7 into clawpack:master on May 31, 2022
bolliger32 added a commit to ClimateImpactLab/geoclaw that referenced this pull request May 31, 2022
* put print_gauges in critical block and use same output unit number for all

* change a comment about gauge critical section

* declare variable that should be integer

* support binary gauge output in gauges_module.f90

* added binary gauge output to tests/dtopo1

note that pyclaw.gauges changes are required to read binary file

* redo tests/dtopo1 so binary gauge data produced is compared with archived ascii data for portability

* update claw_git_status for tests/dtopo1

* revert code that prints gauges in critical section

This part is still WIP, see discussion at clawpack#536

* add support for binary32 gauge output

Co-authored-by: Randy LeVeque <[email protected]>
Co-authored-by: Kyle Mandli <[email protected]>
mjberger (Contributor) commented Oct 11, 2022 via email

rjleveque (Member, Author):

This change was reverted, but I just opened a new issue #543 to remind us to revisit this at some point.
