
Meeting notes

Giorgis Georgakoudis edited this page Oct 1, 2024 · 49 revisions

10/01/2024

Agenda

  • Updates on tutorial slides
  • Demo video presentation
  • Presenters for the workshop and tutorial

Attendees

Tim, Todd, Giorgis

Minutes

5/14/2024 Attendees: Todd, Giorgis

Still trying to find time to work on conda packaging (Giorgis) and looking into differences between C and PyOMP for CFD (Todd).

4/9/2024 Attendees: Todd, Giorgis, Stuart, Tim

1) Just finished supercomputing paper.
2) Talked about using shutil.which to find llvm-config and then using that to find the location of the bin directories.
3) Talked about how to use conda-forge for building for all the platforms.  Would have to change package names...too much trouble.
4) Could use the CI system that runs on different platforms to do the building.
5) Put our scripts in pyomp repo.  Even better would be to switch to using conda-build only and updating the build scripts in those conda recipes.
6) Still have problem with conda-build and llvm openmp-runtime build using the clang that was just built.
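
The lookup in item 2 can be sketched as follows. This is a minimal sketch, assuming `shutil.which` is the intended PATH lookup and that `llvm-config --bindir` is used to report the directory holding the LLVM tools; `find_llvm_bindir` is a hypothetical helper name, not something taken from the pyomp repo.

```python
import shutil
import subprocess


def find_llvm_bindir(exe="llvm-config"):
    """Return the LLVM bin directory reported by llvm-config, or None."""
    # shutil.which searches PATH and returns the full path, or None if absent.
    path = shutil.which(exe)
    if path is None:
        return None
    # `llvm-config --bindir` prints the directory containing the LLVM executables.
    result = subprocess.run(
        [path, "--bindir"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()
```

Calling `find_llvm_bindir()` returns `None` when no llvm-config is on PATH, so a build script can fall back to another location instead of failing.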

6/13/2023 Attendees: Todd A. Anderson, Giorgis, Stuart

1) Numba changes review.
   a) Tuple typing should be handled in Numba related to call convention.
   b) find_top_level_loops create patch
   c) uninstall_registry - probably try to be additive.  implement a new approach
   d) itercount - use intrinsic in function that looks inside range object  base 708
   e) optimize_final_module revert change
   f) run static constructors upstream
   g) add_intrinsics_openmp_pass - Andre's approach
   h) initialize_all_targets - move this to some openmp run once location
   i) parent_state addition to many compilation routines - Stuart will think about
   j) WithLifting - Todd will try in original location to see if it works now...if not then discussion
   k) excinfo should start with a period in generated LLVM
   l) enable_ssa - Todd will try without that option turned on
   m) cpointer arg types - Stuart will think on how to do this without major Numba replacement.

Ran out of time...to be continued next week.

6/6/2023 Attendees: Todd A. Anderson, Giorgis

1) Giorgis fixed the bug in new version of the code in two ways.
   a) Making the var introduced by the new openmp ir builder private (previously thought it was shared but it isn't).
   b) Using custom outliner...will be the default approach.
2) TODOs
   a) Check target functionality on Windows.
   b) Do a diff of pyomp versus Numba and report on the changes.
   c) Do a diff of the llvmlite changes that pyomp is carrying.
3) Ask Andre for time estimate on totally isolated LLVM passes.
4) Need more PyOMP documentation.

5/2/2023 Attendees: Todd A. Anderson, Stuart Archibald, Giorgis

1) Use "conda build config" file to get matrix of Python and numpy versions.
   - Python 3.8 and 3.9.  numpy 1.17 - 1.21.  See numba.readthedocs.io/en/stable/user/installing.html#version-support-information
2) cfunc_wrapper has right signature.
3) For next version:
   - Update to latest Numba.
   - Get rid of privatization.
4) Make sure that llvmdev conda build is building openmp runtime.
5) Talk with Stan about legalities of having a package named Numba in a different channel.
6) Make Python-for-HPC channel on anaconda?
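
The build matrix from item 1 can be enumerated as a quick sanity check. This is a minimal sketch, assuming every listed Python version is paired with every numpy minor release from 1.17 through 1.21; the notes do not say which pairs are actually built, and the real values would live in the conda build config file.

```python
from itertools import product

python_versions = ["3.8", "3.9"]
# numpy 1.17 through 1.21 inclusive (minor releases only).
numpy_versions = [f"1.{minor}" for minor in range(17, 22)]

# Full cross product; an assumption, since some pairs may not be supported.
build_matrix = list(product(python_versions, numpy_versions))

for py, np_ver in build_matrix:
    print(f"python={py} numpy={np_ver}")
```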

4/25/2023 Attendees: Todd A. Anderson, Stuart Archibald, Siu Kwan Lam, Giorgis

1) More discussion of object lifetimes.

4/18/2023 Attendees: Todd A. Anderson, Giorgis Georgakoudis, Daniel Anderson

1) Giorgis ran a bunch of tests with big-ugly directive.
   a) openmp for directive with reduction - fixed
   b) nested parallel for test_openmp.py::2000 - inner parallel for index variable omp_iv1 is shared instead of private. omp_ub1 is shared but should be firstprivate.
   c) pi_task - problem is unlisted vars should be firstprivate for task but are becoming shared.
2) How to convey how to copy object to firstprivate.
3) Stuart mentioned decorator on test functions to run isolated in a test process.  Patch #8239.  Remove `@needs_subprocess` from Test class then add in #8239 
4) Giorgis will create a document discussing the options for firstprivate variables with respect to reference counting and we'll distribute it to Siu and ask him to attend a future meeting to discuss.

4/11/2023 Attendees: Todd A. Anderson, Giorgis Georgakoudis, Daniel Anderson

1) Todd gave update on target_data_nested test.  Had to create TypingAssign Numba IR node that does typing like a regular assignment but doesn't generate any executable code.  Added code for slice copying to the IR copying code.  Fixed the test so that all the arrays are integer arrays which gets rid of lowering error.  Now is giving error because the index variable inside the region is identified as an output and is therefore added to the signature when it shouldn't be.
2) Giorgis gave an update on the big ugly directive support.  Needed to add back in omp_lb code the way it was before to support code generation for the distribute clause.
3) We'll go with the current approach and do a release after big ugly directive and pi task are working.
4) For next release, we're going to try to get rid of variable privatization.  After the meeting, Todd had a thought that the STRUCT-based approach that we use for target map might not work for firstprivate on the CPU side.  You could try to copy an array firstprivate struct and then duplicate the data pointer but then array decref would all be operating on the same meminfo structure and is going to get really confused when the reference count tries to go negative by the number of openmp threads.
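
The reference counting concern in item 4 can be illustrated with a toy model. This is only a sketch: `MemInfo` and `ArrayStruct` are hypothetical stand-ins, not Numba's actual runtime structures, the threads are simulated with a plain loop, and a real runtime would free the data at the first drop to zero rather than keep counting.

```python
import copy


class MemInfo:
    """Toy stand-in for a reference-counted allocation record."""

    def __init__(self):
        self.refcount = 1  # one owner: the original array


class ArrayStruct:
    """Toy array struct that points at a shared MemInfo."""

    def __init__(self, meminfo):
        self.meminfo = meminfo

    def decref(self):
        self.meminfo.refcount -= 1


def simulate_firstprivate(num_threads, incref_on_copy):
    original = ArrayStruct(MemInfo())
    copies = []
    for _ in range(num_threads):
        # Shallow copy: the struct is duplicated, the meminfo pointer is shared.
        private = copy.copy(original)
        if incref_on_copy:
            private.meminfo.refcount += 1
        copies.append(private)
    # Each simulated thread releases its private copy at the end of the region.
    for private in copies:
        private.decref()
    # The original owner releases the array afterwards.
    original.decref()
    return original.meminfo.refcount


print(simulate_firstprivate(4, incref_on_copy=False))  # -4
print(simulate_firstprivate(4, incref_on_copy=True))   # 0
```

Without an incref per copy, the count ends up negative by the number of threads, which is the confusion described above.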

Notes between meetings 4/10/2023

1) After we get the big combined target directive and the pi_task example working then we'll do a release.
2) After that, we will make the LLVM pass into a plugin and then we can use Numba's llvmdev build and just have the LLVM pass plugin in llvmlite.

4/4/2023 Attendees: Todd A. Anderson, Tim Mattson, Giorgis Georgakoudis, Stuart Archibald, Daniel Anderson

1) Lots of discussion around the differences between C and Python arrays and how those interact with implicit behavior in target regions.  We are going to see whether the openmp runtime generates an error when a target region has a map(tofrom:...) and the mapped array(s) already exist within the data environment from a previous target enter data directive.  The main proposal seems to be that if a tofrom is generated implicitly for a target region by the pyomp frontend and the arrays have already been mapped, then it is a no-op.  It is an open question how to get this behavior; the options are: 1) modify the openmp runtime (bad!), 2) have a pyomp runtime that wraps the openmp runtime, or 3) just do the checks in code generation (most likely approach).
2) Todd to make "target teams loop" alias to "target teams distribute parallel for simd".
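
The proposed no-op behavior from item 1 can be sketched with a toy device data environment. This is an illustration only: `DeviceDataEnv`, `enter_data`, and `implicit_tofrom` are hypothetical names, and the notes leave open whether the check would live in a pyomp runtime wrapper or in code generation.

```python
class DeviceDataEnv:
    """Toy model of which host arrays are currently mapped to the device."""

    def __init__(self):
        self.present = set()   # ids of arrays already mapped
        self.transfers = []    # log of (direction, name) copies

    def enter_data(self, name, arr):
        # Models `target enter data map(to: arr)`.
        self.present.add(id(arr))
        self.transfers.append(("to", name))

    def implicit_tofrom(self, name, arr):
        """Model an implicit map(tofrom:) generated by the frontend.

        Returns True if copies were performed, False if treated as a no-op.
        """
        if id(arr) in self.present:
            return False  # already mapped: no copy in, no copy out
        self.transfers.append(("to", name))
        self.transfers.append(("from", name))
        return True


env = DeviceDataEnv()
a = [1, 2, 3]
b = [4, 5, 6]
env.enter_data("a", a)
print(env.implicit_tofrom("a", a))  # False
print(env.implicit_tofrom("b", b))  # True
print(env.transfers)
```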

2/21/2023 Attendees: Todd A. Anderson, Stuart Archibald, Giorgis Georgakoudis

  1. Look at subTest in the tests directory for how to test device(0) and device(1) without code duplication.
  2. Todd gave update on changes.
  3. Giorgis to send all target examples to Todd who will add to his own and send to Daniel.
  4. Users try private first, then reductions, and if that can't work then fall back to shared vars with critical regions or atomics. Do we support atomics at this point?
  5. Daniel to see if caching works for openmp functions both for non-target and target.
  6. Is Intel generating an openmp target runtime for Intel GPUs?
  7. What is relationship between spirv uniform GPU backend and openmp target runtime?

    for i in range(3):
        with self.subTest(f'somename {i} '):
            @njit
            def foo():
                device = i
            foo.compile(())
            foo.inspect_types()

2/9/2023 Attendees: Todd A. Anderson, Stuart Archibald, Giorgis Georgakoudis

  1. Make sure default DSA for teams is shared.
  2. Problem with changing a to arg.a so that sharing works correctly on GPU is likely that the code copy isn't deep enough and renaming in the outlined_ir affects the original ir.
  3. Todd needs to use minimal call convention on CPU-side as well.

2/2/2023 Attendees: Todd A. Anderson, Stuart Archibald, Giorgis Georgakoudis

  1. Giorgis - teams working on GPU side.
  2. Problem with num_teams in separate teams directive. (Todd fixed this.)
  3. Talked about openmp runtime calls. The Numba version we are using may be too old...the current code may work for a newer Numba version. In short term, we can use @lower on cuda/cudaimpl/registry.
  4. Giorgis will be getting the CPU side for target teams working then move on to distribute and parallel for.
  5. More discussion on whether the long-term solution is to outline every region so that we don't need to do variable renaming in Numba which leads to lots of problems.
  6. Todd will try to turn off the renaming and turn off Numba's constant value propagation. Can Stuart tell Todd where that is easily?
  7. Todd to create new target context with new minimal call convention that doesn't have the single parameter ret. That will be used for device(1) so that LLVM better handles the offloaded function.

1/26/2023 Attendees: Todd A. Anderson, Giorgis Georgakoudis, Stuart Archibald

  1. PyOMP will only support Python 3.8+ going forward from this point.
  2. Giorgis found a problem with parallel for "extra code" detection in Python 3.8. Todd will fix that and assume that only Python 3.8 needs to work.
  3. Giorgis found some issue with some numba functions like numba_gil_ensure not available in target code so he linked in numba's helperlib for the cpu target to get around that issue.
  4. Giorgis found some issues with separate parallel for inside target region and will send those code examples to Todd.
  5. There are issues with omp functions like omp_get_thread_num inside non-cpu target regions. We need an overload or the older separate typing and lowering methods for the cuda target for those functions as mentioned by Stuart with some code examples below.

[9:53 AM] stuart (Guest)

    function = types.ExternalFunction('omp_get_num_threads', types.intp())

[9:55 AM] stuart (Guest)

    @overload(omp_get_num_threads)
    def ol():
        def impl():
            return function()
        return impl

1/12/2023 Attendees: Todd A. Anderson, Giorgis Georgakoudis, Stuart Archibald

  1. Todd is working on adding support for "target teams distribute parallel for" and "target teams distribute parallel for simd". Some initial version of that should be working in the next couple days.
  2. To represent those in the LLVM IR directives, we can 1) make the main directive as DIR.OMP.TARGET but then have QUAL.OMP.TEAMS, QUAL.OMP.DISTRIBUTE, QUAL.OMP.PARALLEL.LOOP, QUAL.OMP.SIMD or 2) create a new directive DIR.OMP.TARGET.TEAMS.DISTRIBUTE.PARALLEL.LOOP.SIMD. Giorgis is fine with either for now and #2 seems to match the structure of the code better at the moment so that is what we are going forward with.
  3. Todd is extracting the loop processing code that does the bottom increment and handles last private so that we can use the same code for "for", "parallel for" and all target variants that include "parallel for".
  4. We decided that for now Giorgis would try to do the code manipulation to add the outer chunking loop necessary to implement "distribute".
  5. Todd mentioned that he would try to get the Intel product team that would eventually maintain this in open source to engage early and as a first step add their Numba spirv backend and compile the outlined target functions using that backend to allow us to target multiple accelerator architectures.
  6. Stuart mentioned Tardis supernova and stumpy as two projects that might be interested in engaging with our target work early.
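
The outer chunking loop from item 4 might look like the following in plain Python. This is a minimal sketch, assuming a static schedule that splits the iteration space into near-equal contiguous chunks, one per team; `team_chunk` and `distribute_parallel_for` are hypothetical helpers, and the real transformation operates on Numba IR rather than Python loops.

```python
def team_chunk(lower, upper, num_teams, team_id):
    """Return the half-open range [start, end) owned by team_id."""
    total = upper - lower
    base, extra = divmod(total, num_teams)
    # The first `extra` teams take one additional iteration.
    start = lower + team_id * base + min(team_id, extra)
    end = start + base + (1 if team_id < extra else 0)
    return start, end


def distribute_parallel_for(lower, upper, num_teams, body):
    # Outer loop: "distribute" assigns one chunk to each team.
    for team_id in range(num_teams):
        start, end = team_chunk(lower, upper, num_teams, team_id)
        # Inner loop: stands in for the "parallel for" run by the team's threads.
        for i in range(start, end):
            body(i)


visited = []
distribute_parallel_for(0, 10, 3, visited.append)
print([team_chunk(0, 10, 3, t) for t in range(3)])  # [(0, 4), (4, 7), (7, 10)]
```

Every iteration is visited exactly once, so the inner loop body can stay the same code that "for" and "parallel for" already share.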

12/5/2023 Attendees: Todd, Giorgis, Tim Mattson

  1. Giorgis created Linux Docker containers for armv8: one that is a development environment and another that is a Jupyter notebook. (armv8 on Macs and Nvidia Grace).
  2. We need additional containers for x64 with GPU support.
  3. Giorgis did internal linkage for omp globals and that fixed the linking issue. Todd to go back to cfd application now that that bug is fixed.
  4. Discussion of available hardware for doing pyomp demo with GPU support.
  5. Tim: "SC paper would be great" but it's competitive. Tutorial there would also be nice but again very competitive. SciPy another possibility. eScience another possible venue. Plan on SC paper for April.
  6. We know that caching is broken for GPU codes.