
[LLVM] Benchmark related changes #871

Merged · 355 commits · Jan 2, 2023

Commits (355)
8fa6225
Improved output
iomaganaris Jun 4, 2021
ee91aaa
Fixed compilers
iomaganaris Jun 4, 2021
9286bac
Changes for Ioannis script
pramodk Jun 6, 2021
a5a259f
Updated script and hh.cpp
iomaganaris Jun 7, 2021
a02dfa6
Small fix in openmp flags
iomaganaris Jun 8, 2021
a23b1ed
Disable python bindings for faster build
pramodk Nov 27, 2020
6698213
Integrate LLVM into CMake build system
pramodk Nov 28, 2020
c73d83d
Code infrastructure for LLVM code generation backend
pramodk Nov 28, 2020
c7c6e40
Azure CI fixes for LLVM build and README update
pramodk Nov 28, 2020
6cf8320
Print build status after cmake configure stage
pramodk Nov 29, 2020
d32e2a3
Adding test template for LLVM codegen
pramodk Nov 29, 2020
01001df
Initial LLVM codegen vistor routines (#457)
georgemitenkov Dec 22, 2020
c8ee79b
FunctionBlock code generation and terminator checks (#470)
georgemitenkov Dec 25, 2020
f70c4d2
Add option to run LLVM optimisation passes (#471)
pramodk Dec 28, 2020
de33088
Add function call LLVM code generation (#477)
georgemitenkov Dec 30, 2020
d0fe34d
Support for IndexedName codegen (#478)
georgemitenkov Dec 30, 2020
c79689a
Improvements for code generation specific transformations (#483)
pramodk Jan 6, 2021
3115a32
nrn_state function generation in NMODL AST to help LLVM codegen (#484)
pramodk Jan 6, 2021
92eae90
Running functions from MOD files via LLVM JIT (#482)
georgemitenkov Jan 8, 2021
5698fd7
Extended support for binary ops and refactoring (#489)
georgemitenkov Jan 12, 2021
a171af1
Avoid converting LOCAL statement in all StatementBlocks (#492)
pramodk Jan 12, 2021
0eab81c
Handle CodegenVarType type in JSON printer (#494)
pramodk Jan 13, 2021
183acbe
Integrating LLVM helper into LLVM visitor (#497)
georgemitenkov Jan 25, 2021
0bf5e9b
LLVM code generation for if/else statements (#499)
georgemitenkov Jan 25, 2021
5d7f2ef
Added error handling for values not in scope (#502)
georgemitenkov Jan 26, 2021
bb82934
Added support for WHILE statement (#501)
georgemitenkov Jan 26, 2021
0dd8f72
Create mechanism instance struct in LLVM IR (#507)
iomaganaris Feb 1, 2021
11a186e
Printf support in LLVM IR codegen (#510)
georgemitenkov Feb 3, 2021
fa01b01
Fix issue error: ‘runtime_error’ is not a member of ‘std’ (#512)
iomaganaris Feb 15, 2021
748bfef
Move code gen specific InstanceStruct node to codegen.yaml (#526)
pramodk Mar 5, 2021
7c8e16f
* Improvements to codegen helper (Part I)
pramodk Feb 27, 2021
06e2744
Addressing TODOs for Instance struct (#533) Part II
georgemitenkov Mar 6, 2021
d435500
Unit test for scalar state kernel generation in LLVM (#547)
georgemitenkov Mar 9, 2021
660cd31
Indexed name codegen improvements (#550)
georgemitenkov Mar 12, 2021
1643f7d
Add InstanceStruct test data generation helper and unit test (#546)
iomaganaris Mar 13, 2021
46133d0
Add the remainder loop for vectorization of DERIVATIVE block (#534)
Mar 17, 2021
a72e2f2
Always initialize return variable in function block (#554)
Mar 19, 2021
99d6a03
Running a kernel with NMODL-LLVM JIT (#549)
georgemitenkov Apr 9, 2021
d134ad4
Loop epilogue fix for LLVM visitor helper (#567)
georgemitenkov Apr 9, 2021
2b4a7b7
Gather support and vectorisation fixes for LLVM code generation (#568)
georgemitenkov Apr 10, 2021
ecebcc0
Verification and file utilities for LLVM IR codegen (#582)
georgemitenkov Apr 13, 2021
4c13787
Add gather execution test (#591)
georgemitenkov Apr 16, 2021
566ea70
Fixed loop allocations (#590)
georgemitenkov Apr 17, 2021
0c2cfc3
Benchmarking LLVM code generation (#583)
georgemitenkov Apr 17, 2021
72feb65
Minor benchmarking improvement (#593)
pramodk Apr 18, 2021
c372e46
Bug fix in codegen helper: delete LOCAL statement (#595)
pramodk Apr 19, 2021
ebb155e
LLVM 13 compatibility and fixing void* type (#603)
georgemitenkov Apr 20, 2021
31b95d8
Allow LOCAL variable inside StatementBlock for LLVM IR generation (#599)
pramodk Apr 20, 2021
c71b368
Update CI with LLVM v13 (trunk) (#605)
pramodk Apr 22, 2021
63a662e
Integrating vector maths library into LLVM codegen (#604)
georgemitenkov Apr 22, 2021
b22afdc
Using shared libraries in LLVM JIT (#609)
georgemitenkov Apr 22, 2021
baf95f0
Avoid local std::ofstream object causing segfault (#614)
pramodk Apr 24, 2021
272ffc5
Refactoring of runners' infrastructure and dumping object files (#620)
georgemitenkov Apr 30, 2021
3a8b8ff
Optimisation levels for benchmarking (#623)
georgemitenkov May 7, 2021
8f2501c
Adding function debug information (#628)
georgemitenkov May 8, 2021
3db4535
Fixed using benchmarking_info in TestRunner (#631)
georgemitenkov May 8, 2021
33631b9
Fixed addition of SOLVE block to kernel's FOR loop (#636)
georgemitenkov May 11, 2021
05b1821
IR builder redesign for LLVM IR code generation pipeline (#634)
georgemitenkov May 13, 2021
1e4809e
Fixed initialisation of `CodegenAtomicStatement` (#642)
georgemitenkov May 13, 2021
07fe468
Fix instance struct data generation for testing/benchmarking (#641)
pramodk May 13, 2021
f7017a7
Basic scatter support (#643)
georgemitenkov May 13, 2021
38c61bf
Benchmarking code re-organisation and minor improvements (#647)
pramodk May 16, 2021
bb3ecd0
Added attributes and metadata to LLVM IR compute kernels (#648)
georgemitenkov May 17, 2021
167c23c
Added loaded value to the stack (#655)
georgemitenkov May 18, 2021
a02a8d0
Basic predication support for LLVM backend (#652)
georgemitenkov May 20, 2021
999bf36
Improvements for LLVM code generation and benchmarking (#661)
georgemitenkov May 20, 2021
2142e2d
Fixed `alloca`s insertion point for LLVM backend (#663)
georgemitenkov May 20, 2021
6d4743b
Fast math flags for LLVM backend (#662)
georgemitenkov May 21, 2021
e752afd
Avoid generating LLVM IR for Functions and Procedures if inlined (#664)
iomaganaris May 21, 2021
9ece2e9
Fixed typo in benchmarking metrics (#665)
georgemitenkov May 21, 2021
468b3d1
Remove only inlined blocks from AST based on symtab properties (#668)
iomaganaris May 21, 2021
47e3c4a
Use VarName on the RHS of assignment expression (#669)
pramodk May 25, 2021
af7540c
[LLVM] SLEEF and libsystem_m vector libraries support (#674)
georgemitenkov May 30, 2021
19ad02a
[LLVM] Enhancements for optimization pipeline (#683)
georgemitenkov Jun 3, 2021
8306736
[LLVM] Added saving to file utility (#685)
georgemitenkov Jun 3, 2021
24908e2
[LLVM] Aliasing and `cpu` options for LLVM visitor and the benchmark …
georgemitenkov Jun 3, 2021
f721890
Fix azure yaml pipeline from merge (#687)
pramodk Jun 3, 2021
ff5430c
[LLVM] Support for newer versions of LLVM APIs
georgemitenkov Mar 8, 2022
c80a44d
Fix build issues for the rebased branch
pramodk Mar 8, 2022
de3a8be
[LLVM] Allocate InstanceStruct on the GPU using cudaMallocManaged (#815)
iomaganaris Mar 10, 2022
ad81954
[LLVM][GPU] Separated CPU and GPU CLI options
georgemitenkov Mar 11, 2022
3992315
[LLVM][refactoring] Added platform abstraction
georgemitenkov Mar 12, 2022
d4e12d0
[LLVM][GPU] Added GPU-specific AST transformations
georgemitenkov Mar 12, 2022
9940fd8
fixed comments
georgemitenkov Mar 12, 2022
196a5a3
Added code generation for thread id
georgemitenkov Mar 13, 2022
7044204
Added kernel annotation generation
georgemitenkov Mar 13, 2022
9351e39
Added tests for annotations/intrinsics
georgemitenkov Mar 13, 2022
e26d961
Removed POINTER compatibility error (#821)
iomaganaris Mar 14, 2022
b93ce12
[LLVM][GPU] Separated CPU and GPU CLI options (#817)
georgemitenkov Mar 14, 2022
acb9dea
[LLVM][refactoring] Added platform abstraction
georgemitenkov Mar 12, 2022
0907d3b
[LLVM][GPU] Added GPU-specific AST transformations
georgemitenkov Mar 12, 2022
b4943fd
fixed comments
georgemitenkov Mar 12, 2022
06deff9
Added code generation for thread id
georgemitenkov Mar 13, 2022
b506574
Added kernel annotation generation
georgemitenkov Mar 13, 2022
d097cf0
Added tests for annotations/intrinsics
georgemitenkov Mar 13, 2022
98184f1
[LLVM][refactoring] Added platform abstraction (#818)
georgemitenkov Mar 15, 2022
ad58356
[LLVM][GPU] Added GPU-specific AST transformations
georgemitenkov Mar 12, 2022
0530417
fixed comments
georgemitenkov Mar 12, 2022
1d4117a
Initial work for gpu runner
iomaganaris Mar 15, 2022
d0d1051
Compile module and load it in the GPUJITDriver
iomaganaris Mar 16, 2022
35e46fe
Get rid of one unnecessary shared_ptr (#826) (#827)
Mar 16, 2022
b9b1418
more changes to support gpu execution
iomaganaris Mar 17, 2022
7e9484d
Template BaseRunner
iomaganaris Mar 18, 2022
9f4a142
Fixed compilation issues with templates
iomaganaris Mar 18, 2022
d515961
More small fixes
iomaganaris Mar 18, 2022
b69f2c7
Separated CUDADriver from JITDriver files
iomaganaris Mar 18, 2022
0882fc3
Small fixes and setting the compute arch in options
iomaganaris Mar 18, 2022
0ec0826
Tried workflow with test kernel and linking with libdevice
iomaganaris Mar 21, 2022
0f72a74
Make clang-format happy and only compile gpu benchmark if cuda backen…
iomaganaris Mar 21, 2022
881d85e
Improved a bit logs
iomaganaris Mar 21, 2022
a1af210
Merge branch 'georgemitenkov/llvm-gpu-codegen' into magkanar/gpu-runner
iomaganaris Mar 21, 2022
7de726c
Added newline in EOF
iomaganaris Mar 21, 2022
18df661
Use cmake 3.18 in the CI
iomaganaris Mar 21, 2022
7aceb5b
Added optimization option and printing PTX to file
iomaganaris Mar 21, 2022
d0af9b4
Use the kernel real name in nvvm anotation in the generated LLVM IR
iomaganaris Mar 21, 2022
44b7f00
Revert "Use the kernel real name in nvvm anotation in the generated L…
iomaganaris Mar 21, 2022
8bdefb9
Replaced IF with FOR loop
georgemitenkov Mar 22, 2022
2c57d4d
Added code generation for thread id
georgemitenkov Mar 13, 2022
97e3940
Added kernel annotation generation
georgemitenkov Mar 13, 2022
dbd8cc9
Added tests for annotations/intrinsics
georgemitenkov Mar 13, 2022
b562d19
Addapted code generation for GPU expressions
georgemitenkov Mar 22, 2022
483b363
Fixed lists of supported statements
georgemitenkov Mar 22, 2022
f874146
[LLVM][GPU] Added GPU-specific AST transformations (#819)
georgemitenkov Mar 22, 2022
e108525
Merge branch 'llvm' into georgemitenkov/llvm-gpu-codegen
pramodk Mar 22, 2022
295fa25
fix merge issue: gpu code generation is now enabled by this PR
pramodk Mar 22, 2022
d29d0bc
Merge branch 'georgemitenkov/llvm-gpu-codegen' into magkanar/gpu-runner
iomaganaris Mar 22, 2022
400aaec
Small fix from merge
iomaganaris Mar 22, 2022
b3de1e1
[LLVM][GPU] Basic code generation for NVPTX backend (#820)
georgemitenkov Mar 22, 2022
fad728d
More changes to make the LLVM IR generated by NMODL executed on GPU
iomaganaris Mar 23, 2022
68cce1a
Merge branch 'llvm' into magkanar/gpu-runner
iomaganaris Mar 23, 2022
26161f9
Small cleanup
iomaganaris Mar 23, 2022
d68671e
Print kernel wrappers and nrn_init based on Instance Struct (#551)
iomaganaris Mar 23, 2022
b6463d3
Added NVPTX-specific optimization passes for PTX generation
georgemitenkov Mar 24, 2022
a1513f8
Added tests
georgemitenkov Mar 24, 2022
cf459d4
Merge branch 'llvm' into magkanar/gpu-runner
iomaganaris Mar 24, 2022
af5ed38
Print LLVM IR to file once
iomaganaris Mar 24, 2022
90a7443
Merge branch 'georgemitenkov/llvm-gpu-passes' into magkanar/gpu-runne…
iomaganaris Mar 24, 2022
fba92e2
Changes to generate proper code
iomaganaris Mar 24, 2022
b42ebe7
Addressed comments for GPU code generation
georgemitenkov Mar 25, 2022
a2ef4a6
Added missing include
georgemitenkov Mar 25, 2022
cd2921e
Merge branch 'georgemitenkov/llvm-gpu-passes' into magkanar/gpu-runne…
iomaganaris Mar 28, 2022
cb3547a
Replaced `find()` with `at()` when querying target information
georgemitenkov Mar 28, 2022
b36d7f9
Small changes to select the GPU benchmark
iomaganaris Mar 28, 2022
cab4fbf
Removed setting module datalayout and triple in CUDADriver
iomaganaris Mar 28, 2022
4d6f047
Merge branch 'georgemitenkov/llvm-gpu-passes' into magkanar/gpu-runne…
iomaganaris Mar 28, 2022
afa14da
Added closing parenthesis and made sm_70 default subtarget
georgemitenkov Mar 28, 2022
002e79a
Make sure that benchmark is only running for CUDA backends and restri…
iomaganaris Mar 28, 2022
a3fe976
Merge branch 'georgemitenkov/llvm-gpu-passes' into magkanar/gpu-runne…
iomaganaris Mar 28, 2022
7524375
[LLVM][GPU] NVPTX specific passes for code generation (#833)
georgemitenkov Mar 28, 2022
93eba6e
Use kernel bitcode for GPU compilation via NVVM
iomaganaris Mar 28, 2022
9f1b2bf
Merge branch 'llvm' into magkanar/gpu-runner
iomaganaris Mar 28, 2022
55ef3c0
Fix cmake way to find nvvm
iomaganaris Mar 29, 2022
8e18915
Rename target-chip option to target-arch
iomaganaris Mar 29, 2022
19aee68
Handle grid and block dim CLI options
iomaganaris Mar 29, 2022
f13881f
Removed unneeded GPU execution options
iomaganaris Mar 29, 2022
4df56f0
Cleared up benchmark instantiation in main function
iomaganaris Mar 29, 2022
ab99063
Improved reading logs from nvvm compiler
iomaganaris Mar 29, 2022
2da743b
Removed duplicated code from LLVMBenchmark
iomaganaris Mar 29, 2022
eec1dff
Removed backend name from CUDADriver init
iomaganaris Mar 29, 2022
412dc0f
More cleanup of gpu name
iomaganaris Mar 29, 2022
997a2d0
Refactor cuda kernel execution to avoid code duplication
iomaganaris Mar 29, 2022
b2ea423
Small fixes for non-CUDA backend compilation
iomaganaris Mar 29, 2022
a66d155
Fix wrapper issue on GPU
iomaganaris Mar 29, 2022
4743361
Print async execution error on GPU
iomaganaris Mar 29, 2022
cd978f3
Added documentation for running the GPU benchmark
iomaganaris Mar 29, 2022
d6adc1f
Mention CMake 3.18 dependency
iomaganaris Mar 29, 2022
a12a776
Working CUDA JIT
iomaganaris Mar 31, 2022
8ff894c
Cleared up NVVM code
iomaganaris Mar 31, 2022
7b00e25
Revert setting kernel attributes on GPU backend code
iomaganaris Mar 31, 2022
882d33c
Revert "Use cmake 3.18 in the CI"
iomaganaris Mar 31, 2022
a7954a4
Roll back to CMake 3.17
iomaganaris Mar 31, 2022
512f2ca
Improve CUDA JIT logs
iomaganaris Mar 31, 2022
b39eab3
Link with libdevice
iomaganaris Mar 31, 2022
38e8370
Mention CMake 3.17 in INSTALL.md
iomaganaris Mar 31, 2022
8c46edf
Make clang-format happy
iomaganaris Mar 31, 2022
bceeba1
Free not needed char* and print ll file generated for benchmark
iomaganaris Apr 1, 2022
854d2d5
Clear nvvm.annotations and update them for the wrapper function so th…
iomaganaris Apr 1, 2022
9ee3d92
Handle wrapper and kernel nvvm annotations properly
iomaganaris Apr 1, 2022
8cc39cb
Update INSTALL.md
iomaganaris Apr 1, 2022
86b8917
Testing benchmarks in gitlab CI
iomaganaris Apr 1, 2022
3b0781b
Use deployed CUDA 11.6.0 and fix allocation
iomaganaris Apr 4, 2022
4af4e27
Small fix in spack variant
iomaganaris Apr 4, 2022
9197429
Fix allocation
iomaganaris Apr 4, 2022
7a1a599
Fix bb5 constraint
iomaganaris Apr 4, 2022
dde8b4b
Load nmodl with spack
iomaganaris Apr 4, 2022
e9263d8
Fix GPU execution configuration in the benchmark
iomaganaris Apr 4, 2022
b8fb7fd
Merge branch 'llvm' into magkanar/benchmark
iomaganaris Apr 4, 2022
de59880
Updated script to execute the benchmarks
iomaganaris Apr 4, 2022
9bd6315
[LLVM] Code formatting changes (#838)
iomaganaris Apr 5, 2022
6c3fe22
[LLVM][GPU][+refactoring] Replacement of math intrinsics with library…
georgemitenkov Apr 8, 2022
7090c5e
Merge branch 'llvm' into magkanar/gpu-runner
iomaganaris Apr 8, 2022
dba0cd4
Replace simple integration test for the math libraries in LLVM benchmark
iomaganaris Apr 8, 2022
951c814
Added log function and test
iomaganaris Apr 8, 2022
142a4c0
Small fix in log replacement funcs
iomaganaris Apr 8, 2022
9f34e75
Add log in the test mod file
iomaganaris Apr 8, 2022
3118f78
Update hpc-coding-conventions
iomaganaris Apr 8, 2022
f85b0d1
Fix clang format
iomaganaris Apr 8, 2022
84304ab
Disable addition of debug symbols in GPU code
iomaganaris Apr 12, 2022
7bca898
Added sleef and svml math libs in nmodl jit executions
iomaganaris Apr 20, 2022
ec9fac8
Merge remote-tracking branch 'origin/magkanar/gpu-runner' into magkan…
iomaganaris Apr 20, 2022
1519871
Added GPU execution in script and ncu option
iomaganaris Apr 26, 2022
eae47e3
Removed loading git module
iomaganaris Apr 27, 2022
570c9b7
Improved condition declaration of cuda driver according to Pramod's c…
iomaganaris Apr 27, 2022
c154efb
JIT invocation from python for benchmarks (#832)
Apr 27, 2022
a1dede6
Small fixes for output
iomaganaris Apr 27, 2022
91a6b06
Merge branch 'llvm' into magkanar/gpu-runner
iomaganaris Apr 27, 2022
5ff83db
Added ability to run GPU jit from python
iomaganaris Apr 27, 2022
90445f6
Added hh.mod file for unit test of pyjit
iomaganaris Apr 27, 2022
6ebdf7b
Use the integrated tests in the gitlab CI to test the GPU jit as well
iomaganaris Apr 27, 2022
b273e60
Throw better error if CUDA_HOME is not set in benchmarking script
iomaganaris Apr 28, 2022
690ffab
Fix CUDA_HOME path and make cmake-format happy
iomaganaris Apr 28, 2022
64e8cee
Fixes issue with debug printing of visitors (#854)
iomaganaris Apr 29, 2022
5952a87
Support for Breakpoint block (nrn_cur) for code generation (#645)
pramodk May 2, 2022
261677a
Copy memory to the GPU explicitly
iomaganaris May 2, 2022
f0bc5cd
Merge branch 'llvm' into magkanar/gpu-runner
iomaganaris May 2, 2022
d6d419a
Rearrange vec_rhs and vec_d to allocate memory properly (#856)
iomaganaris May 3, 2022
984933b
Use hh.mod from CoreNEURON for the benchmark test and enable inlining…
iomaganaris May 3, 2022
d7615af
Cleaned up llvm_ir and llvm_backend since there were 2 variables for …
iomaganaris May 3, 2022
a8a44ed
Added arg parsing in benchmark script and added expsyn.mod test
iomaganaris May 3, 2022
2594854
Replace fabs with the libdevice corresponding function
iomaganaris May 3, 2022
f970ac4
Disable expsyn test on GPU because atomic instructions are not yet su…
iomaganaris May 3, 2022
283c833
Working test passing explicitly results back and forth of the GPU
iomaganaris May 3, 2022
c0e8f31
Clean comments and debug prints
iomaganaris May 3, 2022
a5b7179
Make clang-format happy
iomaganaris May 3, 2022
523d9cf
Merge branch 'magkanar/gpu-runner' into magkanar/benchmark_llvm_gpu_e…
iomaganaris May 3, 2022
3c81509
Small changes to handle external kernel and fixes in the script
iomaganaris May 3, 2022
95782bc
[LLVM][GPU] Added CUDADriver to execute benchmark on GPU (#829)
iomaganaris May 9, 2022
0fd410e
Merge branch 'llvm' into magkanar/benchmark_llvm_gpu_explicit
iomaganaris May 10, 2022
ebe560b
Fixed compute-bound and memory-bound mod files for PyJIT execution
iomaganaris May 10, 2022
e4e187a
Avoid code duplication for running external kernel
iomaganaris May 10, 2022
834a8d1
Merge branch 'llvm' into magkanar/benchmark_llvm_gpu_explicit
iomaganaris May 17, 2022
3ad529e
Fixed kernels for JIT execution of nrn_cur
iomaganaris May 17, 2022
f2f4043
Small fixes after merge
iomaganaris May 17, 2022
ed923e3
Merge branch 'magkanar/benchmark_llvm_gpu_explicit' of github.com:Blu…
iomaganaris May 19, 2022
1f9887f
Only compile GPU memory transfers when NMODL_LLVM_CUDA option is enabled
iomaganaris May 19, 2022
0353c9c
Small fix for cuda related functions
iomaganaris May 19, 2022
a821bdf
Merge branch 'llvm' into magkanar/benchmark_llvm_gpu_explicit
iomaganaris May 19, 2022
750fd6a
Merge branch 'llvm' into magkanar/benchmark_llvm_gpu_explicit
iomaganaris May 19, 2022
1475efc
Fix missing fast_math option for llvm jit
iomaganaris May 20, 2022
583911b
[LLVM] Adding python script for benchmarking (#874)
iomaganaris Dec 26, 2022
9764dbb
Merge branch 'llvm' into magkanar/benchmark_llvm_gpu_explicit
iomaganaris Dec 26, 2022
8d06bbf
Mention llvm branch in PAD and Dockerfiles
iomaganaris Dec 26, 2022
591923a
Disable debug symbols in nvptx64 JIT runs due to PTX ISA error
iomaganaris Dec 26, 2022
475a571
Added which branch to clone in the readme for the gpu docker runtime …
iomaganaris Dec 27, 2022
23eb5e8
Added some small changes in the PAD text
iomaganaris Dec 27, 2022
896ffd0
Small exmplanation regarding install.sh
iomaganaris Dec 27, 2022
2b8fc56
Fix GPU regression
iomaganaris Jan 2, 2023
70d0dc0
Updated readme
iomaganaris Jan 2, 2023
f60ac4e
Updated PAD file according to the submission
iomaganaris Jan 2, 2023
e5e010b
Small fixes in the readme file
iomaganaris Jan 2, 2023
16 changes: 16 additions & 0 deletions .gitignore
@@ -49,3 +49,19 @@ venv.bak/
.cmake-format.yaml
.pre-commit-config.yaml
.ipynb_checkpoints

# Benchmark outputs
test/benchmark/*.ll
test/benchmark/*.ptx
test/benchmark/*.out
test/benchmark/*.log
test/benchmark/*.cpp
test/benchmark/*.txt
test/benchmark/core.*
test/benchmark/memory_bound_*
test/benchmark/memory-bound_*
test/benchmark/hh_*
test/benchmark/compute_bound_*
test/benchmark/compute-bound_*
test/benchmark/llvm_benchmark_*
test/benchmark/v*
2 changes: 1 addition & 1 deletion CMakeLists.txt
@@ -82,7 +82,7 @@ add_compile_options(${NMODL_COMPILER_WARNING_SUPPRESSIONS})
# =============================================================================
project(
NMODL
VERSION ${NMODL_GIT_LAST_TAG}
VERSION "1.0"
Review comment (Contributor Author):

For some reason with only this branch I get the following:

CMake Error at CMakeLists.txt:83 (project):
  VERSION "v1.0-alpha-llvm" format invalid.

v1.0-alpha-llvm is the last NMODL tag

LANGUAGES CXX)

# =============================================================================
22 changes: 0 additions & 22 deletions docker/docker-compose.yml

This file was deleted.

50 changes: 0 additions & 50 deletions docker/recipe/Dockerfile

This file was deleted.

41 changes: 0 additions & 41 deletions docker/recipe/entrypoint

This file was deleted.

51 changes: 51 additions & 0 deletions docs/CC2023/PAD.md
@@ -0,0 +1,51 @@

# MOD2IR: High-Performance Code Generation for a Biophysically Detailed Neuronal Simulation DSL

## Preliminary Artifact Description

### Broad Description

This artifact provides all the necessary code, scripts and results to compile the NMODL transpiler
with the MOD2IR extension and run all benchmarks described in the manuscript. To simplify the
evaluation process we provide, along with the instructions, a Dockerfile that sets up a viable
system for the benchmarks. The driver script compiles the membrane mechanism model `hh.mod` and the
synapse mechanism model `expsyn.mod` with various compile-time configurations and then runs the
generated binaries, comparing their runtimes. More specifically, the benchmark compares the
execution runtime of binaries generated via the two-step MOD→C++→binary compilation process, using
various open-source and commercial compiler frameworks, with the one-step ahead-of-time and
just-in-time processes of MOD2IR.
MOD2IR is implemented as a code generation backend inside the NMODL Framework and makes heavy
use of the LLVM IR and compilation passes. Most of the relevant code of the described work can be
found [here](https://github.com/BlueBrain/nmodl/tree/llvm/src/codegen/llvm) and
[here](https://github.com/BlueBrain/nmodl/tree/llvm/test/benchmark). The instructions to reproduce
the results can be found [here](https://github.com/BlueBrain/nmodl/blob/llvm/docs/CC2023/README.md).

### Badge

Blue Badge (results validated). Using the provided Dockerfile and scripts, evaluators should be
able to fully build our code, reproduce our benchmark setup, and obtain benchmarking results.
Please note that the runtimes obtained by the evaluators will in all likelihood differ slightly
from the results presented in the paper, as they depend heavily on the hardware and system
software used. We believe, however, that the results should nevertheless be qualitatively the same
as those we have presented.

### Hardware requisites

The provided artifact can in theory be run on any x86 hardware platform. To closely reproduce our
benchmark results, a workstation (or cloud instance) with an Intel Xeon Skylake (or newer) CPU that
supports AVX-512 instructions and an NVIDIA Volta V100 (or newer) GPU is required. All benchmark
runs are single-core and have relatively low memory requirements. For building or running the
Docker image (and, more specifically, the NMODL Framework) we, however, recommend a system with
plenty of cores, at least 32 GB of RAM and 20 GB of disk space.

### Software requisites

Any reasonably up-to-date Linux system with Docker should be sufficient. If the GPU results are to
be reproduced, an up-to-date CUDA installation (11.0 or newer) should also be present.


### Expectations

We expect that all setup and benchmarks can be completed within one working day. The expected time for
building or pulling the docker image is around 10 minutes using a modern multicore system with a stable
internet connection. The expected runtime of the benchmarks is around 5 hours.
75 changes: 75 additions & 0 deletions docs/CC2023/README.md
@@ -0,0 +1,75 @@

# MOD2IR: High-Performance Code Generation for a Biophysically Detailed Neuronal Simulation DSL
Review comment (Contributor Author):

I think that the attachment of the artifact can be only this file? Or should we provide the docker image as well?

@georgemitenkov @ohm314 @pramodk


Please refer to the PAD.md file for an overview and necessary prerequisites.

## Benchmarking Instructions

To reproduce our environment as closely as possible and to lower the burden of installing the
different compilers and libraries, we have created Docker images that take care of installing all
the packages and compilers necessary to build MOD2IR and execute the benchmarks.
Because of technical restrictions imposed by Docker, some extra steps are needed to run
applications on NVIDIA GPUs from within a Docker image. For this reason we have created two
different `Dockerfile`s: one that covers both the CPU and GPU benchmarks, and one for CPU-only
execution if no NVIDIA GPU is available on the test system.

### CPU and GPU docker image

The image that targets both CPU and GPU can be found in `test/benchmark/gpu_docker/Dockerfile`.
To launch the Docker image you can execute the following:

```
git clone -b llvm https://github.com/BlueBrain/nmodl.git
cd nmodl/test/benchmark/gpu_docker # Enter the directory that contains the Dockerfile (based on Ubuntu 22.04; with small changes it should work on any Ubuntu version or other Linux distribution)
bash install_gpu_docker_env.sh # Installs docker and NVIDIA docker runtime (needs sudo permission)
docker run -it -v $PWD:/opt/mount --gpus all bluebrain/nmodl:mod2ir-gpu-benchmark # Execute docker image (~16GB)
```

After building and launching the Docker image we can execute the benchmarks and generate the same
plots as the ones included in the paper, showing the new results alongside the reference results
from the paper. To do this we need to execute the following two scripts inside the Docker image
environment:

```
cd nmodl/test/benchmark # Enter the directory where the scripts are inside the docker image
bash run_benchmark_script_cpu_gpu.sh # Runs all the benchmarks on CPU and GPU
python3 plot_benchmarks_cpu_gpu.py # Generate the plots based on the outputs of the previous script
cp -r graphs_output_pandas /opt/mount # Copy the graphs from the docker image to your environment
```

Executing `run_benchmark_script_cpu_gpu.sh` generates two pickle files with the results:
`hh_expsyn_cpu/benchmark_results.pickle` for the CPU benchmarks and `hh_expsyn_gpu/benchmark_results.pickle`
for the GPU benchmarks. These are then loaded by `plot_benchmarks_cpu_gpu.py` to generate the plots.
You can now exit the Docker image terminal and open the above files, which exist in your local directory.
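
The pickle files can also be inspected directly. Below is a minimal sketch, assuming the file holds a nested mapping of kernels to per-configuration runtime lists; the actual layout is defined by the benchmark scripts and may differ:

```python
import os
import pickle
import statistics
import tempfile

# Hypothetical layout: {kernel_name: {compiler_config: [runtime_in_seconds, ...]}}.
# The real structure of benchmark_results.pickle is an implementation detail
# of the benchmark scripts in test/benchmark.
results = {
    "hh": {"mod2ir-jit": [1.02, 0.98, 1.01], "gcc-O3": [1.35, 1.33, 1.36]},
}

# Write and re-read the file, as the plotting script would.
path = os.path.join(tempfile.mkdtemp(), "benchmark_results.pickle")
with open(path, "wb") as f:
    pickle.dump(results, f)
with open(path, "rb") as f:
    loaded = pickle.load(f)

# Summarize each configuration.
for kernel, configs in loaded.items():
    for config, times in configs.items():
        print(f"{kernel}/{config}: min={min(times):.3f}s mean={statistics.mean(times):.3f}s")
```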


### CPU only docker image

If no GPU is available, you can run a CPU-only container instead of the Docker container above.
To do this you need to:

```
cd test/benchmark/cpu_docker # Enter the directory that contains the Dockerfile
docker run -it -v $PWD:/opt/mount bluebrain/nmodl:mod2ir-cpu-benchmark # Execute docker image (~16GB)
```

Then inside the docker shell:

```
cd nmodl/test/benchmark # Enter the directory where the scripts are inside the docker image
bash run_benchmark_script_cpu_only.sh # Runs all the benchmarks on CPU
python3 plot_benchmarks_cpu_only.py # Generate the plots based on the outputs of the previous script
cp -r graphs_output_pandas /opt/mount # Copy the graphs from the docker image to your environment
```

Executing `run_benchmark_script_cpu_only.sh` generates only `hh_expsyn_cpu/benchmark_results.pickle`,
containing the CPU results.


## Notes

1. Acceleration results with `GCC` might be better in the Docker container than in the paper due
to the newer OS used in the Dockerfile. Recent Ubuntu versions ship with glibc 2.3x, which includes
`libmvec`; this provides vectorized math implementations to `GCC`, enabling vectorization of the
kernels even without supplying the `SVML` library to `GCC`.
2 changes: 1 addition & 1 deletion src/codegen/codegen_acc_visitor.cpp
@@ -57,7 +57,7 @@ void CodegenAccVisitor::print_channel_iteration_block_parallel_hint(BlockType ty


void CodegenAccVisitor::print_atomic_reduction_pragma() {
if (!info.artificial_cell) {
if (info.point_process) {
printer->add_line("nrn_pragma_acc(atomic update)");
printer->add_line("nrn_pragma_omp(atomic update)");
}
2 changes: 1 addition & 1 deletion src/codegen/llvm/codegen_llvm_visitor.cpp
@@ -474,7 +474,7 @@ void CodegenLLVMVisitor::visit_codegen_atomic_statement(const ast::CodegenAtomic
const auto& member_node = codegen_intance_node->get_member_var();
const auto& member_name = member_node->get_node_name();

// Sanity checks. Not that there is a bit of duplication with `read_from_or_write_to_instance`
// Sanity checks. Note that there is a bit of duplication with `read_from_or_write_to_instance`
// but this is not crucial for now.
// TODO: remove this duplication!
if (!instance_var_helper.is_an_instance_variable(member_name))
7 changes: 7 additions & 0 deletions src/main.cpp
@@ -66,6 +66,9 @@ int main(int argc, const char* argv[]) {
/// the number of repeated experiments for the benchmarking
int num_experiments = 100;

/// benchmark external kernel with JIT
std::string external_kernel_library;

/// X dimension of grid in blocks for GPU execution
int llvm_cuda_grid_dim_x = 1;

@@ -268,6 +271,9 @@
benchmark_opt->add_option("--repeat",
num_experiments,
fmt::format("Number of experiments for benchmarking ({})", num_experiments))->ignore_case();
benchmark_opt->add_option("--external",
external_kernel_library,
fmt::format("Benchmark external kernels from shared library({})", external_kernel_library))->ignore_case()->check(CLI::ExistingFile);
benchmark_opt->add_option("--grid-dim-x",
llvm_cuda_grid_dim_x,
fmt::format("Grid dimension X ({})", llvm_cuda_grid_dim_x))->ignore_case();
@@ -447,6 +453,7 @@
platform,
cfg.llvm_opt_level_ir,
cfg.llvm_opt_level_codegen,
external_kernel_library,
gpu_execution_parameters);
benchmark.run();
}
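
The `--repeat` option above sets `num_experiments`, the number of times the benchmark re-runs each kernel before aggregating runtimes. A rough sketch of such a repeat-and-measure loop follows; the actual implementation lives in the C++ `LLVMBenchmark` class, and this Python version is only illustrative:

```python
import time

def run_experiments(kernel, num_experiments=100):
    """Run `kernel` repeatedly and collect per-run wall-clock times in seconds."""
    times = []
    for _ in range(num_experiments):
        start = time.perf_counter()
        kernel()
        times.append(time.perf_counter() - start)
    return times

# Stand-in for a JIT-compiled mechanism kernel such as nrn_state.
def dummy_kernel():
    s = 0.0
    for i in range(1000):
        s += i * 0.5
    return s

times = run_experiments(dummy_kernel, num_experiments=10)
print(f"runs={len(times)} min={min(times):.6f}s mean={sum(times) / len(times):.6f}s")
```

Reporting both the minimum and the mean over many repetitions helps separate steady-state kernel performance from one-off effects such as cache warm-up.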
15 changes: 11 additions & 4 deletions src/pybind/pynmodl.cpp
@@ -188,26 +188,31 @@ class JitDriver {
}


benchmark::BenchmarkResults run(std::shared_ptr<nmodl::ast::Program> node,
benchmark::BenchmarkResults run(const std::shared_ptr<const nmodl::ast::Program> node,
std::string& modname,
int num_experiments,
int instance_size,
std::string& external_kernel_library,
int cuda_grid_dim_x,
int cuda_block_dim_x) {
// New directory is needed to be created otherwise the directory cannot be created
// automatically through python
if (cfg.nmodl_ast || cfg.json_ast || cfg.json_perfstat) {
utils::make_path(cfg.scratch_dir);
}
cg_driver.prepare_mod(node, modname);
utils::make_path(cfg.output_dir);
// Make copy of node to be able to run the visitors according to any changes in the
// configuration and execute the mechanisms' functions multiple times
auto new_node = std::make_shared<nmodl::ast::Program>(*node);
cg_driver.prepare_mod(new_node, modname);
nmodl::codegen::CodegenLLVMVisitor visitor(modname,
cfg.output_dir,
platform,
0,
!cfg.llvm_no_debug,
cfg.llvm_fast_math_flags,
true);
visitor.visit_program(*node);
visitor.visit_program(*new_node);
const GPUExecutionParameters gpu_execution_parameters{cuda_grid_dim_x, cuda_block_dim_x};
nmodl::benchmark::LLVMBenchmark benchmark(visitor,
modname,
@@ -218,6 +223,7 @@ class JitDriver {
platform,
cfg.llvm_opt_level_ir,
cfg.llvm_opt_level_codegen,
external_kernel_library,
gpu_execution_parameters);
return benchmark.run();
}
@@ -313,8 +319,9 @@ PYBIND11_MODULE(_nmodl, m_nmodl) {
&nmodl::JitDriver::run,
"node"_a,
"modname"_a,
"num_experiments"_a,
"instance_size"_a,
"num_experiments"_a = 1,
"external_kernel_library"_a = "",
"cuda_grid_dim_x"_a = 1,
"cuda_block_dim_x"_a = 1);
#else
7 changes: 7 additions & 0 deletions test/benchmark/CMakeLists.txt
@@ -24,6 +24,13 @@ if(NMODL_ENABLE_JIT_EVENT_LISTENERS)
target_compile_definitions(llvm_benchmark PUBLIC NMODL_HAVE_JIT_EVENT_LISTENERS)
endif()

# =============================================================================
# external kernel stub
# =============================================================================
add_library(extkernel SHARED ext_kernel.cpp)
set_target_properties(extkernel PROPERTIES CXX_VISIBILITY_PRESET default)
target_link_libraries(llvm_benchmark PUBLIC extkernel)

# =============================================================================
# LLVM pyjit
# =============================================================================
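
The `extkernel` shared library added above exists so that the benchmark can compare the JIT-generated kernel against an externally compiled one (the `--external` CLI option). The underlying mechanism is simply loading a shared library at runtime and calling an exported symbol. A hedged Python sketch of that mechanism is shown below, using the system math library as a stand-in, since the symbols in the real `ext_kernel.cpp` are specific to this PR:

```python
import ctypes
import ctypes.util

# Stand-in for loading an external kernel shared library (the real benchmark
# would load something like libextkernel.so). We use the system math library
# here so the example is self-contained.
libname = ctypes.util.find_library("m")
lib = ctypes.CDLL(libname)

# Declare the signature of the exported symbol before calling it; the
# external-kernel mechanism likewise relies on an agreed-upon C ABI.
lib.exp.restype = ctypes.c_double
lib.exp.argtypes = [ctypes.c_double]

value = lib.exp(1.0)
print(f"exp(1.0) = {value:.5f}")
```

The same pattern, with a C-compatible kernel signature instead of `exp`, is what lets the benchmark swap a hand-compiled kernel in for the JIT-compiled one.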