Prepare for modcon-contest submission. (#1)

* Prepare for submission. * Update readme * Cleanup code * Add unit tests * Add makefile, and github actions workflow * Add github actions badge
guidorice · Nov 25, 2023 · c2ea655 · c2ea655
1 parent 8129167
commit c2ea655
Show file tree

Hide file tree

Showing 23 changed files with 842 additions and 2 deletions.
diff --git a/.github/workflows/test.yaml b/.github/workflows/test.yaml
@@ -0,0 +1,50 @@
+name: Run Tests
+
+on: 
+  pull_request:
+    types: [opened, reopened, edited]
+
+  push:
+    branches:
+      - 'main'
+
+jobs:
+
+  tests:
+    runs-on: ubuntu-latest
+
+    defaults:
+      run:
+        shell: bash -el {0}
+
+    steps:
+      - name: Check out repository code
+        uses: actions/checkout@v4
+        with:
+          submodules: "recursive"
+
+      - name: "Setup conda env (base)"
+        uses: conda-incubator/setup-miniconda@v2
+        with:
+          python-version: 3.11
+          auto-activate-base: true
+
+      - name: "Install mojo"
+        run: |
+          curl https://get.modular.com | MODULAR_AUTH=${{ secrets.MODULAR_AUTH }} sh -
+          modular auth ${{ secrets.MODULAR_AUTH }}
+          modular install mojo
+
+      - name: "Setup conda env (modcon23-contest)"
+        uses: conda-incubator/setup-miniconda@v2
+        with:
+          python-version: 3.11
+          activate-environment: modcon23-contest
+          environment-file: environment.yml
+
+      - name: "Run mojo-pytest"
+        run: |
+          export MODULAR_HOME="/home/runner/.modular"
+          export PATH="/home/runner/.modular/pkg/packages.modular.com_mojo/bin:$PATH"
+          export MOJO_PYTHON_LIBRARY="$(find $CONDA_PREFIX/lib -iname 'libpython*.[s,d]*' | sort -r | head -n 1)"
+          make test
diff --git a/.gitignore b/.gitignore
@@ -1,3 +1,7 @@
+.vscode/
+.DS_Store
+scratch/
+
 # Byte-compiled / optimized / DLL files
 __pycache__/
 *.py[cod]

diff --git a/Makefile b/Makefile
@@ -0,0 +1,29 @@
+.PHONY: test benchmark-mojo install-py-packages
+
+install-py-packages:
+	conda env create -p venv --file environment.yml
+
+test:
+	pytest -W error
+
+benchmark-mojo:
+	mojo mojo_impl/naive.mojo 100
+	mojo mojo_impl/naive.mojo 1000
+	mojo mojo_impl/naive.mojo 10000
+	mojo mojo_impl/naive.mojo 100000
+	mojo mojo_impl/naive.mojo 1000000
+	mojo mojo_impl/naive.mojo 10000000
+
+	mojo mojo_impl/optimized_a.mojo 100
+	mojo mojo_impl/optimized_a.mojo 1000
+	mojo mojo_impl/optimized_a.mojo 10000
+	mojo mojo_impl/optimized_a.mojo 100000
+	mojo mojo_impl/optimized_a.mojo 1000000
+	mojo mojo_impl/optimized_a.mojo 10000000
+
+	mojo mojo_impl/optimized_b.mojo 100
+	mojo mojo_impl/optimized_b.mojo 1000
+	mojo mojo_impl/optimized_b.mojo 10000
+	mojo mojo_impl/optimized_b.mojo 100000
+	mojo mojo_impl/optimized_b.mojo 1000000
+	mojo mojo_impl/optimized_b.mojo 10000000
diff --git a/README.md b/README.md
@@ -1,2 +1,86 @@
-# modcon23-contest
-modcon23-contest
+# Spatial envelope optimization and benchmarks
+
+A [Mojo](https://github.com/modularml/mojo)🔥 project calculating the spatial envelope, and exploring the
+performance of Python, NumPy, and Mojo.
+
+[![Run Tests](https://github.com/guidorice/modcon23-contest/actions/workflows/tests.yaml/badge.svg)](https://github.com/guidorice/modcon23-contest/actions/workflows/tests.yaml)
+
+## Envelope
+
+Calculating an envelope is a fundamental part of spatial analysis. The envelope
+(aka: bounds, bbox, mbr) is usually defined by an xmin, ymin, xmax, and ymax
+representing the minimum and maximum x (longitude) and y (latitude) coordinates
+that encompass the bounded feature(s).
+
+![bounding box](./docs/img/bounding_box.png)
+
+Figure attribution: [QGIS documentation](https://docs.qgis.org/3.28/en/docs/user_manual/processing_algs/qgis/vectorgeometry.html#bounding-boxes): Fig. 27.53 Black lines represent the bounding boxes of each polygon feature.
+
+## Variants benchmarked
+
+- [naïve Python](./py_impl/naive.py)
+- [naïve Mojo](./mojo_impl/naive.mojo)
+- [Python using NumPy (well-optimized C code)](./py_impl/optimized_numpy.py)
+- [Mojo optimized with vectorization and loop unrolling, single-threaded (mojo optimized "a")](./mojo_impl/optimized_a.mojo)
+- [Mojo optimized with parallelization, vectorization and loop unrolling. (mojo optimized "b")](./mojo_impl/optimized_b.mojo)
+
+## Variants also considered
+
+I wanted to benchmark the [Shapely](https://shapely.readthedocs.io/en/stable/)
+package which wraps the [GEOS library](https://libgeos.org/), another
+well-optimized C/C++ codebase. However Shapely seems to cache the envelope upon
+geometry creation, so it was not feasible to benchmark the envelope
+calculations separately from geometry constructors. Using NumPy seemed like a
+good alternative.
+
+## All benchmarks
+
+Test system: mojo `0.5.0` on Apple M2, 24GB RAM. Data type: `float32`.
+
+![overall benchmarks](./docs/img/benchmarks-1.png)
+
+## Chart of optimized variants only
+
+![optimized benchmarks](./docs/img/benchmarks-2.png)
+
+## Key Findings
+
+1. [Mojo optimized "a"](./mojo_impl/optimized_a.mojo) is the best overall
+performer. However for large feature spaces (millions of points) we can get
+at least an additional 25% speedup by using one thread per dimension, as shown in
+[Mojo optimized "b"](./mojo_impl/optimized_b.mojo).
+
+2. Even more performance optimizations are possibly left on the table, such as
+auto-tuning, stack allocation, and tiled/striped memory access. A fusion of
+Mojo optimized "a" and "b" could offer the best performance across all feature
+sizes.
+
+3. In addition to being performance winners, the Mojo variants are
+parameterized by the number of dimensions (`dims`) and by data type (`dtype`).
+In other words, the same generic code can run, for example, `float16`,
+`float64` or with 3, 4 or more dimensions. In GIS systems the number of
+dimensions is sometimes referred to as XY, XYZ, or XYZM, where Z is "height",
+and M is "measure".
+
+## Example output from Mojo's `benchmark` module
+
+```text
+$ mojo mojo_impl/optimized_a.mojo 100
+float32 100
+---------------------
+Benchmark Report (s)
+---------------------
+Mean: 6.5356175103213615e-07
+Total: 0.75083200000000005
+Iters: 1148831
+Warmup Mean: 9.9999999999999995e-07
+Warmup Total: 1.9999999999999999e-06
+Warmup Iters: 2
+Fastest Mean: 6.4460000000000004e-07
+Slowest Mean: 7.9999999999999996e-07
+
+ns: 653.56175103213616
+microsecs: 0.6535617510321361
+ms: 0.0006535617510321361
+s: 6.5356175103213615e-07
+```
diff --git a/docs/benchmark-results.ods b/docs/benchmark-results.ods
diff --git a/docs/img/benchmarks-1.png b/docs/img/benchmarks-1.png
diff --git a/docs/img/benchmarks-2.png b/docs/img/benchmarks-2.png
diff --git a/docs/img/bounding_box.png b/docs/img/bounding_box.png
diff --git a/environment.yml b/environment.yml
@@ -0,0 +1,10 @@
+name: modcon23-contest
+channels:
+  - defaults
+dependencies:
+  - python=3.11
+  - ipykernel
+  - numpy
+  - pip
+  - pip:
+     - git+https://github.com/guidorice/[email protected]
diff --git a/mojo_impl/__init__.mojo b/mojo_impl/__init__.mojo
diff --git a/mojo_impl/naive.mojo b/mojo_impl/naive.mojo
@@ -0,0 +1,66 @@
+import benchmark
+from math.limit import inf, neginf
+from random import rand
+from sys import argv
+from tensor import Tensor, TensorSpec
+from utils.index import Index
+
+fn envelope[dtype: DType, dims: Int](tensor: Tensor[dtype]) -> SIMD[dtype, 2 * dims]:
+    """
+    Calculate envelope: iterative, plain mojo code. Uses static types and a stdlib numeric container type (Tensor).
+    """
+
+    @parameter
+    constrained[dims > 0 and dims % 2 == 0, "power-of-two dims only"]()
+
+    let NegInf = neginf[dtype]()
+    let Inf = inf[dtype]()
+    let num_features = tensor.shape()[1]
+    var result = SIMD[dtype, 2 * dims]()
+
+    for d in range(dims):
+        result[d] = Inf
+
+    for d in range(dims, 2 * dims):
+        result[d] = NegInf
+
+    for y in range(dims):
+        for x in range(num_features):
+            let val = tensor[Index(y, x)]
+            if val < result[y]:
+                result[y] = val
+            if val > result[dims + y]:
+                result[dims + y] = val
+
+    return result
+
+alias dtype = DType.float32
+alias dims = 2
+
+fn main() raises:
+    """
+    Usage: `mojo naive.mojo {n}` , where n is an integer number of features.
+    """
+
+    # read number of features
+    let width = atol(argv()[1])
+
+    # create a tensor, filled with random values
+    print(dtype, width)
+    let spec = TensorSpec(dtype, dims, width)
+    let tensor = rand[dtype](spec)
+
+    # run bechmark module
+    @parameter
+    fn envelope_worker():
+        _ = envelope[dtype, dims](tensor)
+
+    let mojo_report = benchmark.run[envelope_worker]()
+    mojo_report.print()
+    let secs = mojo_report.mean["s"]()
+    let ns = mojo_report.mean["ns"]()
+    let ms = mojo_report.mean["ms"]()
+    print("ns:", ns)
+    print("microsecs:", secs * 10 ** 6)
+    print("ms:", ms)
+    print("s:", secs)
diff --git a/mojo_impl/optimized_a.mojo b/mojo_impl/optimized_a.mojo
@@ -0,0 +1,79 @@
+import benchmark
+from algorithm import vectorize
+from math.limit import inf, neginf
+from random import rand
+from sys import argv
+from sys.info import simdbitwidth
+from tensor import Tensor, TensorSpec
+from utils.index import Index
+
+alias nelts = simdbitwidth()
+
+fn envelope[dtype: DType, dims: Int](tensor: Tensor[dtype]) -> SIMD[dtype, dims * 2]:
+    """
+    Calculate envelope: vectorized, unrolled, single-threaded.
+    """
+
+    @parameter
+    constrained[dims > 0 and dims % 2 == 0, "power-of-two dims only"]()
+
+    let NegInf = neginf[dtype]()
+    let Inf = inf[dtype]()
+    let num_features = tensor.shape()[1]
+    var result = SIMD[dtype, dims * 2]()
+
+    @unroll
+    for d in range(dims):
+        result[d] = Inf
+
+    @unroll
+    for d in range(dims, 2 * dims):
+        result[d] = NegInf
+
+    @unroll
+    for dim in range(dims):
+        @parameter
+        fn min_max_simd[simd_width: Int](feature_idx: Int):
+            let index = Index(dim, feature_idx)
+            let vals = tensor.simd_load[simd_width](index)
+            let min = vals.reduce_min()
+            if min < result[dim]:
+                result[dim] = min
+            let max = vals.reduce_max()
+            if max > result[dims + dim]:
+                result[dims + dim] = max
+        vectorize[nelts, min_max_simd](num_features)
+
+    return result
+
+
+alias dtype = DType.float32
+alias dims = 2
+
+
+fn main() raises:
+    """
+    Usage: `mojo optimized_a.mojo {n}` , where n is an integer number of features.
+    """
+    # read number of features
+    let width = atol(argv()[1])
+
+    # create a tensor, filled with random values
+    print(dtype, width)
+    let spec = TensorSpec(dtype, dims, width)
+    let tensor = rand[dtype](spec)
+
+    # run bechmark module
+    @parameter
+    fn envelope_worker():
+        _ = envelope[dtype, dims](tensor)
+
+    let mojo_report = benchmark.run[envelope_worker]()
+    mojo_report.print()
+    let secs = mojo_report.mean["s"]()
+    let ns = mojo_report.mean["ns"]()
+    let ms = mojo_report.mean["ms"]()
+    print("ns:", ns)
+    print("microsecs:", secs * 10 ** 6)
+    print("ms:", ms)
+    print("s:", secs)