Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prepare for modcon-contest submission. #1

Merged
merged 21 commits into from
Nov 25, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
50 changes: 50 additions & 0 deletions .github/workflows/test.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,50 @@
name: Run Tests

on:
pull_request:
types: [opened, reopened, edited]

push:
branches:
- 'main'

jobs:

tests:
runs-on: ubuntu-latest

defaults:
run:
shell: bash -el {0}

steps:
- name: Check out repository code
uses: actions/checkout@v4
with:
submodules: "recursive"

- name: "Setup conda env (base)"
uses: conda-incubator/setup-miniconda@v2
with:
python-version: 3.11
auto-activate-base: true

- name: "Install mojo"
run: |
curl https://get.modular.com | MODULAR_AUTH=${{ secrets.MODULAR_AUTH }} sh -
modular auth ${{ secrets.MODULAR_AUTH }}
modular install mojo

- name: "Setup conda env (modcon23-contest)"
uses: conda-incubator/setup-miniconda@v2
with:
python-version: 3.11
activate-environment: modcon23-contest
environment-file: environment.yml

- name: "Run mojo-pytest"
run: |
export MODULAR_HOME="/home/runner/.modular"
export PATH="/home/runner/.modular/pkg/packages.modular.com_mojo/bin:$PATH"
export MOJO_PYTHON_LIBRARY="$(find $CONDA_PREFIX/lib -iname 'libpython*.[s,d]*' | sort -r | head -n 1)"
make test
4 changes: 4 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -1,3 +1,7 @@
.vscode/
.DS_Store
scratch/

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
Expand Down
29 changes: 29 additions & 0 deletions Makefile
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
.PHONY: test benchmark-mojo install-py-packages

install-py-packages:
conda env create -p venv --file environment.yml

test:
pytest -W error

benchmark-mojo:
mojo mojo_impl/naive.mojo 100
mojo mojo_impl/naive.mojo 1000
mojo mojo_impl/naive.mojo 10000
mojo mojo_impl/naive.mojo 100000
mojo mojo_impl/naive.mojo 1000000
mojo mojo_impl/naive.mojo 10000000

mojo mojo_impl/optimized_a.mojo 100
mojo mojo_impl/optimized_a.mojo 1000
mojo mojo_impl/optimized_a.mojo 10000
mojo mojo_impl/optimized_a.mojo 100000
mojo mojo_impl/optimized_a.mojo 1000000
mojo mojo_impl/optimized_a.mojo 10000000

mojo mojo_impl/optimized_b.mojo 100
mojo mojo_impl/optimized_b.mojo 1000
mojo mojo_impl/optimized_b.mojo 10000
mojo mojo_impl/optimized_b.mojo 100000
mojo mojo_impl/optimized_b.mojo 1000000
mojo mojo_impl/optimized_b.mojo 10000000
88 changes: 86 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,2 +1,86 @@
# modcon23-contest
modcon23-contest
# Spatial envelope optimization and benchmarks

A [Mojo](https://github.com/modularml/mojo)🔥 project calculating the spatial envelope, and exploring the
performance of Python, NumPy, and Mojo.

[![Run Tests](https://github.com/guidorice/modcon23-contest/actions/workflows/tests.yaml/badge.svg)](https://github.com/guidorice/modcon23-contest/actions/workflows/tests.yaml)

## Envelope

Calculating an envelope is a fundamental part of spatial analysis. The envelope
(aka: bounds, bbox, mbr) is usually defined by an xmin, ymin, xmax, and ymax
representing the minimum and maximum x (longitude) and y (latitude) coordinates
that encompass the bounded feature(s).

![bounding box](./docs/img/bounding_box.png)

Figure attribution: [QGIS documentation](https://docs.qgis.org/3.28/en/docs/user_manual/processing_algs/qgis/vectorgeometry.html#bounding-boxes): Fig. 27.53 Black lines represent the bounding boxes of each polygon feature.

## Variants benchmarked

- [naïve Python](./py_impl/naive.py)
- [naïve Mojo](./mojo_impl/naive.mojo)
- [Python using NumPy (well-optimized C code)](./py_impl/optimized_numpy.py)
- [Mojo optimized with vectorization and loop unrolling, single-threaded (mojo optimized "a")](./mojo_impl/optimized_a.mojo)
- [Mojo optimized with parallelization, vectorization and loop unrolling. (mojo optimized "b")](./mojo_impl/optimized_b.mojo)

## Variants also considered

I wanted to benchmark the [Shapely](https://shapely.readthedocs.io/en/stable/)
package which wraps the [GEOS library](https://libgeos.org/), another
well-optimized C/C++ codebase. However Shapely seems to cache the envelope upon
geometry creation, so it was not feasible to benchmark the envelope
calculations separately from geometry constructors. Using NumPy seemed like a
good alternative.

## All benchmarks

Test system: mojo `0.5.0` on Apple M2, 24GB RAM. Data type: `float32`.

![overall benchmarks](./docs/img/benchmarks-1.png)

## Chart of optimized variants only

![optimized benchmarks](./docs/img/benchmarks-2.png)

## Key Findings

1. [Mojo optimized "a"](./mojo_impl/optimized_a.mojo) is the best overall
performer. However for large feature spaces (millions of points) we can get
at least an additional 25% speedup by using one thread per dimension, as shown in
[Mojo optimized "b"](./mojo_impl/optimized_b.mojo).

2. Even more performance optimizations are possibly left on the table, such as
auto-tuning, stack allocation, and tiled/striped memory access. A fusion of
Mojo optimized "a" and "b" could offer the best performance across all feature
sizes.

3. In addition to being performance winners, the Mojo variants are
parameterized by the number of dimensions (`dims`) and by data type (`dtype`).
In other words, the same generic code can run, for example, `float16`,
`float64` or with 3, 4 or more dimensions. In GIS systems the number of
dimensions is sometimes referred to as XY, XYZ, or XYZM, where Z is "height",
and M is "measure".

## Example output from Mojo's `benchmark` module

```text
$ mojo mojo_impl/optimized_a.mojo 100
float32 100
---------------------
Benchmark Report (s)
---------------------
Mean: 6.5356175103213615e-07
Total: 0.75083200000000005
Iters: 1148831
Warmup Mean: 9.9999999999999995e-07
Warmup Total: 1.9999999999999999e-06
Warmup Iters: 2
Fastest Mean: 6.4460000000000004e-07
Slowest Mean: 7.9999999999999996e-07

ns: 653.56175103213616
microsecs: 0.6535617510321361
ms: 0.0006535617510321361
s: 6.5356175103213615e-07
```
Binary file added docs/benchmark-results.ods
Binary file not shown.
Binary file added docs/img/benchmarks-1.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added docs/img/benchmarks-2.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added docs/img/bounding_box.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
10 changes: 10 additions & 0 deletions environment.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
name: modcon23-contest
channels:
- defaults
dependencies:
- python=3.11
- ipykernel
- numpy
- pip
- pip:
- git+https://github.com/guidorice/[email protected]
Empty file added mojo_impl/__init__.mojo
Empty file.
66 changes: 66 additions & 0 deletions mojo_impl/naive.mojo
Original file line number Diff line number Diff line change
@@ -0,0 +1,66 @@
import benchmark
from math.limit import inf, neginf
from random import rand
from sys import argv
from tensor import Tensor, TensorSpec
from utils.index import Index

fn envelope[dtype: DType, dims: Int](tensor: Tensor[dtype]) -> SIMD[dtype, 2 * dims]:
"""
Calculate envelope: iterative, plain mojo code. Uses static types and a stdlib numeric container type (Tensor).
"""

@parameter
constrained[dims > 0 and dims % 2 == 0, "power-of-two dims only"]()

let NegInf = neginf[dtype]()
let Inf = inf[dtype]()
let num_features = tensor.shape()[1]
var result = SIMD[dtype, 2 * dims]()

for d in range(dims):
result[d] = Inf

for d in range(dims, 2 * dims):
result[d] = NegInf

for y in range(dims):
for x in range(num_features):
let val = tensor[Index(y, x)]
if val < result[y]:
result[y] = val
if val > result[dims + y]:
result[dims + y] = val

return result

alias dtype = DType.float32
alias dims = 2

fn main() raises:
"""
Usage: `mojo naive.mojo {n}` , where n is an integer number of features.
"""

# read number of features
let width = atol(argv()[1])

# create a tensor, filled with random values
print(dtype, width)
let spec = TensorSpec(dtype, dims, width)
let tensor = rand[dtype](spec)

# run bechmark module
@parameter
fn envelope_worker():
_ = envelope[dtype, dims](tensor)

let mojo_report = benchmark.run[envelope_worker]()
mojo_report.print()
let secs = mojo_report.mean["s"]()
let ns = mojo_report.mean["ns"]()
let ms = mojo_report.mean["ms"]()
print("ns:", ns)
print("microsecs:", secs * 10 ** 6)
print("ms:", ms)
print("s:", secs)
79 changes: 79 additions & 0 deletions mojo_impl/optimized_a.mojo
Original file line number Diff line number Diff line change
@@ -0,0 +1,79 @@
import benchmark
from algorithm import vectorize
from math.limit import inf, neginf
from random import rand
from sys import argv
from sys.info import simdbitwidth
from tensor import Tensor, TensorSpec
from utils.index import Index

alias nelts = simdbitwidth()

fn envelope[dtype: DType, dims: Int](tensor: Tensor[dtype]) -> SIMD[dtype, dims * 2]:
"""
Calculate envelope: vectorized, unrolled, single-threaded.
"""

@parameter
constrained[dims > 0 and dims % 2 == 0, "power-of-two dims only"]()

let NegInf = neginf[dtype]()
let Inf = inf[dtype]()
let num_features = tensor.shape()[1]
var result = SIMD[dtype, dims * 2]()

@unroll
for d in range(dims):
result[d] = Inf

@unroll
for d in range(dims, 2 * dims):
result[d] = NegInf

@unroll
for dim in range(dims):
@parameter
fn min_max_simd[simd_width: Int](feature_idx: Int):
let index = Index(dim, feature_idx)
let vals = tensor.simd_load[simd_width](index)
let min = vals.reduce_min()
if min < result[dim]:
result[dim] = min
let max = vals.reduce_max()
if max > result[dims + dim]:
result[dims + dim] = max
vectorize[nelts, min_max_simd](num_features)

return result


alias dtype = DType.float32
alias dims = 2


fn main() raises:
"""
Usage: `mojo optimized_a.mojo {n}` , where n is an integer number of features.
"""
# read number of features
let width = atol(argv()[1])

# create a tensor, filled with random values
print(dtype, width)
let spec = TensorSpec(dtype, dims, width)
let tensor = rand[dtype](spec)

# run bechmark module
@parameter
fn envelope_worker():
_ = envelope[dtype, dims](tensor)

let mojo_report = benchmark.run[envelope_worker]()
mojo_report.print()
let secs = mojo_report.mean["s"]()
let ns = mojo_report.mean["ns"]()
let ms = mojo_report.mean["ms"]()
print("ns:", ns)
print("microsecs:", secs * 10 ** 6)
print("ms:", ms)
print("s:", secs)
Loading