Skip to content

Commit

Permalink
Merge branch 'main' into olex/update-benchmarks
Browse files Browse the repository at this point in the history
  • Loading branch information
olejandro authored Feb 15, 2024
2 parents e179e62 + 9aa7001 commit 90458f9
Show file tree
Hide file tree
Showing 15 changed files with 513 additions and 166 deletions.
36 changes: 31 additions & 5 deletions .github/workflows/ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -3,9 +3,9 @@ name: CI
on:
# Triggers the workflow on push or pull request events but only for the main branch
push:
branches: [main]
branches: [ main ]
pull_request:
branches: [main]
branches: [ main ]

# Allows you to run this workflow manually from the Actions tab
workflow_dispatch:
Expand Down Expand Up @@ -34,6 +34,12 @@ jobs:
pre-commit install
pre-commit run --all-files
- name: Run unit tests
working-directory: xl2times
run: |
source .venv/bin/activate
pytest
# ---------- Prepare ETSAP Demo models

- uses: actions/checkout@v3
Expand Down Expand Up @@ -69,6 +75,9 @@ jobs:
# ---------- Install GAMS

- name: Install GAMS
env:
GAMS_LICENSE: ${{ secrets.GAMS_LICENSE }}
if: ${{ env.GAMS_LICENSE != '' }}
run: |
curl https://d37drm4t2jghv5.cloudfront.net/distributions/44.1.0/linux/linux_x64_64_sfx.exe -o linux_x64_64_sfx.exe
chmod +x linux_x64_64_sfx.exe
Expand All @@ -81,17 +90,18 @@ jobs:
mkdir -p $HOME/.local/share/GAMS
echo "$GAMS_LICENSE" > $HOME/.local/share/GAMS/gamslice.txt
ls -l $HOME/.local/share/GAMS/
env:
GAMS_LICENSE: ${{ secrets.GAMS_LICENSE }}
# ---------- Run tool, check for regressions

- name: Run tool on all benchmarks
env:
GAMS_LICENSE: ${{ secrets.GAMS_LICENSE }}
if: ${{ env.GAMS_LICENSE != '' }}
working-directory: xl2times
# Use tee to also save the output to out.txt so that the summary table can be
# printed again in the next step.
# Save the return code to retcode.txt so that the next step can fail the action
# if run_benchmarks.py failed.
run: |
source .venv/bin/activate
export PATH=$PATH:$GITHUB_WORKSPACE/GAMS/gams44.1_linux_x64_64_sfx
Expand All @@ -101,6 +111,22 @@ jobs:
| tee out.txt; \
echo ${PIPESTATUS[0]} > retcode.txt)
- name: Run CSV-only regression tests (no GAMS license)
env:
GAMS_LICENSE: ${{ secrets.GAMS_LICENSE }}
if: ${{ env.GAMS_LICENSE == '' }}
working-directory: xl2times
# Run without --dd flag if GAMS license secret doesn't exist.
# Useful for testing for (CSV) regressions in forks before creating PRs.
run: |
source .venv/bin/activate
export PATH=$PATH:$GITHUB_WORKSPACE/GAMS/gams44.1_linux_x64_64_sfx
(python utils/run_benchmarks.py benchmarks.yml \
--times_dir $GITHUB_WORKSPACE/TIMES_model \
--verbose \
| tee out.txt; \
echo ${PIPESTATUS[0]} > retcode.txt)
- name: Print summary
working-directory: xl2times
run: |
Expand Down
6 changes: 5 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,11 @@ ground_truth/*
*.pyproj.*
speedscope.json
*.pkl
.venv/
.venv*/
benchmarks/
.idea/
.python-version
docs/_build/
docs/api/
.coverage
/out.txt
40 changes: 40 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -72,6 +72,45 @@ git commit --no-verify

See our GitHub Actions CI `.github/workflows/ci.yml` and the utility script `utils/run_benchmarks.py` to see how to run the tool on the DemoS models.

In short, use the commands below to clone the benchmarks data into your local `benchmarks` dir.
Note that this assumes you have access to all these repositories (some are private and
you'll have to request access) - if not, comment out the inaccessible benchmarks from `benchmakrs.yml` before running.

```bash
mkdir benchmarks
# Get VEDA example models and reference DD files
# XLSX files are in private repo for licensing reasons, please request access or replace with your own licensed VEDA example files.
git clone [email protected]:olejandro/demos-xlsx.git benchmarks/xlsx/
git clone [email protected]:olejandro/demos-dd.git benchmarks/dd/

# Get Ireland model and reference DD files
git clone [email protected]:esma-cgep/tim.git benchmarks/xlsx/Ireland
git clone [email protected]:esma-cgep/tim-gams.git benchmarks/dd/Ireland
```
Then to run the benchmarks:
```bash
# Run a only a single benchmark by name (see benchmarks.yml for name list)
python utils/run_benchmarks.py benchmarks.yml --verbose --run DemoS_001-all | tee out.txt

# Run all benchmarks (without GAMS run, just comparing CSV data)
python utils/run_benchmarks.py benchmarks.yml --verbose | tee out.txt


# Run benchmarks with regression tests vs main branch
git branch feature/your_new_changes --checkout
# ... make your code changes here ...
git commit -a -m "your commit message" # code must be committed for comparison to `main` branch to run.
python utils/run_benchmarks.py benchmarks.yml --verbose | tee out.txt
```
At this point, if you haven't broken anything you should see something like:
```
Change in runtime: +2.97s
Change in correct rows: +0
Change in additional rows: +0
No regressions. You're awesome!
```
If you have a large increase in runtime, a decrease in correct rows or fewer rows being produced, then you've broken something and will need to figure out how to fix it.

### Debugging Regressions

If your change is causing regressions on one of the benchmarks, a useful way to debug and find the difference is to run the tool in verbose mode and compare the intermediate tables. For example, if your branch has regressions on Demo 1:
Expand All @@ -97,6 +136,7 @@ python -m build
python -m twine upload dist/*
```


## Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a
Expand Down
32 changes: 23 additions & 9 deletions pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -14,20 +14,28 @@ requires-python = ">=3.10"
license = { file = "LICENSE" }
keywords = []
classifiers = [
"Development Status :: 4 - Beta",
"License :: OSI Approved :: MIT License",
"Programming Language :: Python",
"Programming Language :: Python :: 3",
"Development Status :: 4 - Beta",
"License :: OSI Approved :: MIT License",
"Programming Language :: Python",
"Programming Language :: Python :: 3",
]
dependencies = [
"GitPython >= 3.1.31, < 3.2",
"more-itertools",
"openpyxl >= 3.0, < 3.1",
"pandas >= 2.1",
"GitPython >= 3.1.31, < 3.2",
"more-itertools",
"openpyxl >= 3.0, < 3.1",
"pandas >= 2.1",
"pyarrow",
"tqdm",
]

[project.optional-dependencies]
dev = ["black", "pre-commit", "tabulate"]
dev = [
"black",
"pre-commit",
"tabulate",
"pytest",
"pytest-cov"
]

[project.urls]
Documentation = "https://github.com/etsap-TIMES/xl2times#readme"
Expand All @@ -36,3 +44,9 @@ Source = "https://github.com/etsap-TIMES/xl2times"

[project.scripts]
xl2times = "xl2times.__main__:main"

[tool.pytest.ini_options]
# don't print runtime warnings
filterwarnings = ["ignore::DeprecationWarning", "ignore::UserWarning", "ignore::FutureWarning"]
# show output, print test coverage report
addopts = '-s --durations=0 --durations-min=5.0 --tb=native --cov-report term --cov-report html --cov=xl2times --cov=utils'
Binary file added tests/data/austimes_pcg_test_data.parquet
Binary file not shown.
Binary file added tests/data/comm_groups_austimes_test_data.parquet
Binary file not shown.
67 changes: 67 additions & 0 deletions tests/test_transforms.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,67 @@
from datetime import datetime

import pandas as pd

from xl2times import transforms
from xl2times.transforms import (
_process_comm_groups_vectorised,
_count_comm_group_vectorised,
)

pd.set_option(
"display.max_rows",
20,
"display.max_columns",
20,
"display.width",
300,
"display.max_colwidth",
75,
"display.precision",
3,
)


class TestTransforms:
def test_generate_commodity_groups(self):
"""
Tests that the _count_comm_group_vectorised function works as expected.
Full austimes run:
Vectorised version took 0.021999 seconds
looped version took 966.653371 seconds
43958x speedup
"""
# data extracted immediately before the original for loops
comm_groups = pd.read_parquet(
"tests/data/comm_groups_austimes_test_data.parquet"
).drop(columns=["commoditygroup"])

# filter data so test runs faster
comm_groups = comm_groups.query("region in ['ACT', 'NSW']")

comm_groups2 = comm_groups.copy()
_count_comm_group_vectorised(comm_groups2)
assert comm_groups2.drop(columns=["commoditygroup"]).equals(comm_groups)
assert comm_groups2.shape == (comm_groups.shape[0], comm_groups.shape[1] + 1)

def test_default_pcg_vectorised(self):
"""Tests the default primary commodity group identification logic runs correctly.
Full austimes run:
Looped version took 1107.66 seconds
Vectorised version took 62.85 seconds
"""

# data extracted immediately before the original for loops
comm_groups = pd.read_parquet("tests/data/austimes_pcg_test_data.parquet")

comm_groups = comm_groups[(comm_groups["region"].isin(["ACT", "NT"]))]
comm_groups2 = _process_comm_groups_vectorised(
comm_groups.copy(), transforms.csets_ordered_for_pcg
)
assert comm_groups2 is not None and not comm_groups2.empty
assert comm_groups2.shape == (comm_groups.shape[0], comm_groups.shape[1] + 1)
assert comm_groups2.drop(columns=["DefaultVedaPCG"]).equals(comm_groups)


if __name__ == "__main__":
TestTransforms().test_default_pcg_vectorised()
Empty file added utils/__init__.py
Empty file.
9 changes: 7 additions & 2 deletions utils/dd_to_csv.py
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
import argparse
import sys
from collections import defaultdict
import json
import os
Expand Down Expand Up @@ -216,13 +217,17 @@ def convert_dd_to_tabular(
return


if __name__ == "__main__":
def main(arg_list: None | list[str] = None):
args_parser = argparse.ArgumentParser()
args_parser.add_argument(
"input_dir", type=str, help="Input directory containing .dd files."
)
args_parser.add_argument(
"output_dir", type=str, help="Output directory to save the .csv files in."
)
args = args_parser.parse_args()
args = args_parser.parse_args(arg_list)
convert_dd_to_tabular(args.input_dir, args.output_dir, generate_headers_by_attr())


if __name__ == "__main__":
main(sys.argv[1:])
Loading

0 comments on commit 90458f9

Please sign in to comment.