Merge branch 'main' into olex/update-benchmarks

etsap-TIMES · Feb 15, 2024 · 90458f9
2 parents e179e62 + 9aa7001
commit 90458f9
Showing 15 changed files with 513 additions and 166 deletions.
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
@@ -3,9 +3,9 @@ name: CI
 on:
   # Triggers the workflow on push or pull request events but only for the main branch
   push:
-    branches: [main]
+    branches: [ main ]
   pull_request:
-    branches: [main]
+    branches: [ main ]
 
   # Allows you to run this workflow manually from the Actions tab
   workflow_dispatch:
@@ -34,6 +34,12 @@ jobs:
           pre-commit install
           pre-commit run --all-files
 
+      - name: Run unit tests
+        working-directory: xl2times
+        run: |
+          source .venv/bin/activate
+          pytest
+
       # ---------- Prepare ETSAP Demo models
 
       - uses: actions/checkout@v3
@@ -69,6 +75,9 @@ jobs:
       # ---------- Install GAMS
 
       - name: Install GAMS
+        env:
+          GAMS_LICENSE: ${{ secrets.GAMS_LICENSE }}
+        if: ${{ env.GAMS_LICENSE != '' }}
         run: |
           curl https://d37drm4t2jghv5.cloudfront.net/distributions/44.1.0/linux/linux_x64_64_sfx.exe -o linux_x64_64_sfx.exe
           chmod +x linux_x64_64_sfx.exe
@@ -81,17 +90,18 @@ jobs:
           mkdir -p $HOME/.local/share/GAMS
           echo "$GAMS_LICENSE" > $HOME/.local/share/GAMS/gamslice.txt
           ls -l $HOME/.local/share/GAMS/
-        env:
-          GAMS_LICENSE: ${{ secrets.GAMS_LICENSE }}
+
 
       # ---------- Run tool, check for regressions
 
       - name: Run tool on all benchmarks
+        env:
+          GAMS_LICENSE: ${{ secrets.GAMS_LICENSE }}
+        if: ${{ env.GAMS_LICENSE != '' }}
         working-directory: xl2times
         # Use tee to also save the output to out.txt so that the summary table can be
         # printed again in the next step.
         # Save the return code to retcode.txt so that the next step can fail the action
-        # if run_benchmarks.py failed.
         run: |
           source .venv/bin/activate
           export PATH=$PATH:$GITHUB_WORKSPACE/GAMS/gams44.1_linux_x64_64_sfx
@@ -101,6 +111,22 @@ jobs:
               | tee out.txt; \
             echo ${PIPESTATUS[0]} > retcode.txt)
 
+      - name: Run CSV-only regression tests (no GAMS license)
+        env:
+          GAMS_LICENSE: ${{ secrets.GAMS_LICENSE }}
+        if: ${{ env.GAMS_LICENSE == '' }}
+        working-directory: xl2times
+        # Run without --dd flag if GAMS license secret doesn't exist.
+        # Useful for testing for (CSV) regressions in forks before creating PRs.
+        run: |
+          source .venv/bin/activate
+          export PATH=$PATH:$GITHUB_WORKSPACE/GAMS/gams44.1_linux_x64_64_sfx
+          (python utils/run_benchmarks.py benchmarks.yml \
+              --times_dir $GITHUB_WORKSPACE/TIMES_model \
+              --verbose \
+              | tee out.txt; \
+          echo ${PIPESTATUS[0]} > retcode.txt)
+
       - name: Print summary
         working-directory: xl2times
         run: |
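
Review note: both benchmark steps in this workflow use the shell idiom `(cmd | tee out.txt; echo ${PIPESTATUS[0]} > retcode.txt)` so that the output is shown live, saved for the summary step, and the tool's own exit code (not `tee`'s) is kept. For readers less familiar with `PIPESTATUS`, the same idea can be sketched in Python. This is only an illustration under assumptions: `run_and_tee` is a hypothetical helper, not part of this repository, and like the shell pipeline it captures stdout only.

```python
import subprocess
import sys
from pathlib import Path


def run_and_tee(cmd: list[str], log_path: Path, retcode_path: Path) -> int:
    """Hypothetical helper: run cmd, echo its stdout live, save a copy to
    log_path, and write the command's exit code to retcode_path."""
    with log_path.open("w") as log:
        # Only stdout is piped, mirroring `cmd | tee out.txt`.
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
        assert proc.stdout is not None
        for line in proc.stdout:
            sys.stdout.write(line)  # live output, like tee
            log.write(line)  # saved copy for a later step
        retcode = proc.wait()
    # Equivalent of `echo ${PIPESTATUS[0]} > retcode.txt`.
    retcode_path.write_text(f"{retcode}\n")
    return retcode
```

A later step could then read the saved code and fail the job if it is non-zero, which is the role `retcode.txt` plays in the workflow above.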

diff --git a/.gitignore b/.gitignore
@@ -13,7 +13,11 @@ ground_truth/*
 *.pyproj.*
 speedscope.json
 *.pkl
-.venv/
+.venv*/
 benchmarks/
+.idea/
+.python-version
 docs/_build/
 docs/api/
+.coverage
+/out.txt
diff --git a/README.md b/README.md
@@ -72,6 +72,45 @@ git commit --no-verify
 
 See our GitHub Actions CI `.github/workflows/ci.yml` and the utility script `utils/run_benchmarks.py` to see how to run the tool on the DemoS models.
 
+In short, use the commands below to clone the benchmarks data into your local `benchmarks` dir.
+Note that this assumes you have access to all these repositories (some are private and
+you'll have to request access). If not, comment out the inaccessible benchmarks in `benchmarks.yml` before running.
+
+```bash
+mkdir benchmarks
+# Get VEDA example models and reference DD files
+# XLSX files are in private repo for licensing reasons, please request access or replace with your own licensed VEDA example files.
+git clone git@github.com:olejandro/demos-xlsx.git benchmarks/xlsx/
+git clone git@github.com:olejandro/demos-dd.git benchmarks/dd/
+
+# Get Ireland model and reference DD files
+git clone git@github.com:esma-cgep/tim.git benchmarks/xlsx/Ireland
+git clone git@github.com:esma-cgep/tim-gams.git benchmarks/dd/Ireland
+```
+Then to run the benchmarks:
+```bash
+# Run only a single benchmark by name (see benchmarks.yml for the list of names)
+python utils/run_benchmarks.py benchmarks.yml --verbose --run DemoS_001-all | tee out.txt
+
+# Run all benchmarks (without GAMS run, just comparing CSV data)
+python utils/run_benchmarks.py benchmarks.yml --verbose | tee out.txt
+
+
+# Run benchmarks with regression tests vs main branch
+git checkout -b feature/your_new_changes
+# ... make your code changes here ...
+git commit -a -m "your commit message" # code must be committed for comparison to `main` branch to run.
+python utils/run_benchmarks.py benchmarks.yml --verbose | tee out.txt
+```
+At this point, if you haven't broken anything you should see something like:
+```
+Change in runtime: +2.97s
+Change in correct rows: +0
+Change in additional rows: +0
+No regressions. You're awesome!
+```
+If you see a large increase in runtime, a decrease in correct rows, or fewer rows being produced, your change has likely introduced a regression that you will need to track down and fix.
+
 ### Debugging Regressions
 
 If your change is causing regressions on one of the benchmarks, a useful way to debug and find the difference is to run the tool in verbose mode and compare the intermediate tables. For example, if your branch has regressions on Demo 1:
@@ -97,6 +136,7 @@ python -m build
 python -m twine upload dist/*
 ```
 
+
 ## Contributing
 
 This project welcomes contributions and suggestions.  Most contributions require you to agree to a

diff --git a/pyproject.toml b/pyproject.toml
@@ -14,20 +14,28 @@ requires-python = ">=3.10"
 license = { file = "LICENSE" }
 keywords = []
 classifiers = [
-  "Development Status :: 4 - Beta",
-  "License :: OSI Approved :: MIT License",
-  "Programming Language :: Python",
-  "Programming Language :: Python :: 3",
+    "Development Status :: 4 - Beta",
+    "License :: OSI Approved :: MIT License",
+    "Programming Language :: Python",
+    "Programming Language :: Python :: 3",
 ]
 dependencies = [
-  "GitPython >= 3.1.31, < 3.2",
-  "more-itertools",
-  "openpyxl >= 3.0, < 3.1",
-  "pandas >= 2.1",
+    "GitPython >= 3.1.31, < 3.2",
+    "more-itertools",
+    "openpyxl >= 3.0, < 3.1",
+    "pandas >= 2.1",
+    "pyarrow",
+    "tqdm",
 ]
 
 [project.optional-dependencies]
-dev = ["black", "pre-commit", "tabulate"]
+dev = [
+    "black",
+    "pre-commit",
+    "tabulate",
+    "pytest",
+    "pytest-cov"
+]
 
 [project.urls]
 Documentation = "https://github.com/etsap-TIMES/xl2times#readme"
@@ -36,3 +44,9 @@ Source = "https://github.com/etsap-TIMES/xl2times"
 
 [project.scripts]
 xl2times = "xl2times.__main__:main"
+
+[tool.pytest.ini_options]
+# don't print runtime warnings
+filterwarnings = ["ignore::DeprecationWarning", "ignore::UserWarning", "ignore::FutureWarning"]
+# show output, print test coverage report
+addopts = '-s --durations=0 --durations-min=5.0 --tb=native --cov-report term --cov-report html --cov=xl2times --cov=utils'
diff --git a/tests/data/austimes_pcg_test_data.parquet b/tests/data/austimes_pcg_test_data.parquet
diff --git a/tests/data/comm_groups_austimes_test_data.parquet b/tests/data/comm_groups_austimes_test_data.parquet
diff --git a/tests/test_transforms.py b/tests/test_transforms.py
@@ -0,0 +1,67 @@
+from datetime import datetime
+
+import pandas as pd
+
+from xl2times import transforms
+from xl2times.transforms import (
+    _process_comm_groups_vectorised,
+    _count_comm_group_vectorised,
+)
+
+pd.set_option(
+    "display.max_rows",
+    20,
+    "display.max_columns",
+    20,
+    "display.width",
+    300,
+    "display.max_colwidth",
+    75,
+    "display.precision",
+    3,
+)
+
+
+class TestTransforms:
+    def test_generate_commodity_groups(self):
+        """
+        Tests that the _count_comm_group_vectorised function works as expected.
+        Full austimes run:
+            Vectorised version took 0.021999 seconds
+            looped version took 966.653371 seconds
+            43958x speedup
+        """
+        # data extracted immediately before the original for loops
+        comm_groups = pd.read_parquet(
+            "tests/data/comm_groups_austimes_test_data.parquet"
+        ).drop(columns=["commoditygroup"])
+
+        # filter data so test runs faster
+        comm_groups = comm_groups.query("region in ['ACT', 'NSW']")
+
+        comm_groups2 = comm_groups.copy()
+        _count_comm_group_vectorised(comm_groups2)
+        assert comm_groups2.drop(columns=["commoditygroup"]).equals(comm_groups)
+        assert comm_groups2.shape == (comm_groups.shape[0], comm_groups.shape[1] + 1)
+
+    def test_default_pcg_vectorised(self):
+        """Tests the default primary commodity group identification logic runs correctly.
+        Full austimes run:
+            Looped version took 1107.66 seconds
+            Vectorised version took 62.85 seconds
+        """
+
+        # data extracted immediately before the original for loops
+        comm_groups = pd.read_parquet("tests/data/austimes_pcg_test_data.parquet")
+
+        comm_groups = comm_groups[comm_groups["region"].isin(["ACT", "NT"])]
+        comm_groups2 = _process_comm_groups_vectorised(
+            comm_groups.copy(), transforms.csets_ordered_for_pcg
+        )
+        assert comm_groups2 is not None and not comm_groups2.empty
+        assert comm_groups2.shape == (comm_groups.shape[0], comm_groups.shape[1] + 1)
+        assert comm_groups2.drop(columns=["DefaultVedaPCG"]).equals(comm_groups)
+
+
+if __name__ == "__main__":
+    TestTransforms().test_default_pcg_vectorised()
diff --git a/utils/__init__.py b/utils/__init__.py
diff --git a/utils/dd_to_csv.py b/utils/dd_to_csv.py
@@ -1,4 +1,5 @@
 import argparse
+import sys
 from collections import defaultdict
 import json
 import os
@@ -216,13 +217,17 @@ def convert_dd_to_tabular(
     return
 
 
-if __name__ == "__main__":
+def main(arg_list: None | list[str] = None):
     args_parser = argparse.ArgumentParser()
     args_parser.add_argument(
         "input_dir", type=str, help="Input directory containing .dd files."
     )
     args_parser.add_argument(
         "output_dir", type=str, help="Output directory to save the .csv files in."
     )
-    args = args_parser.parse_args()
+    args = args_parser.parse_args(arg_list)
     convert_dd_to_tabular(args.input_dir, args.output_dir, generate_headers_by_attr())
+
+
+if __name__ == "__main__":
+    main(sys.argv[1:])