Merge pull request #207 from NeuroBench/leaderboard
Leaderboard
jasonlyik authored Dec 2, 2024
2 parents b432518 + 6756bd0 commit 665fe65
Showing 4 changed files with 192 additions and 2 deletions.
14 changes: 13 additions & 1 deletion .github/workflows/python-tests.yaml
@@ -65,4 +65,16 @@ jobs:
      - name: Run tests
        run: |
-         poetry run pytest
+         poetry run pytest --cov --cov-report=xml
+     - name: Upload coverage to Codecov
+       uses: codecov/codecov-action@v4
+       with:
+         directory: ./coverage/reports/
+         env_vars: OS,PYTHON
+         fail_ci_if_error: true
+         files: ./coverage.xml,!./cache
+         flags: unittests
+         name: coverage-neurobench
+         token: ${{ secrets.CODECOV_TOKEN }}
+         verbose: true
22 changes: 21 additions & 1 deletion README.rst
@@ -1,11 +1,25 @@
.. image:: https://github.com/NeuroBench/neurobench/blob/main/docs/_static/neurobench_banner_light.jpeg?raw=true
:align: center
:width: 800

============
Introduction
============

.. |coverage| image:: https://codecov.io/gh/NeuroBench/neurobench/graph/badge.svg?token=VDF40UROUM
:target: https://codecov.io/gh/NeuroBench/neurobench

.. |docs| image:: https://readthedocs.org/projects/neurobench/badge/?version=latest
:target: https://neurobench.readthedocs.io/en/latest/

.. |pypi| image:: https://img.shields.io/pypi/v/neurobench.svg
:target: https://pypi.org/project/neurobench/

.. |downloads| image:: https://static.pepy.tech/personalized-badge/neurobench?period=total&units=international_system&left_color=grey&right_color=orange&left_text=downloads
:target: https://pepy.tech/project/neurobench


|coverage| |pypi| |docs| |downloads|


A harness for running evaluations on
`NeuroBench <https://neurobench.ai>`__ algorithm benchmarks.
@@ -57,6 +71,12 @@ v1.0 benchmarks
- Non-human Primate (NHP) Motor Prediction
- Chaotic Function Prediction

Leaderboards
~~~~~~~~~~~~
Proposed solutions for the benchmark tasks are evaluated on a set of metrics and compared to the performance of other solutions.

The leaderboards for these benchmarks can be found `here <leaderboard.rst>`__.

Additional benchmarks
~~~~~~~~~~~~~~~~~~~~~
- DVS Gesture Recognition
125 changes: 125 additions & 0 deletions leaderboard.rst
@@ -0,0 +1,125 @@
NeuroBench Leaderboards
=======================

The following are leaderboards for the **NeuroBench v1.0** algorithm track benchmarks, showcasing the performance of various methods across distinct tasks.

The maintained leaderboards cover the following tasks:

- **Keyword Few-Shot Class-Incremental Learning (FSCIL)**
- **Event Camera Object Detection**
- **Non-Human Primate Motor Prediction**
- **Chaotic Function Prediction**

For an interactive version of the leaderboard, visit the official website: `neurobench.ai <https://neurobench.ai>`__.

+----------------------------------------+--------------------------+----------+------------------------------------------------+
| Task                                   | Dataset                  | Metric   | Task Description                               |
+========================================+==========================+==========+================================================+
| **Keyword FSCIL**                      | MSWC                     | Accuracy | Few-shot continual learning of keyword classes |
+----------------------------------------+--------------------------+----------+------------------------------------------------+
| **Event Camera Object Detection**      | Prophesee 1MP Automotive | COCO mAP | Detecting automotive objects from event camera |
|                                        |                          |          | video                                          |
+----------------------------------------+--------------------------+----------+------------------------------------------------+
| **Non-Human Primate Motor Prediction** | Primate Reaching         | R²       | Predicting fingertip velocity from cortical    |
|                                        |                          |          | recordings                                     |
+----------------------------------------+--------------------------+----------+------------------------------------------------+
| **Chaotic Function Prediction**        | Mackey-Glass time series | sMAPE    | Autoregressive modeling of chaotic functions   |
+----------------------------------------+--------------------------+----------+------------------------------------------------+

Each leaderboard highlights key metrics such as accuracy, sparsity, model footprint, and computational efficiency. Below are detailed insights and rankings for each task.
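The sparsity columns in the tables below can be read as fractions of zeros. As a rough illustration (this is a sketch of the general idea, not the NeuroBench harness API): connection sparsity is the fraction of zero-valued weights in the model, and activation sparsity is the fraction of zero activations observed during inference.

```python
def connection_sparsity(weights):
    """Fraction of weights that are exactly zero, over all layers."""
    flat = [w for layer in weights for w in layer]
    return sum(1 for w in flat if w == 0) / len(flat)

def activation_sparsity(activations):
    """Fraction of activations that are zero (e.g. after a ReLU or spike)."""
    return sum(1 for a in activations if a == 0) / len(activations)

# Illustrative (made-up) values, not benchmark data:
weights = [[0.0, 1.2, 0.0, -0.5], [0.0, 0.0, 0.3, 0.7]]
acts = [0.0, 0.0, 0.0, 1.5, 0.0, 2.1, 0.0, 0.0]
print(connection_sparsity(weights))  # 0.5
print(activation_sparsity(acts))     # 0.75
```

A model with high activation sparsity performs fewer effective operations (the Eff_MACs / Eff_ACs columns), which is what makes the sparse SNN entries cheap to execute despite comparable dense operation counts.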

.. _fscil-benchmark:

Keyword Few-Shot Class-Incremental Learning (FSCIL)
---------------------------------------------------

The **Keyword Few-Shot Class-Incremental Learning (FSCIL)** task evaluates models on their ability to perform continual learning in the context of keyword spotting. Keyword spotting involves detecting and recognizing specific spoken words or phrases from audio input—a critical feature for voice-controlled applications.

This benchmark focuses on the challenge of learning new keyword classes incrementally, with limited examples, while retaining performance on previously learned classes. Models are evaluated on two metrics:

- **Base Accuracy:** Accuracy on the base class (session 0) test set.
- **Session Average Accuracy:** Average accuracy across all sessions (0 to 10).

The table below compares methods in terms of accuracy, resource efficiency, and sparsity metrics, offering insights into their trade-offs and performance in this incremental learning scenario.
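As a toy illustration of how the two accuracy metrics relate (the per-session accuracies below are made up for the example, not benchmark results):

```python
# Hypothetical per-session test accuracies for sessions 0 through 10.
# Session 0 is the base session; later sessions add new keyword classes.
session_accuracies = [0.97, 0.95, 0.93, 0.92, 0.90, 0.89, 0.88, 0.87,
                      0.86, 0.85, 0.84]

base_accuracy = session_accuracies[0]                                # base class accuracy
session_average = sum(session_accuracies) / len(session_accuracies)  # mean over all 11 sessions

print(base_accuracy)              # 0.97
print(round(session_average, 3))  # 0.896
```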


+--------+-----------------------------------+-----------+------------------+---------------------+---------------------+--------+----------+---------+----------------+
| Method | Accuracy (Base / Session Average) | Footprint | Model Exec. Rate | Connection Sparsity | Activation Sparsity | Dense  | Eff_MACs | Eff_ACs | Date Submitted |
+========+===================================+===========+==================+=====================+=====================+========+==========+=========+================+
| M5 ANN | (97.09% / 89.27%)                 | 6.03E6    | 1                | 0.0                 | 0.783               | 2.59E7 | 7.85E6   | 0       | 2024-01-17     |
+--------+-----------------------------------+-----------+------------------+---------------------+---------------------+--------+----------+---------+----------------+
| SNN    | (93.48% / 75.27%)                 | 1.36E7    | 200              | 0.0                 | 0.916               | 3.39E6 | 0        | 3.65E5  | 2024-01-17     |
+--------+-----------------------------------+-----------+------------------+---------------------+---------------------+--------+----------+---------+----------------+

.. _event-camera-benchmark:

Event Camera Object Detection
-----------------------------

**Event Camera Object Detection** evaluates models on their ability to detect and classify objects using data from event-based cameras. Unlike conventional cameras, event cameras capture changes in brightness asynchronously for each pixel, providing high temporal resolution and robustness to motion blur and lighting conditions. These unique properties make them ideal for applications like autonomous driving and robotics.

In this benchmark, models are tasked with detecting automotive objects in event camera video streams. The primary evaluation metric is **COCO mAP (Mean Average Precision)**, which measures detection accuracy. The table below compares methods based on their detection performance, computational efficiency, and sparsity characteristics, highlighting trade-offs relevant for real-world deployments.


+---------+----------+-----------+------------------+---------------------+---------------------+---------+----------+---------+----------------+
| Method  | COCO mAP | Footprint | Model Exec. Rate | Connection Sparsity | Activation Sparsity | Dense   | Eff_MACs | Eff_ACs | Date Submitted |
+=========+==========+===========+==================+=====================+=====================+=========+==========+=========+================+
| RED ANN | 0.429    | 9.13E7    | 20               | 0.0                 | 0.634               | 2.84E11 | 2.48E11  | 0       | 2024-01-17     |
+---------+----------+-----------+------------------+---------------------+---------------------+---------+----------+---------+----------------+
| Hybrid  | 0.271    | 1.21E7    | 20               | 0.0                 | 0.613               | 9.85E10 | 3.76E10  | 5.60E8  | 2024-01-17     |
+---------+----------+-----------+------------------+---------------------+---------------------+---------+----------+---------+----------------+

.. _nhp-motor-benchmark:

Non-Human Primate Motor Prediction
----------------------------------

**Non-Human Primate Motor Prediction** evaluates models on their ability to predict motor behavior, specifically fingertip velocity, from cortical neural recordings. This task is essential for advancing brain-machine interfaces (BMIs), which have applications in neuroprosthetics and understanding motor control mechanisms.

The benchmark provides separate solutions for each primate in the dataset, with models evaluated using the **R² metric**, representing the proportion of variance in the observed data explained by the predicted values. The challenge focuses on achieving high prediction accuracy while maintaining computational efficiency and leveraging sparsity for real-time applications.
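For reference, R² is one minus the ratio of residual variance to total variance. A minimal sketch of the computation (the function name is illustrative, not NeuroBench's API):

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean) ** 2 for t in y_true)               # total sum of squares
    return 1.0 - ss_res / ss_tot

# Illustrative velocities, not benchmark data:
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
print(round(r_squared(y_true, y_pred), 3))  # 0.98
```

An R² of 1.0 means the predictions explain all variance in the observed velocities; 0.0 means the model does no better than predicting the mean.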

The table below presents performance comparisons, including baseline models from the original NeuroBench publication, and highlights improvements made by submitted solutions. Notably, the `tinyRSNN` model demonstrates competitive performance with minimal computational resources, showcasing its potential for lightweight deployment.


+-------------------------------------------------+-------+-------------------+-----------------------+---------------------+---------------------+---------+----------+---------+----------------+
| Method                                          | R^2   | Footprint (bytes) | Model Exec. Rate (Hz) | Connection Sparsity | Activation Sparsity | Dense   | Eff_MACs | Eff_ACs | Date Submitted |
+=================================================+=======+===================+=======================+=====================+=====================+=========+==========+=========+================+
| `AEGRU <https://arxiv.org/pdf/2410.22283>`__    | 0.71  | 45500             | 250                   | TBC                 | TBC                 | 25100   | TBC      | TBC     | 2024-08-02     |
+-------------------------------------------------+-------+-------------------+-----------------------+---------------------+---------------------+---------+----------+---------+----------------+
| `GRU-t1 <https://arxiv.org/pdf/2409.04428>`__   | 0.707 | 352904            | 250                   | 0.0                 | 0.0                 | 22342   | 8518     | 793     | 2024-08-02     |
+-------------------------------------------------+-------+-------------------+-----------------------+---------------------+---------------------+---------+----------+---------+----------------+
| `bigSNN <https://arxiv.org/abs/2409.01762>`__   | 0.698 | 4833360           | 250                   | 0.0                 | 0.968               | 1206272 | 0        | 42003   | 2024-08-02     |
+-------------------------------------------------+-------+-------------------+-----------------------+---------------------+---------------------+---------+----------+---------+----------------+
| `tinyRSNN <https://arxiv.org/abs/2409.01762>`__ | 0.66  | 27144             | 250                   | 0.455               | 0.984               | 13440   | 0        | 304     | 2024-08-02     |
+-------------------------------------------------+-------+-------------------+-----------------------+---------------------+---------------------+---------+----------+---------+----------------+
| `GRU-t2 <https://arxiv.org/pdf/2409.04428>`__   | 0.621 | 174104            | 250                   | 0.0                 | 0.0                 | 4947    | 627      | 248     | 2024-08-02     |
+-------------------------------------------------+-------+-------------------+-----------------------+---------------------+---------------------+---------+----------+---------+----------------+
| Baseline SNN                                    | 0.593 | 19648             | 250                   | 0.0                 | 0.997               | 4900    | 0        | 276     | 2024-01-17     |
|                                                 | 0.568 | 38848             | 250                   | 0.0                 | 0.999               | 9700    | 0        | 551     | 2024-01-17     |
+-------------------------------------------------+-------+-------------------+-----------------------+---------------------+---------------------+---------+----------+---------+----------------+
| Baseline ANN                                    | 0.593 | 20824             | 250                   | 0.0                 | 0.683               | 4704    | 3836     | 0       | 2024-01-17     |
|                                                 | 0.558 | 33496             | 250                   | 0.0                 | 0.668               | 7776    | 6103     | 0       | 2024-01-17     |
+-------------------------------------------------+-------+-------------------+-----------------------+---------------------+---------------------+---------+----------+---------+----------------+

The results from the `BioCAS challenge <http://1.117.17.41/neural-decoding-grand-challenge/>`__ are averaged over all primate datasets. The submitted solutions achieve higher R^2 scores than the baselines, with the best solution achieving an R^2 score of 0.698.
Interestingly, the tinyRSNN model achieves near-optimal performance with an extremely small number of operations.


.. _chaotic-function-benchmark:

Chaotic Function Prediction
---------------------------

**Chaotic Function Prediction** challenges models to accurately predict values in chaotic time series data, a complex task due to the sensitivity of chaotic systems to initial conditions. This benchmark uses synthetic time series, such as the Mackey-Glass dataset, to evaluate the ability of models to perform autoregressive predictions in highly nonlinear and dynamic environments.

The primary evaluation metric is **sMAPE (Symmetric Mean Absolute Percentage Error)**, which measures prediction accuracy while being robust to scale differences. Since the dataset is synthetic and not tied to real-time scenarios, execution rate is not considered for evaluation.
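A minimal sketch of sMAPE under a common definition (score in percent, lower is better; NeuroBench's exact implementation may differ in details such as the normalization constant):

```python
def smape(y_true, y_pred):
    """Symmetric mean absolute percentage error, in percent (0 = perfect)."""
    terms = [abs(p - t) / ((abs(t) + abs(p)) / 2)  # per-point symmetric relative error
             for t, p in zip(y_true, y_pred)]
    return 100.0 * sum(terms) / len(terms)

# Illustrative values, not benchmark data:
print(smape([1.0, 2.0, 4.0], [1.0, 2.0, 4.0]))  # 0.0 for a perfect prediction
```

Because the denominator averages the magnitudes of the true and predicted values, the score is bounded and robust to the scale of the time series, unlike plain MAPE.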

The table below highlights the performance of various methods, emphasizing their ability to balance accuracy and computational efficiency. This task has implications for modeling in scientific simulations, financial forecasting, and other domains where chaotic systems are prevalent.

+--------+-------+-----------+---------------------+---------------------+--------+----------+---------+----------------+
| Method | sMAPE | Footprint | Connection Sparsity | Activation Sparsity | Dense  | Eff_MACs | Eff_ACs | Date Submitted |
+========+=======+===========+=====================+=====================+========+==========+=========+================+
| LSTM   | 13.37 | 4.90E5    | 0.0                 | 0.530               | 6.03E4 | 6.03E4   | 0       | 2024-01-17     |
+--------+-------+-----------+---------------------+---------------------+--------+----------+---------+----------------+
| ESN    | 14.79 | 2.81E5    | 0.876               | 0.0                 | 3.52E4 | 4.37E3   | 0       | 2024-01-17     |
+--------+-------+-----------+---------------------+---------------------+--------+----------+---------+----------------+
33 changes: 33 additions & 0 deletions pyproject.toml
@@ -80,6 +80,39 @@ testpaths = [
"tests/test_metrics.py",
]

[tool.coverage.run]
branch = true
include = [
"neurobench/*",
]
omit = [
"neurobench/examples/*",
"neurobench/datasets/*",
"neurobench/tests/*",
]

[tool.coverage.report]

exclude_also = [
"def __repr__",
"if self\\.debug",
"raise AssertionError",
"raise NotImplementedError",

# Don't complain if non-runnable code isn't run:
"if 0:",
"if __name__ == .__main__.:",

# Don't complain about abstract methods, they aren't run:
"@(abc\\.)?abstractmethod",
]

ignore_errors = true
format = 'markdown'

[tool.coverage.xml]
output = "coverage/reports/coverage.xml"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
