
Ci: Update ruff and cruft template #291

Open · wants to merge 2 commits into main

Conversation

KennethEnevoldsen (Collaborator) commented Oct 30, 2024

Summary by CodeRabbit

Release Notes

  • New Features

    • Added a GitHub Actions workflow for static type checks on the codebase.
    • Enhanced model downloading functionality with additional DaCy models and improved error handling.
  • Bug Fixes

    • Improved warning messages in the DaCy loading process to guide users on model management.
  • Documentation

    • Updated documentation for scoring functions to clarify parameters and expected types.
  • Chores

    • Reorganized import statements in several test files for better clarity.
    • Updated dependency version for ruff in project configuration.

coderabbitai bot commented Oct 30, 2024

Walkthrough

The changes in this pull request primarily involve updates to configuration files, the introduction of a new GitHub Actions workflow for static type checks, and modifications to existing workflows. The .cruft.json file has an updated commit hash, while the Makefile has changes to the linting commands. The pyproject.toml file reflects a specific version for the ruff dependency and reorganizes linting configurations. Additionally, several Python source files have been enhanced for model management and scoring functionality, and various test files have reordered import statements without altering their functionality.

Changes

  • .cruft.json: Updated commit hash from e02068889310225ea4f65ea0b203c2949e1597a9 to 85413085032f305896da8bad287a83d53fb0b196.
  • .github/workflows/static_type_checks.yml: New workflow added for running static type checks on the codebase.
  • .github/workflows/tests.yml: Workflow name changed from "Tests" to "test" and job name changed from "pytest" to "test"; comments updated.
  • Makefile: Lint target commands modified to use ruff check instead of ruff.
  • pyproject.toml: Updated ruff dependency version to ==0.7.1, added tool.ruff configuration with exclude and target-version, and reorganized linting sections.
  • src/dacy/download.py: Expanded models_url dictionary with additional DaCy model URLs and updated download_model function for better error handling.
  • src/dacy/load.py: Added warning message in where_is_my_dacy function when verbose is True.
  • src/dacy/score/input_length.py: Updated n_sents_score function's docstring and default score_fn value to include "dep".
  • src/dacy/score/score.py: Modified score function to use the get method for accessing scores; added type hints for parameters.
  • tests/test_datasets.py: Reordered import statements; no functional changes.
  • tests/test_download.py: Reordered import statements; no functional changes.
  • tests/test_hate_speech.py: Activated dacy import; no functional changes.
  • tests/test_score.py: Reordered import statements; no functional changes.
  • tests/test_sentiment.py: Reordered import statements; no functional changes.

Poem

In the burrow deep, where changes bloom,
A workflow new dispels the gloom.
With ruff in hand, we check and play,
Our code refined, come what may.
From models grand to tests that cheer,
The rabbit's work is ever near! 🐇✨



coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (11)
.github/workflows/static_type_checks.yml (3)

1-7: Documentation looks good but could be more specific.

The comments clearly state the purpose and setup requirements. However, consider adding more details about the specific type checker being used and any configuration files it depends on.

 # THIS ACTION WILL:
   # 1. Install dependencies
   # 2. Run static type checker
+  #    (using mypy/pyright configured in pyproject.toml)

 # SETUP:
   # None required except for the Makefile
+  # See pyproject.toml for type checker configuration

Line range hint 16-41: Consider enhancing the job configuration for better efficiency and security.

While the basic setup is functional, there are several opportunities for improvement:

  1. Pin the action versions more specifically for security:
-      - uses: actions/checkout@v3
+      - uses: actions/checkout@<pinned version or commit SHA>
-        uses: actions/setup-python@v4
+        uses: actions/setup-python@<pinned version or commit SHA>
  2. Add hash verification for dependencies:
       - name: Install dependencies
         shell: bash
         run: |
+          pip install --require-hashes -r requirements.txt
           make install
  3. Add timeout to prevent hanging jobs:
   static_type_checks:
     runs-on: ubuntu-latest
+    timeout-minutes: 10
  4. Consider adding error annotations:
       - name: Run static type checker
         shell: bash
         run: |
           make static-type-check
+        continue-on-error: true
+      - name: Upload type check results
+        if: always()
+        uses: actions/upload-artifact@v3
+        with:
+          name: typecheck-results
+          path: .mypy_cache/

Static type checks should be run against Python 3.9, 3.10, 3.11, and 3.12

The PyPI badge shows that this package officially supports Python 3.9, 3.10, 3.11, and 3.12. Additionally, the test workflow already runs against multiple Python versions (3.9 and 3.10). The static type checks should be expanded to cover all supported Python versions for consistency.

     strategy:
       matrix:
         os: [ubuntu-latest]
-        python-version: ["3.9"]
+        python-version: ["3.9", "3.10", "3.11", "3.12"]
🔗 Analysis chain

Line range hint 19-23: Consider testing against multiple Python versions.

The workflow only tests against Python 3.9. If the project supports multiple Python versions, consider expanding the matrix.

Let's check the supported Python versions:

     strategy:
       matrix:
         os: [ubuntu-latest]
-        python-version: ["3.9"]
+        python-version: ["3.9", "3.10", "3.11"]
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check pyproject.toml for Python version requirements
rg "python_requires|python_version" pyproject.toml

Length of output: 167


Script:

#!/bin/bash
# Let's check for Python version requirements in multiple potential locations
echo "=== Checking setup.py ==="
fd "setup.py" --exec cat {}

echo -e "\n=== Checking requirements.txt ==="
fd "requirements.txt" --exec cat {}

echo -e "\n=== Checking README.md for Python version info ==="
fd "README.md" --exec grep -i "python" {}

echo -e "\n=== Checking tox.ini for test environments ==="
fd "tox.ini" --exec cat {}

echo -e "\n=== Checking .python-version or .tool-versions ==="
fd "^\.python-version$|^\.tool-versions$" --exec cat {}

Length of output: 2521


Script:

#!/bin/bash
# Let's check the PyPI badge URL to see supported Python versions
echo "=== Checking PyPI badge info ==="
curl -s "https://img.shields.io/pypi/pyversions/dacy" || true

# Let's also check any CI configurations that might indicate Python version support
echo -e "\n=== Checking all GitHub workflow files ==="
fd "\.ya?ml$" .github/workflows/ --exec cat {}

Length of output: 7783

src/dacy/score/input_length.py (3)

Line range hint 12-19: Document the new "dep" scoring option

The score_fn parameter's default value now includes "dep", but this option isn't documented in the docstring's description of valid score_fn values.

Update the docstring to include "dep" in the list of valid options:

         score_fn (List[Union[str, Callable]], optional): A scoring function which takes
             in a list of examples and return a dictionary of the form {"score_name":
             score}. Four potiential strings are valid. "ents" for measuring the
             performance of entity spans. "pos" for measuring the performance of
             pos-tags. "token" for measuring the performance of tokenization. "nlp" for
-            measuring the performance of all components in the specified nlp pipeline.
+            measuring the performance of all components in the specified nlp pipeline.
+            "dep" for measuring the performance of dependency parsing.

Line range hint 57-58: Consider improving type safety in pandas operations

The pandas concatenation operation and return value are marked with # type: ignore. Consider adding proper type annotations:

  1. Define a type variable for the DataFrame structure
  2. Add proper return type annotation for the pandas DataFrame

Example improvement:

from typing import TypeVar
import pandas as pd

DataFrameType = TypeVar('DataFrameType', bound=pd.DataFrame)

def n_sents_score(...) -> DataFrameType:

Type ignore comments indicate real typing issues that need attention

The codebase shows systematic use of # type: ignore comments across scoring functionality, particularly in handling complex types involving Callables, Iterables, and pandas operations. Key observations:

  • No mypy configuration is present, but project uses pyright for type checking
  • Multiple instances of type ignores in score-related functions suggest structural typing issues
  • Common patterns involve:
    • Complex function signatures with Callable types
    • Pandas DataFrame operations
    • Dynamic attribute access with getattr
    • Type coercion in data processing

Recommendations:

  • Replace Union[int, List[int]] with TypeAlias or typing.Sequence
  • Use proper type annotations for pandas operations
  • Consider Protocol classes for Callable parameters
  • Add proper return type hints instead of ignoring them
🔗 Analysis chain

Line range hint 12-19: Consider addressing type ignore comments

There are multiple # type: ignore comments in the function signature. This might indicate underlying typing issues that should be resolved.

Let's check for similar patterns in other files:

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Look for type ignore comments in similar scoring functions
rg -l "# type: ignore" "src/dacy/score/"

# Check if mypy is configured in pyproject.toml
cat pyproject.toml | grep -A 10 "\[tool.mypy\]"

Length of output: 145


Script:

#!/bin/bash
# Let's examine the actual type ignore comments and their context
rg -A 2 "# type: ignore" "src/dacy/score/"

# Let's also check if there's a mypy configuration file
fd "mypy.ini|\.mypy.ini"

# And check the content of pyproject.toml for any typing-related configurations
cat pyproject.toml

Length of output: 8259
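The Protocol recommendation above can be sketched in isolation. All names below (ScoreFn, apply_scorers, token_acc) are hypothetical, not the project's actual signatures; the only assumption carried over from the docstring is that a score_fn takes a list of examples and returns a {"score_name": score} dictionary.

```python
from typing import Protocol


class ScoreFn(Protocol):
    # Hypothetical protocol for the score_fn contract: a callable that
    # takes a list of examples and returns {"score_name": score}.
    def __call__(self, examples: list) -> dict: ...


def apply_scorers(examples: list, scorers: list) -> dict:
    # Merge the outputs of each scorer into one flat score dict.
    out: dict = {}
    for fn in scorers:
        out.update(fn(examples))
    return out


def token_acc(examples: list) -> dict:
    # Toy scorer; a real one would compare predictions to gold tokens.
    return {"token_acc": 1.0}


scores = apply_scorers([], [token_acc])
print(scores)  # {'token_acc': 1.0}
```

A static checker such as pyright can then verify that each scorer passed to apply_scorers satisfies ScoreFn, replacing the type: ignore comments with a structural contract.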

src/dacy/download.py (4)

Line range hint 18-31: LGTM! Consider documenting the None values.

The models_url dictionary is well-structured with specific commit hashes for reproducibility. The None values for "small", "medium", "large" appear to serve as aliases for the latest versions.

Consider adding a comment explaining the purpose of the None values for clarity:

     "da_dacy_small_trf-0.2.0": "https://huggingface.co/chcaa/da_dacy_small_trf/resolve/0eadea074d5f637e76357c46bbd56451471d0154/da_dacy_small_trf-any-py3-none-any.whl",
     "da_dacy_medium_trf-0.2.0": "https://huggingface.co/chcaa/da_dacy_medium_trf/resolve/e7dba91f855a1d26679dc1ef3aa49f7874b50543/da_dacy_medium_trf-any-py3-none-any.whl",
     "da_dacy_large_trf-0.2.0": "https://huggingface.co/chcaa/da_dacy_large_trf/resolve/963232f378190476503a1bfc35b520cb142e9e41/da_dacy_large_trf-any-py3-none-any.whl",
+    # Aliases for latest versions
     "small": None,
     "medium": None,
     "large": None,

Line range hint 35-50: Enhance type safety and error handling.

The function could benefit from:

  1. Proper typing instead of # type: ignore comments
  2. Error handling for cases when no versions are found

Consider this improved implementation:

 def get_latest_version(model: str) -> str:
     if model in {"small", "medium", "large"}:
         model = f"da_dacy_{model}_trf"
     versions = [mdl.split("-")[-1] for mdl in models_url if mdl.startswith(model)]
+    if not versions:
+        raise ValueError(f"No versions found for model: {model}")
     versions = sorted(
         versions,
-        key=lambda s: [int(u) for u in s.split(".")],  # type: ignore
+        key=lambda s: tuple(int(u) for u in s.split(".")),
         reverse=True,
     )
-    return versions[0]  # type: ignore
+    return versions[0]
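The tuple-based sort key in the suggestion above can be checked with a standalone sketch, independent of the project's code. It shows why lexicographic string sorting misorders version numbers while numeric tuples sort correctly:

```python
# Standalone sketch: sorting semantic-version strings numerically.
# Plain string sorting ranks "0.2.0" above "0.10.0" because it compares
# character by character; converting each component to int fixes that.
versions = ["0.1.0", "0.10.0", "0.2.0"]

string_sorted = sorted(versions, reverse=True)
numeric_sorted = sorted(
    versions,
    key=lambda s: tuple(int(u) for u in s.split(".")),
    reverse=True,
)

print(string_sorted[0])   # 0.2.0  -- wrong: lexicographic comparison
print(numeric_sorted[0])  # 0.10.0 -- correct: numeric comparison
```

Using a tuple rather than a list as the sort key also removes the need for the type: ignore comment, since tuples of ints are cleanly comparable.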

Line range hint 67-72: Replace noqa with proper typing.

The noqa comment can be replaced with proper type hints.

 class DownloadProgressBar(tqdm):
-    def update_to(self, b: int = 1, bsize: int = 1, tsize=None) -> None:  # noqa
+    def update_to(self, b: int = 1, bsize: int = 1, tsize: int | None = None) -> None:
         if tsize is not None:
             self.total = tsize
         self.update(b * bsize - self.n)

Line range hint 102-137: Improve type safety and error messages.

The function could benefit from:

  1. Proper typing instead of # type: ignore comments
  2. More specific error messages listing available models

Consider these improvements:

 def download_model(
     model: str,
     force: bool = False,
 ) -> str:
     if model in {"small", "medium", "large"}:
         latest_version = get_latest_version(model)
         model = f"da_dacy_{model}_trf-{latest_version}"
-    mdl_version = model.split("-")[-1]  # type: ignore
+    mdl_version = model.split("-")[-1]

     if model not in models_url:
+        available_models = "\n".join(f"- {m}" for m in sorted(models_url.keys()))
         raise ValueError(
-            f"The model '{model}' is not available in DaCy. Please use dacy.models() to see a"
-            + " list of all models",
+            f"The model '{model}' is not available in DaCy. Available models:\n{available_models}",
         )

-    mdl = model.split("-")[0]  # type: ignore
+    mdl = model.split("-")[0]
src/dacy/score/score.py (1)

Line range hint 38-179: Consider refactoring for improved maintainability.

The score function is quite complex with multiple nested functions. Consider:

  1. Moving the scorer functions (ents_scorer, pos_scorer, etc.) to module level
  2. Extracting the core scoring logic from __score into a separate function
  3. Addressing the # type: ignore comments with proper typing

This would improve maintainability and make the code easier to test.

Would you like me to provide a detailed refactoring proposal?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 57fcb61 and 81ae227.

📒 Files selected for processing (14)
  • .cruft.json (1 hunks)
  • .github/workflows/static_type_checks.yml (1 hunks)
  • .github/workflows/tests.yml (1 hunks)
  • makefile (1 hunks)
  • pyproject.toml (3 hunks)
  • src/dacy/download.py (1 hunks)
  • src/dacy/load.py (1 hunks)
  • src/dacy/score/input_length.py (1 hunks)
  • src/dacy/score/score.py (2 hunks)
  • tests/test_datasets.py (1 hunks)
  • tests/test_download.py (1 hunks)
  • tests/test_hate_speech.py (1 hunks)
  • tests/test_score.py (1 hunks)
  • tests/test_sentiment.py (1 hunks)
✅ Files skipped from review due to trivial changes (6)
  • .cruft.json
  • .github/workflows/tests.yml
  • tests/test_datasets.py
  • tests/test_download.py
  • tests/test_score.py
  • tests/test_sentiment.py
🔇 Additional comments (12)
tests/test_hate_speech.py (2)

3-4: LGTM! The dacy import is required for the test functions.

The import is necessary as the test functions use dacy components for hate speech detection and classification.


3-4: Verify test environment dependencies.

Since this import was previously commented out, let's ensure the test environment is properly configured.

✅ Verification successful

Dependencies are properly configured for dacy

The verification shows that:

  • dacy (version >=2.6.0) is listed in the requirements
  • dacy components are properly registered in pyproject.toml
  • The test file actively uses dacy's hate speech detection and classification components
  • The import is necessary and actively used in the tests
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify dacy dependency is properly specified in project configuration

# Check if dacy is listed in project dependencies
echo "Checking for dacy in pyproject.toml..."
rg "dacy.*=" pyproject.toml

# Check if dacy is listed in test requirements
echo "Checking for dacy in test requirements..."
fd -t f "requirements.*txt" -x rg "dacy"

# Check if dacy is mentioned in CI configuration
echo "Checking CI configuration for test environment setup..."
fd -t f "^\.github/workflows/.*yml$" -x rg "dacy"

Length of output: 1283


Script:

#!/bin/bash
# Let's check the CI workflow files including hidden ones
fd -H -t f "^\.github/workflows/.*yml$" -x rg "dacy"

# Let's also check the test file to understand how dacy is being used
cat tests/test_hate_speech.py

Length of output: 766

.github/workflows/static_type_checks.yml (2)

Line range hint 9-15: LGTM! Workflow triggers are well configured.

The workflow is properly set up to run on both pull requests and pushes to main, which is a good practice for catching type issues early.


Line range hint 32-36: Verify Makefile installation command.

The workflow relies on make install. Let's ensure this command exists and installs all necessary dependencies for type checking.

makefile (1)

12-14: LGTM! Verify ruff version compatibility.

The changes correctly update the ruff commands to use the explicit check subcommand, which is consistent with ruff's newer versions.

Let's verify the ruff version specification in pyproject.toml:

✅ Verification successful

Ruff version compatibility confirmed

The pyproject.toml specifies ruff==0.7.1 in the dev dependencies, which is exactly the version where the explicit check subcommand became standard. The Makefile changes are correctly aligned with this version requirement.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify ruff version compatibility with the new command syntax
# Expect: Version should be >=0.7.1 where 'check' command became standard

rg -l 'ruff.*=.*".*"' pyproject.toml

Length of output: 38


Script:

#!/bin/bash
# Let's try different approaches to find ruff version specification
# 1. Look for any mention of ruff in pyproject.toml
cat pyproject.toml

# 2. As fallback, search for any files that might contain ruff configuration
fd -t f "pyproject.toml|requirements.*|setup.*|poetry.lock"

Length of output: 5194

src/dacy/load.py (2)

2-2: LGTM: Clean import addition

The warnings import is properly placed and necessary for the new warning functionality.


Line range hint 41-57: Several improvements suggested for the warning implementation

  1. The # type: ignore comment suggests unresolved type issues. Consider fixing the type annotations instead of ignoring them.
  2. Consider using a more specific warning category (e.g., DeprecationWarning or UserWarning) instead of the default one.
  3. The warning message could be more readable using f-strings.
  4. The version information could be more explicit.

Here's a suggested improvement:

-def where_is_my_dacy(verbose: bool = True) -> Union[str, Path]:  # type: ignore
+def where_is_my_dacy(verbose: bool = True) -> Path:
     """Returns a path to where DaCy models are located. The default the model
     location can be configured with the environmental variable
     `DACY_CACHE_DIR`.

     Args:
         verbose (bool, optional): Toggles the verbosity of the function. Defaults to
             True.

     Returns:
-        str: path to the location of DaCy models
+        Path: path to the location of DaCy models

     Example:
         >>> import dacy
         >>> dacy.where_is_my_dacy()
     """
     if verbose is True:
-        warnings.warn(  # noqa: B028
-            "DaCy pipelines above and including version 0.1.0 are installed as a "
-            + "python module and are thus located in your python environment under "
-            + "the respective names. To get a list of installed models use "
-            + "spacy.util.get_installed_models()",
+        warnings.warn(
+            f"DaCy pipelines (v0.1.0+) are installed as Python modules in your environment. "
+            f"To find installed models, use spacy.util.get_installed_models()",
+            UserWarning,
         )
     return DEFAULT_CACHE_DIR

Let's verify the type annotation issue:

✅ Verification successful

Based on the results, I can now generate the final response:

Type annotation can be simplified and made more accurate

The # type: ignore comment is unnecessary as we can see that:

  1. DEFAULT_CACHE_DIR is defined in src/dacy/download.py and is derived from DACY_DEFAULT_PATH which is explicitly a Path object (Path.home() / ".dacy")
  2. The function always returns DEFAULT_CACHE_DIR which is a Path object
  3. Path is properly imported in the module

The other suggestions about warning implementation are valid improvements for code readability and maintainability.

-def where_is_my_dacy(verbose: bool = True) -> Union[str, Path]:  # type: ignore
+def where_is_my_dacy(verbose: bool = True) -> Path:
     if verbose:
         warnings.warn(
-            "DaCy pipelines above and including version 0.1.0 are installed as a "
-            + "python module and are thus located in your python environment under "
-            + "the respective names. To get a list of installed models use "
-            + "spacy.util.get_installed_models()",  # noqa: B028
+            f"DaCy pipelines (v0.1.0+) are installed as Python modules in your environment. "
+            f"To find installed models, use spacy.util.get_installed_models()",
+            UserWarning,
         )
     return DEFAULT_CACHE_DIR
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if DEFAULT_CACHE_DIR is defined as Path elsewhere
rg -A 1 "DEFAULT_CACHE_DIR\s*=.*Path" 

Length of output: 39


Script:

#!/bin/bash
# Let's check for DEFAULT_CACHE_DIR definition and its usage
rg "DEFAULT_CACHE_DIR" -B 2 -A 2

# Also check for Path imports and usage
rg "from pathlib import Path" -B 2 -A 2

# And check the actual return value usage
ast-grep --pattern 'where_is_my_dacy($$$)'

Length of output: 4869
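The explicit-category suggestion can be illustrated with a self-contained sketch. Here where_models_live is a hypothetical stand-in for where_is_my_dacy, and the returned path is illustrative; the point is that an explicit UserWarning lets callers filter or assert on the warning.

```python
import warnings


def where_models_live(verbose: bool = True) -> str:
    # Hypothetical stand-in for where_is_my_dacy: emit an explicit
    # UserWarning so callers can filter it with warnings.filterwarnings.
    if verbose:
        warnings.warn(
            "Pipelines are installed as Python modules; "
            "use spacy.util.get_installed_models() to list them.",
            UserWarning,
            stacklevel=2,  # attribute the warning to the caller's frame
        )
    return "~/.dacy"  # illustrative cache path


# Callers can now capture or silence exactly this category:
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    where_models_live()

print(caught[0].category.__name__)  # UserWarning
```

Passing stacklevel=2 makes the warning point at the calling line rather than the warnings.warn call itself, which is the conventional choice for library code.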

src/dacy/score/input_length.py (1)

3-3: LGTM: Type hint imports are properly added

The new typing imports align well with the type hints used in the function signature and annotations.

src/dacy/download.py (1)

Line range hint 1-8: LGTM! Well-structured code organization.

The file is well-organized with clear sections, proper imports, and consistent type hints.

pyproject.toml (2)

51-51: LGTM! Good practice to pin ruff version.

Pinning to exact version 0.7.1 ensures consistent linting behavior across all environments.


196-201: LGTM! Well-structured rule configurations.

The configurations for flake8-annotations and mccabe follow best practices:

  • Appropriate handling of type annotations with mypy integration
  • Standard McCabe complexity threshold of 10
src/dacy/score/score.py (1)

Line range hint 38-76: LGTM! Well-documented function signature.

The function signature is properly typed and the documentation is comprehensive, clearly explaining all parameters and their expected types.

"papers/DaCy-A-Unified-Framework-for-Danish-NLP/**",
"docs/performance_testing_utils/**",
]
target-version = "py38"

⚠️ Potential issue

Align ruff target-version with project's Python requirement.

The project requires Python >=3.9, but ruff's target-version is set to py38. Consider updating to:

-target-version = "py38"
+target-version = "py39"

This alignment ensures ruff uses language features and checks appropriate for your minimum supported Python version.


Comment on lines +111 to +141
exclude = [
".bzr",
".direnv",
".eggs",
".git",
".hg",
".nox",
".pants.d",
".pytype",
".ruff_cache",
".svn",
".tox",
".venv",
"__pypackages__",
"_build",
"buck-out",
"build",
"dist",
"node_modules",
"venv",
"__init__.py",
".venv",
".env",
".git",
"__pycache__",
"dev/**",
"training/main/**",
"training/ner_fine_grained/**",
"papers/DaCy-A-Unified-Framework-for-Danish-NLP/**",
"docs/performance_testing_utils/**",
]

⚠️ Potential issue

Remove duplicate exclude patterns.

The exclude list contains duplicate entries:

  • .venv appears on lines 123 and 132
  • .git appears on lines 115 and 134

Apply this diff to remove duplicates:

exclude = [
  ".bzr",
  ".direnv",
  ".eggs",
  ".git",
  ".hg",
  ".nox",
  ".pants.d",
  ".pytype",
  ".ruff_cache",
  ".svn",
  ".tox",
  ".venv",
  "__pypackages__",
  "_build",
  "buck-out",
  "build",
  "dist",
  "node_modules",
  "venv",
  "__init__.py",
- ".venv",  # Remove duplicate
  ".env",
- ".git",   # Remove duplicate
  "__pycache__",
  "dev/**",
  "training/main/**",
  "training/ner_fine_grained/**",
  "papers/DaCy-A-Unified-Framework-for-Danish-NLP/**",
  "docs/performance_testing_utils/**",
]

@@ -163,7 +164,7 @@ def __score(augmenter):  # noqa: ANN001

         # and collapse list to dict
         for key in scores:  # type: ignore
-            scores[key] = [s[key] if key in s else None for s in scores_ls]  # type: ignore
+            scores[key] = [s.get([key], None) for s in scores_ls]  # type: ignore

⚠️ Potential issue

Fix critical bug in score aggregation.

There's a bug in the dictionary access: a single-element list is passed to the get() method instead of the key itself. Because lists are unhashable, every lookup raises TypeError: unhashable type: 'list' at runtime.

Apply this fix:

-            scores[key] = [s.get([key], None) for s in scores_ls]
+            scores[key] = [s.get(key, None) for s in scores_ls]
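The failure mode is easy to reproduce in isolation (a standalone sketch, not the project's code):

```python
# Standalone reproduction of the bug: dict.get requires a hashable key,
# so wrapping the key in a list fails at lookup time.
scores = {"ents_f": 0.92}

try:
    scores.get(["ents_f"], None)  # buggy call: list is unhashable
    outcome = "no error"
except TypeError as exc:
    outcome = str(exc)

print(outcome)                      # unhashable type: 'list'
print(scores.get("ents_f", None))   # 0.92 -- correct call
print(scores.get("missing", None))  # None -- default for absent keys
```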
