Ci: Update ruff and cruft template #291
base: main
Conversation
Walkthrough: The changes in this pull request primarily involve updates to configuration files, the introduction of a new GitHub Actions workflow for static type checks, and modifications to existing workflows.
Actionable comments posted: 3
🧹 Outside diff range and nitpick comments (11)
.github/workflows/static_type_checks.yml (3)
Lines 1-7: Documentation looks good but could be more specific. The comments clearly state the purpose and setup requirements. However, consider adding more details about the specific type checker being used and any configuration files it depends on.
```diff
 # THIS ACTION WILL:
 # 1. Install dependencies
 # 2. Run static type checker
+#    (using mypy/pyright configured in pyproject.toml)
 # SETUP:
 # None required except for the Makefile
+# See pyproject.toml for type checker configuration
```
Line range hint 16-41: Consider enhancing the job configuration for better efficiency and security. While the basic setup is functional, there are several opportunities for improvement:
- Pin the action versions more specifically for security:
```diff
-      - uses: actions/checkout@v3
+      - uses: actions/[email protected]
-        uses: actions/setup-python@v4
+        uses: actions/[email protected]
```
- Add hash verification for dependencies:
```diff
       - name: Install dependencies
         shell: bash
         run: |
+          pip install --require-hashes -r requirements.txt
           make install
```
- Add timeout to prevent hanging jobs:
```diff
   static_type_checks:
     runs-on: ubuntu-latest
+    timeout-minutes: 10
```
- Consider adding error annotations:
```diff
       - name: Run static type checker
         shell: bash
         run: |
           make static-type-check
+        continue-on-error: true
+      - name: Upload type check results
+        if: always()
+        uses: actions/upload-artifact@v3
+        with:
+          name: typecheck-results
+          path: .mypy_cache/
```
Static type checks should be run against Python 3.9, 3.10, 3.11, and 3.12
The PyPI badge shows that this package officially supports Python 3.9, 3.10, 3.11, and 3.12. Additionally, the test workflow already runs against multiple Python versions (3.9 and 3.10). The static type checks should be expanded to cover all supported Python versions for consistency.
```diff
     strategy:
       matrix:
         os: [ubuntu-latest]
-        python-version: ["3.9"]
+        python-version: ["3.9", "3.10", "3.11", "3.12"]
```
🔗 Analysis chain
Line range hint 19-23: Consider testing against multiple Python versions. The workflow only tests against Python 3.9. If the project supports multiple Python versions, consider expanding the matrix.
Let's check the supported Python versions:
```diff
     strategy:
       matrix:
         os: [ubuntu-latest]
-        python-version: ["3.9"]
+        python-version: ["3.9", "3.10", "3.11"]
```
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
```bash
#!/bin/bash
# Check pyproject.toml for Python version requirements
rg "python_requires|python_version" pyproject.toml
```
Length of output: 167
Script:
```bash
#!/bin/bash
# Let's check for Python version requirements in multiple potential locations
echo "=== Checking setup.py ==="
fd "setup.py" --exec cat {}
echo -e "\n=== Checking requirements.txt ==="
fd "requirements.txt" --exec cat {}
echo -e "\n=== Checking README.md for Python version info ==="
fd "README.md" --exec grep -i "python" {}
echo -e "\n=== Checking tox.ini for test environments ==="
fd "tox.ini" --exec cat {}
echo -e "\n=== Checking .python-version or .tool-versions ==="
fd "^\.python-version$|^\.tool-versions$" --exec cat {}
```
Length of output: 2521
Script:
```bash
#!/bin/bash
# Let's check the PyPI badge URL to see supported Python versions
echo "=== Checking PyPI badge info ==="
curl -s "https://img.shields.io/pypi/pyversions/dacy" || true
# Let's also check any CI configurations that might indicate Python version support
echo -e "\n=== Checking all GitHub workflow files ==="
fd "\.ya?ml$" .github/workflows/ --exec cat {}
```
Length of output: 7783
src/dacy/score/input_length.py (3)
Line range hint 12-19: Document the new "dep" scoring option. The `score_fn` parameter's default value now includes "dep", but this option isn't documented in the docstring's description of valid `score_fn` values.
Update the docstring to include "dep" in the list of valid options:
```diff
         score_fn (List[Union[str, Callable]], optional): A scoring function which takes
             in a list of examples and return a dictionary of the form
             {"score_name": score}. Four potiential strings are valid. "ents" for
             measuring the performance of entity spans. "pos" for measuring the
             performance of pos-tags. "token" for measuring the performance of
             tokenization. "nlp" for
-            measuring the performance of all components in the specified nlp pipeline.
+            measuring the performance of all components in the specified nlp pipeline.
+            "dep" for measuring the performance of dependency parsing.
```
Line range hint 57-58: Consider improving type safety in pandas operations. The pandas concatenation operation and return value are marked with `# type: ignore`. Consider adding proper type annotations:
- Define a type variable for the DataFrame structure
- Add proper return type annotation for the pandas DataFrame
Example improvement:
```python
from typing import TypeVar

import pandas as pd

DataFrameType = TypeVar('DataFrameType', bound=pd.DataFrame)

def n_sents_score(...) -> DataFrameType:
```
Type ignore comments indicate real typing issues that need attention
The codebase shows systematic use of `# type: ignore` comments across scoring functionality, particularly in handling complex types involving Callables, Iterables, and pandas operations. Key observations:
- No mypy configuration is present, but the project uses `pyright` for type checking
- Multiple instances of type ignores in score-related functions suggest structural typing issues
- Common patterns involve:
- Complex function signatures with Callable types
- Pandas DataFrame operations
- Dynamic attribute access with getattr
- Type coercion in data processing
Recommendations (a brief sketch follows this list):
- Replace Union[int, List[int]] with TypeAlias or typing.Sequence
- Use proper type annotations for pandas operations
- Consider Protocol classes for Callable parameters
- Add proper return type hints instead of ignoring them
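As a brief illustration of these recommendations, the sketch below shows one possible shape; the names (`NSents`, `ScorerFn`, `summarize_scores`) are illustrative placeholders, not part of the DaCy codebase:

```python
from typing import Iterable, List, Protocol, Union

import pandas as pd
from spacy.training import Example

# A named alias instead of repeating Union[int, List[int]] in every signature
NSents = Union[int, List[int]]


class ScorerFn(Protocol):
    """Structural type for scoring callables: Examples in, named scores out."""

    def __call__(self, examples: Iterable[Example]) -> dict: ...


def summarize_scores(scores: dict) -> pd.DataFrame:
    # Annotating the return as pd.DataFrame removes the need for a `# type: ignore`
    return pd.DataFrame(scores)
```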
🔗 Analysis chain
Line range hint 12-19: Consider addressing type ignore comments. There are multiple `# type: ignore` comments in the function signature. This might indicate underlying typing issues that should be resolved. Let's check for similar patterns in other files:
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
```bash
#!/bin/bash
# Look for type ignore comments in similar scoring functions
rg -l "# type: ignore" "src/dacy/score/"
# Check if mypy is configured in pyproject.toml
cat pyproject.toml | grep -A 10 "\[tool.mypy\]"
```
Length of output: 145
Script:
```bash
#!/bin/bash
# Let's examine the actual type ignore comments and their context
rg -A 2 "# type: ignore" "src/dacy/score/"
# Let's also check if there's a mypy configuration file
fd "mypy.ini|\.mypy.ini"
# And check the content of pyproject.toml for any typing-related configurations
cat pyproject.toml
```
Length of output: 8259
src/dacy/download.py (4)
Line range hint 18-31: LGTM! Consider documenting the None values. The `models_url` dictionary is well-structured with specific commit hashes for reproducibility. The `None` values for "small", "medium", "large" appear to serve as aliases for the latest versions.
Consider adding a comment explaining the purpose of the None values for clarity:
"da_dacy_small_trf-0.2.0": "https://huggingface.co/chcaa/da_dacy_small_trf/resolve/0eadea074d5f637e76357c46bbd56451471d0154/da_dacy_small_trf-any-py3-none-any.whl", "da_dacy_medium_trf-0.2.0": "https://huggingface.co/chcaa/da_dacy_medium_trf/resolve/e7dba91f855a1d26679dc1ef3aa49f7874b50543/da_dacy_medium_trf-any-py3-none-any.whl", "da_dacy_large_trf-0.2.0": "https://huggingface.co/chcaa/da_dacy_large_trf/resolve/963232f378190476503a1bfc35b520cb142e9e41/da_dacy_large_trf-any-py3-none-any.whl", + # Aliases for latest versions "small": None, "medium": None, "large": None,
Line range hint 35-50: Enhance type safety and error handling. The function could benefit from:

- Proper typing instead of `# type: ignore` comments
- Error handling for cases when no versions are found
Consider this improved implementation:
```diff
 def get_latest_version(model: str) -> str:
     if model in {"small", "medium", "large"}:
         model = f"da_dacy_{model}_trf"
     versions = [mdl.split("-")[-1] for mdl in models_url if mdl.startswith(model)]
+    if not versions:
+        raise ValueError(f"No versions found for model: {model}")
     versions = sorted(
         versions,
-        key=lambda s: [int(u) for u in s.split(".")],  # type: ignore
+        key=lambda s: tuple(int(u) for u in s.split(".")),
         reverse=True,
     )
-    return versions[0]  # type: ignore
+    return versions[0]
```
Line range hint 67-72: Replace noqa with proper typing. The `noqa` comment can be replaced with proper type hints.

```diff
 class DownloadProgressBar(tqdm):
-    def update_to(self, b: int = 1, bsize: int = 1, tsize=None) -> None:  # noqa
+    def update_to(self, b: int = 1, bsize: int = 1, tsize: int | None = None) -> None:
         if tsize is not None:
             self.total = tsize
         self.update(b * bsize - self.n)
```
Line range hint 102-137: Improve type safety and error messages. The function could benefit from:

- Proper typing instead of `# type: ignore` comments
- More specific error messages listing available models
Consider these improvements:
```diff
 def download_model(
     model: str,
     force: bool = False,
 ) -> str:
     if model in {"small", "medium", "large"}:
         latest_version = get_latest_version(model)
         model = f"da_dacy_{model}_trf-{latest_version}"
-    mdl_version = model.split("-")[-1]  # type: ignore
+    mdl_version = model.split("-")[-1]
     if model not in models_url:
+        available_models = "\n".join(f"- {m}" for m in sorted(models_url.keys()))
         raise ValueError(
-            f"The model '{model}' is not available in DaCy. Please use dacy.models() to see a"
-            + " list of all models",
+            f"The model '{model}' is not available in DaCy. Available models:\n{available_models}",
         )
-    mdl = model.split("-")[0]  # type: ignore
+    mdl = model.split("-")[0]
```
src/dacy/score/score.py (1)
Line range hint 38-179: Consider refactoring for improved maintainability. The `score` function is quite complex with multiple nested functions. Consider:

- Moving the scorer functions (`ents_scorer`, `pos_scorer`, etc.) to module level
- Extracting the core scoring logic from `__score` into a separate function
- Addressing the `# type: ignore` comments with proper typing

This would improve maintainability and make the code easier to test; a rough sketch follows below.
Would you like me to provide a detailed refactoring proposal?
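As a rough sketch only (the helper names are hypothetical, not DaCy's actual API), the module-level split could look roughly like this:

```python
from typing import Callable, Iterable, List

from spacy.language import Language
from spacy.scorer import Scorer
from spacy.training import Example


def ents_scorer(examples: Iterable[Example]) -> dict:
    """Score entity spans only; at module level it can be unit-tested directly."""
    return Scorer.score_spans(examples, attr="ents")


def apply_and_score(
    nlp: Language,
    examples: List[Example],
    scorers: List[Callable[[Iterable[Example]], dict]],
) -> dict:
    """Run the pipeline over the gold texts once, then apply each scorer."""
    predicted = [Example(nlp(eg.reference.text), eg.reference) for eg in examples]
    scores: dict = {}
    for scorer in scorers:
        scores.update(scorer(predicted))
    return scores
```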
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (14)
- `.cruft.json` (1 hunks)
- `.github/workflows/static_type_checks.yml` (1 hunks)
- `.github/workflows/tests.yml` (1 hunks)
- `makefile` (1 hunks)
- `pyproject.toml` (3 hunks)
- `src/dacy/download.py` (1 hunks)
- `src/dacy/load.py` (1 hunks)
- `src/dacy/score/input_length.py` (1 hunks)
- `src/dacy/score/score.py` (2 hunks)
- `tests/test_datasets.py` (1 hunks)
- `tests/test_download.py` (1 hunks)
- `tests/test_hate_speech.py` (1 hunks)
- `tests/test_score.py` (1 hunks)
- `tests/test_sentiment.py` (1 hunks)
✅ Files skipped from review due to trivial changes (6)
- .cruft.json
- .github/workflows/tests.yml
- tests/test_datasets.py
- tests/test_download.py
- tests/test_score.py
- tests/test_sentiment.py
🔇 Additional comments (12)
tests/test_hate_speech.py (2)
Lines 3-4: LGTM! The dacy import is required for the test functions.
The import is necessary as the test functions use dacy components for hate speech detection and classification.
Lines 3-4: Verify test environment dependencies.
Since this import was previously commented out, let's ensure the test environment is properly configured.
✅ Verification successful
Dependencies are properly configured for dacy
The verification shows that:
- dacy (version >=2.6.0) is listed in the requirements
- dacy components are properly registered in pyproject.toml
- The test file actively uses dacy's hate speech detection and classification components
- The import is necessary and actively used in the tests
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
```bash
#!/bin/bash
# Description: Verify dacy dependency is properly specified in project configuration
# Check if dacy is listed in project dependencies
echo "Checking for dacy in pyproject.toml..."
rg "dacy.*=" pyproject.toml
# Check if dacy is listed in test requirements
echo "Checking for dacy in test requirements..."
fd -t f "requirements.*txt" -x rg "dacy"
# Check if dacy is mentioned in CI configuration
echo "Checking CI configuration for test environment setup..."
fd -t f "^\.github/workflows/.*yml$" -x rg "dacy"
```
Length of output: 1283
Script:
```bash
#!/bin/bash
# Let's check the CI workflow files including hidden ones
fd -H -t f "^\.github/workflows/.*yml$" -x rg "dacy"
# Let's also check the test file to understand how dacy is being used
cat tests/test_hate_speech.py
```
Length of output: 766
.github/workflows/static_type_checks.yml (2)
Line range hint 9-15: LGTM! Workflow triggers are well configured.
The workflow is properly set up to run on both pull requests and pushes to main, which is a good practice for catching type issues early.
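For reference, a trigger block of that kind is typically along these lines (a generic GitHub Actions sketch, not a quote from this repository's workflow file):

```yaml
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
```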
Line range hint 32-36: Verify Makefile installation command. The workflow relies on `make install`. Let's ensure this command exists and installs all necessary dependencies for type checking.
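One quick local check, assuming GNU make and a target literally named `install` as the workflow implies:

```bash
#!/bin/bash
# Dry-run: print the commands the `install` target would execute, without running them
make -n install
# Show the target definition and the first lines of its recipe
grep -A 3 "^install:" makefile
```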
makefile (1)
Lines 12-14: LGTM! Verify ruff version compatibility. The changes correctly update the ruff commands to use the explicit `check` subcommand, which is consistent with ruff's newer versions.
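For illustration only, the updated targets plausibly look something like the sketch below; the target names and the choice of type checker are assumptions, not lines copied from this repository's makefile:

```makefile
lint:
	ruff check . --fix

static-type-check:
	pyright src/
```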
Let's verify the ruff version specification in pyproject.toml:
✅ Verification successful
Ruff version compatibility confirmed
The pyproject.toml specifies `ruff==0.7.1` in the dev dependencies, which is exactly the version where the explicit `check` subcommand became standard. The Makefile changes are correctly aligned with this version requirement.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
```bash
#!/bin/bash
# Description: Verify ruff version compatibility with the new command syntax
# Expect: Version should be >=0.7.1 where 'check' command became standard
rg -l 'ruff.*=.*".*"' pyproject.toml
```
Length of output: 38
Script:
```bash
#!/bin/bash
# Let's try different approaches to find ruff version specification
# 1. Look for any mention of ruff in pyproject.toml
cat pyproject.toml
# 2. As fallback, search for any files that might contain ruff configuration
fd -t f "pyproject.toml|requirements.*|setup.*|poetry.lock"
```
Length of output: 5194
src/dacy/load.py (2)
Line 2: LGTM: Clean import addition. The `warnings` import is properly placed and necessary for the new warning functionality.
Line range hint 41-57: Several improvements are suggested for the warning implementation:
- The `# type: ignore` comment suggests unresolved type issues. Consider fixing the type annotations instead of ignoring them.
- Consider using a more specific warning category (e.g., `DeprecationWarning` or `UserWarning`) instead of the default one.
- The warning message could be more readable using f-strings.
- The version information could be more explicit.
Here's a suggested improvement:
```diff
-def where_is_my_dacy(verbose: bool = True) -> Union[str, Path]:  # type: ignore
+def where_is_my_dacy(verbose: bool = True) -> Path:
     """Returns a path to where DaCy models are located. The default the model
     location can be configured with the environmental variable
     `DACY_CACHE_DIR`.

     Args:
         verbose (bool, optional): Toggles the verbosity of the function. Defaults to
             True.

     Returns:
-        str: path to the location of DaCy models
+        Path: path to the location of DaCy models

     Example:
         >>> import dacy
         >>> dacy.where_is_my_dacy()
     """
     if verbose is True:
-        warnings.warn(  # noqa: B028
-            "DaCy pipelines above and including version 0.1.0 are installed as a "
-            + "python module and are thus located in your python environment under "
-            + "the respective names. To get a list of installed models use "
-            + "spacy.util.get_installed_models()",
+        warnings.warn(
+            f"DaCy pipelines (v0.1.0+) are installed as Python modules in your environment. "
+            f"To find installed models, use spacy.util.get_installed_models()",
+            UserWarning,
         )
     return DEFAULT_CACHE_DIR
```
Let's verify the type annotation issue:
✅ Verification successful
Based on the results, I can now generate the final response:
Type annotation can be simplified and made more accurate
The `# type: ignore` comment is unnecessary, as we can see that:

- `DEFAULT_CACHE_DIR` is defined in `src/dacy/download.py` and is derived from `DACY_DEFAULT_PATH`, which is explicitly a `Path` object (`Path.home() / ".dacy"`)
- The function always returns `DEFAULT_CACHE_DIR`, which is a `Path` object
- `Path` is properly imported in the module
The other suggestions about warning implementation are valid improvements for code readability and maintainability.
```diff
-def where_is_my_dacy(verbose: bool = True) -> Union[str, Path]:  # type: ignore
+def where_is_my_dacy(verbose: bool = True) -> Path:
     if verbose:
         warnings.warn(
-            "DaCy pipelines above and including version 0.1.0 are installed as a "
-            + "python module and are thus located in your python environment under "
-            + "the respective names. To get a list of installed models use "
-            + "spacy.util.get_installed_models()",  # noqa: B028
+            f"DaCy pipelines (v0.1.0+) are installed as Python modules in your environment. "
+            f"To find installed models, use spacy.util.get_installed_models()",
+            UserWarning,
         )
     return DEFAULT_CACHE_DIR
```
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
```bash
#!/bin/bash
# Check if DEFAULT_CACHE_DIR is defined as Path elsewhere
rg -A 1 "DEFAULT_CACHE_DIR\s*=.*Path"
```
Length of output: 39
Script:
```bash
#!/bin/bash
# Let's check for DEFAULT_CACHE_DIR definition and its usage
rg "DEFAULT_CACHE_DIR" -B 2 -A 2
# Also check for Path imports and usage
rg "from pathlib import Path" -B 2 -A 2
# And check the actual return value usage
ast-grep --pattern 'where_is_my_dacy($$$)'
```
Length of output: 4869
src/dacy/score/input_length.py (1)
Line 3: LGTM: Type hint imports are properly added.
The new typing imports align well with the type hints used in the function signature and annotations.
src/dacy/download.py (1)
Line range hint 1-8: LGTM! Well-structured code organization.
The file is well-organized with clear sections, proper imports, and consistent type hints.
pyproject.toml (2)
Line 51: LGTM! Good practice to pin the ruff version.
Pinning to exact version 0.7.1 ensures consistent linting behavior across all environments.
Lines 196-201: LGTM! Well-structured rule configurations.
The configurations for flake8-annotations and mccabe follow best practices (a sketch of comparable settings follows this list):
- Appropriate handling of type annotations with mypy integration
- Standard McCabe complexity threshold of 10
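For context, a configuration of this shape typically reads roughly as follows (option names are taken from ruff's documented settings; the exact values in this PR may differ):

```toml
[tool.ruff.lint.flake8-annotations]
# Mirror mypy's conventions, e.g. allow __init__ without an explicit return annotation
mypy-init-return = true
suppress-none-returning = true

[tool.ruff.lint.mccabe]
# Flag functions whose cyclomatic complexity exceeds the conventional threshold
max-complexity = 10
```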
src/dacy/score/score.py (1)
Line range hint 38-76: LGTM! Well-documented function signature.
The function signature is properly typed and the documentation is comprehensive, clearly explaining all parameters and their expected types.
"papers/DaCy-A-Unified-Framework-for-Danish-NLP/**", | ||
"docs/performance_testing_utils/**", | ||
] | ||
target-version = "py38" |
Align ruff target-version with project's Python requirement.
The project requires Python >=3.9, but ruff's target-version is set to py38. Consider updating to:
```diff
-target-version = "py38"
+target-version = "py39"
```
This alignment ensures ruff uses language features and checks appropriate for your minimum supported Python version.
```toml
exclude = [
    ".bzr",
    ".direnv",
    ".eggs",
    ".git",
    ".hg",
    ".nox",
    ".pants.d",
    ".pytype",
    ".ruff_cache",
    ".svn",
    ".tox",
    ".venv",
    "__pypackages__",
    "_build",
    "buck-out",
    "build",
    "dist",
    "node_modules",
    "venv",
    "__init__.py",
    ".venv",
    ".env",
    ".git",
    "__pycache__",
    "dev/**",
    "training/main/**",
    "training/ner_fine_grained/**",
    "papers/DaCy-A-Unified-Framework-for-Danish-NLP/**",
    "docs/performance_testing_utils/**",
]
```
Remove duplicate exclude patterns.
The exclude list contains duplicate entries:
- `.venv` appears on lines 123 and 132
- `.git` appears on lines 115 and 134
Apply this diff to remove duplicates:
```diff
 exclude = [
     ".bzr",
     ".direnv",
     ".eggs",
     ".git",
     ".hg",
     ".nox",
     ".pants.d",
     ".pytype",
     ".ruff_cache",
     ".svn",
     ".tox",
     ".venv",
     "__pypackages__",
     "_build",
     "buck-out",
     "build",
     "dist",
     "node_modules",
     "venv",
     "__init__.py",
-    ".venv",  # Remove duplicate
     ".env",
-    ".git",  # Remove duplicate
     "__pycache__",
     "dev/**",
     "training/main/**",
     "training/ner_fine_grained/**",
     "papers/DaCy-A-Unified-Framework-for-Danish-NLP/**",
     "docs/performance_testing_utils/**",
 ]
```
```diff
@@ -163,7 +164,7 @@ def __score(augmenter):  # noqa: ANN001

     # and collapse list to dict
     for key in scores:  # type: ignore
-        scores[key] = [s[key] if key in s else None for s in scores_ls]  # type: ignore
+        scores[key] = [s.get([key], None) for s in scores_ls]  # type: ignore
```
Fix critical bug in score aggregation.
There's a bug in the dictionary access: a list is incorrectly passed to the `get()` method. This will cause all scores to be `None`, as the key will never be found.
Apply this fix:
```diff
-        scores[key] = [s.get([key], None) for s in scores_ls]
+        scores[key] = [s.get(key, None) for s in scores_ls]
```
Summary by CodeRabbit — Release Notes

- New Features: a new GitHub Actions workflow for static type checks
- Bug Fixes
- Documentation
- Chores: `ruff` updated in project configuration