Encoding warning use PEP 597 env var `PYTHONWARNDEFAULTENCODING` #733

DanielYang59 · 2024-12-20T02:39:03Z

Summary

- Gate the encoding warning on the PEP 597 environment variable PYTHONWARNDEFAULTENCODING, to fix the problem raised in "What's the correct way to use the MAGMOM's value when using MP API data?" materialsproject/pymatgen#4173 (comment)
- Remove the custom encoding warning, as monty now requires Python 3.10+ (where EncodingWarning is a built-in)
- PEP 604 – Allow writing union types as X | Y (Python 3.10+)
- PEP 604: t | None or None | t instead of Optional[t]

Summary by CodeRabbit

Bug Fixes
- Enhanced warning handling in tests related to encoding practices.
New Features
- Improved test suite for encoding warnings, allowing dynamic environment manipulation.
Refactor
- Updated type hints across various functions for clarity and modern syntax.
- Simplified parameter types in multiple methods to enhance readability.
Documentation
- Updated docstrings to reflect changes in parameter types and functionality.

coderabbitai · 2024-12-20T02:39:09Z

Important

Review skipped

Draft detected.

Walkthrough

This pull request focuses on modernizing type hints across multiple files in the monty library. The changes primarily involve updating type annotations from older Union and Optional types to the more concise | union syntax introduced in Python 3.10. These modifications enhance type clarity and readability while maintaining the existing functionality of the codebase. The changes span multiple modules, including bisect.py, dev.py, functools.py, io.py, and others, consistently applying the new type hinting approach.

Changes

File	Change Summary
`.github/workflows/test.yml`	Updated matrix variable from `matrix.python` to `matrix.python-version`
`src/monty/bisect.py`	Updated `index` function type hint from `Optional[float]` to `float | None`
`src/monty/dev.py`	Updated type hints for `deprecated` decorator parameters
`src/monty/functools.py`	Replaced `Union[list, tuple]` with `list | tuple`
`src/monty/io.py`	Updated type hints for `zopen` and `reverse_readfile` functions
`src/monty/os/*`	Updated type hints to the `|` union syntax (e.g. `str | Path`, per the review notes)
`src/monty/serialization.py`	Enhanced type hints for `loadfn` and `dumpfn` functions
`src/monty/shutil.py`	Updated type hints for `compress_file` and `decompress_file`
`src/monty/string.py`	Updated `list_strings` function type hint
`src/monty/subprocess.py`	Updated `run` method type hint
`src/monty/tempfile.py`	Updated `ScratchDir` class `__init__` method type hint
`tests/test_io.py`	Added `monkeypatch` parameter to `test_warnings` method

Assessment against linked issues

Objective	Addressed	Explanation
Handling MAGMOM settings from MP API	❌	This PR does not address the MAGMOM handling issue mentioned in the linked issue

Poem

🐰 Type hints dancing light and free,
From Union to |, a syntax spree!
Modernizing code with rabbit's grace,
Clarity blooms in each type's embrace.
Monty's library, now sleek and bright! 🌟

codecov · 2024-12-20T09:38:18Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 84.18%. Comparing base (26acf0b) to head (8bc0763).

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #733      +/-   ##
==========================================
- Coverage   84.19%   84.18%   -0.01%     
==========================================
  Files          27       27              
  Lines        1670     1669       -1     
  Branches      314      314              
==========================================
- Hits         1406     1405       -1     
  Misses        202      202              
  Partials       62       62

coderabbitai

Actionable comments posted: 1

🔭 Outside diff range comments (1)

.github/workflows/test.yml (1)
Windows testing should be re-enabled for path handling verification

The review comment is valid. The evidence shows that:

Windows testing is explicitly commented out in the workflow (#, windows-latest)

The codebase handles path operations in monty.os module, including path manipulation and directory operations

While the test suite includes path-related tests, they are not being verified on Windows platform

The path handling code uses OS-agnostic methods (os.path), but Windows-specific edge cases should still be verified

Recommendations:

Uncomment windows-latest in the test matrix to ensure cross-platform compatibility

Consider adding Windows-specific test cases for path handling edge cases (backslashes, drive letters)
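
One way to exercise such edge cases even while `windows-latest` stays disabled is pathlib's pure path classes, which apply Windows rules on any platform. The sketch below uses a made-up example path and is not a substitute for running the suite on Windows.

```python
from pathlib import PurePosixPath, PureWindowsPath

# PureWindowsPath parses with Windows semantics regardless of the host OS.
win = PureWindowsPath(r"C:\Users\monty\data\out.json.gz")

drive = win.drive          # drive letter, including the colon
name = win.name            # final path component
suffixes = win.suffixes    # every extension, in order
as_posix = win.as_posix()  # same path rendered with forward slashes

# Windows paths accept both separators and compare case-insensitively.
mixed_equal = PureWindowsPath("C:/Users/Monty") == PureWindowsPath(r"c:\users\monty")

# A POSIX path treats a backslash as an ordinary character, not a separator.
posix_parts = PurePosixPath(r"a\b/c").parts
```

Pure paths never touch the filesystem, so they only cover parsing and comparison; behavior such as file locking or reserved names would still need a real Windows runner.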

🔗 Analysis chain

Line range hint 9-10: Verify Windows compatibility

Windows testing appears to be intentionally disabled. Since the changes involve path handling in monty.os, it would be beneficial to verify Windows compatibility.

Let me check the specific Windows-related path handling in the relevant files, particularly monty.os.

Let me try a different approach to check for Windows-specific handling and tests.

Let me check for test files in a different way and look for any Windows-specific test cases.

Let me check the specific test cases for OS-related functionality and the current workflow configuration.
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Check for Windows-specific path handling code
rg -l 'os\.path\.sep|ntpath|\\\\|/|os\.name.*nt' src/monty/
Length of output: 467

Script:
#!/bin/bash
# Check Windows-specific path handling in monty.os
rg -A 5 'os\.name.*nt|ntpath|os\.path\.sep' src/monty/os/__init__.py

# Also check for any Windows-specific tests
fd test_ src/monty/os/ --exec rg -l 'windows|nt|os\.name'
Length of output: 130

Script:
#!/bin/bash
# Check content of monty.os for Windows handling
cat src/monty/os/__init__.py

# Look for test files in monty/os directory
fd . src/monty/os/ --type f --full-path | grep test
Length of output: 1511

Script:
#!/bin/bash
# Look for test files in the entire project related to monty.os
fd test.*\.py --exec rg -l "os\.(name|path|sep)|windows|nt" {}

# Check if there are any existing GitHub workflow files that include Windows
rg -l "windows|Windows" .github/workflows/
Length of output: 725

Script:
#!/bin/bash
# Check the OS-specific test cases
cat tests/test_os.py

# Check the current workflow configuration for Windows
cat .github/workflows/test.yml
Length of output: 4093

🧹 Nitpick comments (5)

src/monty/bisect.py (1)
Line range hint 22-31: Consider enhancing error messages

The ValueError could be more descriptive by including the value that wasn't found and the tolerance used.
-    raise ValueError
+    raise ValueError(f"Value {x} not found in list{f' within tolerance {atol}' if atol is not None else ''}")
src/monty/serialization.py (1)
47-48: Verify docstring format parameter description

The docstring format parameter description uses quotes around the literal values, which might be confusing as it differs from the type hint syntax. Consider updating for consistency.
-        fmt ("json" | "yaml" | "mpk"): If specified, the fmt specified would
+        fmt (Literal["json", "yaml", "mpk"]): If specified, the fmt specified would
Also applies to: 108-109
src/monty/shutil.py (1)
79-81: Consider adding type hints for return values

While updating the type hints, consider adding explicit return type annotations to the functions for better type safety.
def compress_file(
    filepath: str | Path,
    compression: Literal["gz", "bz2"] = "gz",
    target_dir: str | Path | None = None,
-) -> None:
+) -> None:  # explicitly document that function doesn't return anything
Also applies to: 133-135
src/monty/io.py (1)
79-85: Simplify kwargs.get() call and consider adding docstring note

The implementation correctly follows PEP 597 for encoding warnings. However, there are two suggestions:

The kwargs.get() call can be simplified

Consider documenting the PYTHONWARNDEFAULTENCODING behavior in the function's docstring
-    if (
-        "t" in mode
-        and kwargs.get("encoding", None) is None
-        and os.getenv("PYTHONWARNDEFAULTENCODING", False)
-    ):
+    if (
+        "t" in mode
+        and kwargs.get("encoding") is None
+        and os.getenv("PYTHONWARNDEFAULTENCODING", False)
+    ):
Also, consider adding this note to the docstring:
    Notes:
        When PYTHONWARNDEFAULTENCODING environment variable is set (PEP 597),
        a warning will be issued for text mode operations without explicit encoding.
🧰 Tools

🪛 Ruff (0.8.2)

83-83: Use kwargs.get("encoding") instead of kwargs.get("encoding", None)

Replace kwargs.get("encoding", None) with kwargs.get("encoding")

(SIM910)
tests/test_io.py (1)
435-461: LGTM: Well-structured test for PEP 597 encoding warnings

The test properly validates the behavior of encoding warnings with and without the PYTHONWARNDEFAULTENCODING environment variable. The warning filters are correctly configured to catch specific warning types.

Consider adding a comment explaining the test cases to improve maintainability:
    def test_warnings(self, extension, monkeypatch):
+       """Test encoding warnings based on PEP 597:
+       1. Verify warning when PYTHONWARNDEFAULTENCODING is set
+       2. Verify no warning when PYTHONWARNDEFAULTENCODING is not set
+       3. Verify implicit mode warnings
+       """
🧰 Tools

🪛 Ruff (0.8.2)

444-444: Undefined name EncodingWarning. Consider specifying requires-python = ">= 3.10" or tool.ruff.target-version = "py310" in your pyproject.toml file.

(F821)

456-456: Undefined name EncodingWarning. Consider specifying requires-python = ">= 3.10" or tool.ruff.target-version = "py310" in your pyproject.toml file.

(F821)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 26acf0b and c4bedb8.

📒 Files selected for processing (13)

.github/workflows/test.yml (1 hunks)
src/monty/bisect.py (1 hunks)
src/monty/dev.py (2 hunks)
src/monty/functools.py (2 hunks)
src/monty/io.py (3 hunks)
src/monty/os/__init__.py (3 hunks)
src/monty/os/path.py (2 hunks)
src/monty/serialization.py (4 hunks)
src/monty/shutil.py (3 hunks)
src/monty/string.py (2 hunks)
src/monty/subprocess.py (1 hunks)
src/monty/tempfile.py (1 hunks)
tests/test_io.py (1 hunks)

🔇 Additional comments (15)

.github/workflows/test.yml (1)

19-22: LGTM: Variable naming follows GitHub Actions best practices

The change from matrix.python to matrix.python-version aligns with GitHub Actions' conventional naming patterns, improving clarity and maintainability.

src/monty/os/__init__.py (1)

25-25: LGTM: Type hints modernized per PEP 604

The update from Union[str, Path] to str | Path aligns with PEP 604 and Python 3.10+ compatibility requirements. The changes maintain the same type safety while improving readability.

Also applies to: 45-45

src/monty/bisect.py (1)

22-22: LGTM: Type hints modernized per PEP 604

The update from Optional[float] to float | None aligns with PEP 604 and improves code readability while maintaining the same type safety.

src/monty/subprocess.py (1)

64-64: LGTM! Type hint modernization looks good.

The change from Optional[float] to float | None aligns with PEP 604's union type syntax. The implementation remains thread-safe with proper timeout handling.

src/monty/string.py (2)

10-10: LGTM! Good practice with TYPE_CHECKING import.

The import of Any under TYPE_CHECKING condition helps reduce runtime overhead.

37-37: LGTM! Type hint modernization looks good.

The change from Union[str, Iterable[str]] to str | Iterable[str] aligns with PEP 604's union type syntax. The implementation correctly handles both input types with proper type casting.
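
The `TYPE_CHECKING` pattern mentioned here can be sketched as follows. `as_string_list` is a hypothetical helper for illustration, not monty's `list_strings`; the annotation is quoted so that `Iterable` is never needed at runtime (with `from __future__ import annotations` the quotes would be unnecessary).

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only evaluated by static type checkers; skipped entirely at runtime.
    from collections.abc import Iterable


def as_string_list(arg: "str | Iterable[str]") -> list[str]:
    """Hypothetical helper: wrap a lone string, or materialize an iterable."""
    # A str is itself iterable, so it must be checked first to avoid
    # splitting it into single characters.
    if isinstance(arg, str):
        return [arg]
    return [str(item) for item in arg]


single = as_string_list("POSCAR")
many = as_string_list(("INCAR", "KPOINTS"))
```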

src/monty/os/path.py (2)

15-15: LGTM! Good practice with TYPE_CHECKING import.

The import of Callable and Literal under TYPE_CHECKING condition helps reduce runtime overhead.

44-46: LGTM! Type hint modernization looks good.

The changes align with PEP 604's union type syntax:

exts: Union[str, list[str]] → str | list[str]
exclude_dirs: Optional[str] → str | None
include_dirs: Optional[str] → str | None

The implementation correctly handles all input types with proper wildcard pattern matching.

src/monty/serialization.py (1)

25-25: LGTM: Type hint modernization properly implemented

The update to modern type hint syntax (using | instead of Union) and the introduction of Literal type for fmt parameter improves type safety by restricting the allowed values to specific strings.

Also applies to: 28-33

src/monty/tempfile.py (1)

42-42: LGTM: Type hint updated to modern syntax

The update from Union[str, Path, None] to str | Path | None aligns with PEP 604 while maintaining the same functionality.

src/monty/shutil.py (1)

15-15: LGTM: Type hints consistently updated

The changes properly implement modern type hint syntax across the file:

Removed unnecessary Optional import
Updated type hints from Optional[str | Path] to str | Path | None
Maintained consistency across similar parameters

Also applies to: 79-79, 133-133

src/monty/dev.py (1)

20-20: LGTM: Type hint modernization

The changes correctly implement PEP 604 union types, replacing Optional[T] with T | None and Union[T1, T2] with T1 | T2. This improves code readability while maintaining the same type safety.

Also applies to: 26-26, 28-28

src/monty/functools.py (1)

16-16: LGTM: Type hint modernization

The changes correctly implement PEP 604 union types, improving code readability while maintaining type safety.

Also applies to: 133-133

src/monty/io.py (1)

22-22: LGTM: Type hint modernization

The changes correctly implement PEP 604 union types, improving code readability while maintaining type safety.

Also applies to: 26-26, 174-174

tests/test_io.py (1)

429-429: LGTM: Warning check properly added

The FutureWarning check for LZW compression is correctly placed within the context manager block.

DanielYang59 · 2024-12-20T10:09:01Z

@esoteric-ephemera Does the current change look good to you? That is, use the PYTHONWARNDEFAULTENCODING environment variable from PEP 597 to enable the optional EncodingWarning, which is otherwise disabled by default.

Meanwhile, I couldn't find where loadfn would emit an encoding warning, since an explicit encoding has already been added. Could you share a code snippet that reproduces it?

monty/src/monty/serialization.py

Lines 60 to 79 in 26acf0b

    
    if fmt == "mpk":
        if msgpack is None:
            raise RuntimeError(
                "Loading of message pack files is not possible as msgpack-python is not installed."
            )
        if "object_hook" not in kwargs:
            kwargs["object_hook"] = object_hook
        with zopen(fn, "rb") as fp:
            return msgpack.load(fp, *args, **kwargs)  # pylint: disable=E1101
    else:
        with zopen(fn, "rt", encoding="utf-8") as fp:
            if fmt == "yaml":
                if YAML is None:
                    raise RuntimeError("Loading of YAML files requires ruamel.yaml.")
                yaml = YAML()
                return yaml.load(fp, *args, **kwargs)
            if fmt == "json":
                if "cls" not in kwargs:
                    kwargs["cls"] = MontyDecoder
                return json.load(fp, *args, **kwargs)

esoteric-ephemera · 2024-12-20T17:08:51Z

Could you set the default encoding for zopen to be utf-8 and nix the warning altogether?

DanielYang59 · 2024-12-21T05:33:34Z

> Could you set the default encoding for zopen to be utf-8 and nix the warning altogether?

I believe this is almost what we're doing:

monty/src/monty/io.py

Lines 82 to 90 in 26acf0b

    
    # Warn against default `encoding` in text mode
    if "t" in mode and kwargs.get("encoding", None) is None:
        warnings.warn(
            "We strongly encourage explicit `encoding`, "
            "and we would use UTF-8 by default as per PEP 686",
            category=EncodingWarning,
            stacklevel=2,
        )
        kwargs["encoding"] = "utf-8"

The only difference after this patch is that the encoding warning is emitted only when PYTHONWARNDEFAULTENCODING is set (I believe this would not be disruptive, as no warning is emitted by default).

I personally prefer to give users the option to turn on this warning:

- It is consistent with PEP 597 – Add optional EncodingWarning, and makes it easier for users to find out which zopen calls rely on a default encoding.
- Using UTF-8 may not be backwards compatible, mostly on Windows where the default is the locale encoding, so this comes with a slight risk (PEP 686 – Make UTF-8 mode default). I therefore prefer to retain the option to warn.
- There are very few cases in pymatgen where an explicit, non-UTF-8 encoding is used, so I have to assume that some users might need other encodings (though I'm not sure that is really intended; for the ASCII cases, I guess we could safely use UTF-8 instead, unless that code is designed to work with ASCII characters only):
  - encoding="ISO-8859-1" in qchem.outputs
  - encoding="us-ascii" in bader_caller, encoding="ascii" for reading MCSQS in structure

Does this sound good to you?
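
The backwards-compatibility risk can be made concrete with a small sketch. Here cp1252 stands in for a typical Windows locale encoding, which is an assumption; the actual default depends on the user's system.

```python
import os
import tempfile

text = "café"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "note.txt")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(text)
    # An explicit UTF-8 read round-trips on every platform.
    with open(path, encoding="utf-8") as fh:
        read_utf8 = fh.read()
    # Reading the same bytes with a locale-style encoding garbles non-ASCII.
    with open(path, encoding="cp1252") as fh:
        read_cp1252 = fh.read()

# Pure ASCII text has identical bytes under ASCII and UTF-8, so switching
# an ASCII-only reader to UTF-8 does not change what it reads.
ascii_is_subset = "MCSQS".encode("ascii") == "MCSQS".encode("utf-8")
```

The mismatch only appears for non-ASCII content, which is why an opt-in warning is a low-cost way to locate calls that depend on the default.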

DanielYang59 added 2 commits December 20, 2024 10:36

remove custom encoding warning after python 3.10+

cb6b560

check PYTHONWARNDEFAULTENCODING

412c18b

fix test workflow otherwise cannot run

3e1ffc5

DanielYang59 force-pushed the encoding-warning-follow-pep597 branch from 8fc671b to 3e1ffc5 Compare December 20, 2024 05:44

test encoding warning with env var

2cd39ce

DanielYang59 force-pushed the encoding-warning-follow-pep597 branch from e7d9c76 to 2cd39ce Compare December 20, 2024 09:36

DanielYang59 added 3 commits December 20, 2024 17:40

PEP 604, | over Union type

7f7c1f1

Literal type for fmt

f684521

implicit optional | None

c4bedb8

DanielYang59 marked this pull request as ready for review December 20, 2024 09:57

coderabbitai bot reviewed Dec 20, 2024

DanielYang59 marked this pull request as draft December 20, 2024 10:03

prefer mode as kwarg

a4d5719

fix env var check

73294c9

document default encoding

8bc0763

DanielYang59 force-pushed the encoding-warning-follow-pep597 branch from 837a6c1 to 8bc0763 Compare December 21, 2024 09:45
