Add pt compress command line #4300
Conversation
Caution: Review failed. The pull request is closed.

📝 Walkthrough

The changes introduce a new command option `compress` to trigger model compression.

Changes
for more information, see https://pre-commit.ci
Actionable comments posted: 4
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
deepmd/pt/entrypoints/main.py (3 hunks)
🔇 Additional comments (2)
deepmd/pt/entrypoints/main.py (2)
46-46: LGTM!
The import is correctly placed and properly used in the new compression functionality.
Line range hint 505-595: Verify test coverage and CLI documentation.
Please ensure:
- The new compression functionality is covered by unit tests
- The CLI help documentation includes the new command and its parameters
✅ Verification successful
Tests and CLI documentation are properly implemented.
Based on the verification results:

- Test coverage:
  - The compression functionality is well-tested across multiple test files
  - `test_compressed_descriptor_se_t.py` shows actual usage of `enable_compression`
  - Other test files cover various compression scenarios
- CLI documentation and parsing:
  - The compress command is properly integrated into the argument parser (see the sketch after this list)
  - Test file `test_argument_parser.py` verifies the CLI arguments: `--output`, `--extrapolate`, `--step`, `--frequency`, `--checkpoint-folder`
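To make the wiring concrete, here is a minimal argparse sketch of the `compress` subcommand. Only the flag names above and the `frozen_model` default for `-i/--input` (from the `deepmd/main.py` diff reviewed below) come from this PR; the other defaults are assumptions for illustration:

```python
import argparse

parser = argparse.ArgumentParser(prog="dp")
subparsers = parser.add_subparsers(dest="command")

# Flag names follow test_argument_parser.py; defaults other than
# -i/--input are illustrative assumptions, not values from the PR.
parser_compress = subparsers.add_parser("compress", help="compress a model")
parser_compress.add_argument("-i", "--input", default="frozen_model", type=str)
parser_compress.add_argument("--output", default="frozen_model_compressed", type=str)
parser_compress.add_argument("--step", default=0.01, type=float)
parser_compress.add_argument("--extrapolate", default=5, type=int)
parser_compress.add_argument("--frequency", default=-1, type=int)
parser_compress.add_argument("--checkpoint-folder", default=".", type=str)

args = parser.parse_args(["compress", "-i", "frozen_model.pth"])
print(args.input, args.step)  # frozen_model.pth 0.01
```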
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
```bash
#!/bin/bash
# Check for test coverage
rg -l "enable_compression" --type py --glob "*test*.py"

# Check for CLI documentation
rg -l "compress" --type py --glob "*argparse*.py"
```
Length of output: 552
Script:
```bash
#!/bin/bash
# Check for CLI documentation in main.py and other potential CLI-related files
rg -l "compress" --type py

# Check the actual test coverage by looking at test content
rg "enable_compression" -C 5 --type py source/tests/pt/model/test_compressed_descriptor_se_t.py

# Check CLI argument parsing
rg "compress" -C 5 deepmd/pt/entrypoints/main.py
```
Length of output: 3699
Script:
```bash
#!/bin/bash
# Check for CLI argument parsing for compress command
rg -C 10 "compress" source/tests/common/test_argument_parser.py

# Check for the argument parser setup in main code
rg -C 10 "add_subparser" deepmd/pt/entrypoints/main.py
```
Length of output: 2442
Codecov Report

Attention: Patch coverage is

Additional details and impacted files

```
@@            Coverage Diff             @@
##            devel    #4300      +/-   ##
==========================================
+ Coverage   84.50%   84.53%   +0.02%
==========================================
  Files         570      571       +1
  Lines       53162    53163       +1
  Branches     3059     3059
==========================================
+ Hits        44927    44943      +16
+ Misses       7272     7259      -13
+ Partials      963      961       -2
```

☔ View full report in Codecov by Sentry.
Actionable comments posted: 4
🧹 Outside diff range and nitpick comments (3)
deepmd/pt/entrypoints/compress.py (2)
14-20: Add docstring to document function parameters and behavior.

The function would benefit from a docstring explaining:
- Purpose of the function
- Description of each parameter
- Expected format of input/output files
- Return value (if any)
- Possible exceptions
Example docstring:
```python
def enable_compression(
    input_file: str,
    output: str,
    stride: float = 0.01,
    extrapolate: int = 5,
    check_frequency: int = -1,
):
    """Enable compression for PyTorch models.

    Args:
        input_file: Path to input model file (.pth or .pt)
        output: Path to save compressed model
        stride: Compression stride parameter (default: 0.01)
        extrapolate: Number of extrapolation steps (default: 5)
        check_frequency: Frequency for compression checks, -1 to disable (default: -1)

    Raises:
        ValueError: If input file format is not supported
    """
```
35-36: Improve error message for unsupported formats.

The error message could be more helpful by explicitly listing the supported extensions.

```diff
-    raise ValueError("PyTorch backend only supports converting .pth or .pt file")
+    raise ValueError(f"Unsupported file format for {input_file}. PyTorch backend only supports .pth or .pt files")
```

deepmd/main.py (1)
441-450: LGTM! Clear argument descriptions with backend-specific details.

The updated input/output arguments now handle both backends elegantly by:
- Using suffix-less default values
- Clearly documenting the expected suffixes for each backend
Consider adding validation to ensure the input/output file extensions match the selected backend:
```diff
 parser_compress.add_argument(
     "-i",
     "--input",
     default="frozen_model",
     type=str,
-    help="The original frozen model, which will be compressed by the code. Filename (prefix) of the input model file. TensorFlow backend: suffix is .pb; PyTorch backend: suffix is .pth",
+    help="The original frozen model, which will be compressed by the code. "
+    "Filename (prefix) of the input model file. "
+    "TensorFlow backend: suffix is .pb; PyTorch backend: suffix is .pth",
 )
```
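One hypothetical way to implement that validation (the function name and backend mapping here are illustrative, not code from this PR; the suffix table follows the help text above):

```python
# Hypothetical suffix check: fail fast when the model filename's extension
# disagrees with the selected backend. Suffix-less prefixes are accepted.
SUFFIXES = {"tensorflow": (".pb",), "pytorch": (".pth", ".pt")}

def check_model_suffix(backend: str, path: str) -> None:
    allowed = SUFFIXES[backend]
    if "." in path and not path.endswith(allowed):
        raise ValueError(
            f"{path!r} does not match the {backend} backend; "
            f"expected one of {allowed}"
        )

check_model_suffix("pytorch", "frozen_model.pth")  # passes
check_model_suffix("tensorflow", "frozen_model")   # passes (prefix only)
```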
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
deepmd/main.py (1 hunks)
deepmd/pt/entrypoints/compress.py (1 hunks)
deepmd/pt/entrypoints/main.py (2 hunks)
🔇 Additional comments (4)
deepmd/pt/entrypoints/compress.py (2)
1-12: LGTM! Well-organized imports.
The imports are properly organized and all dependencies are relevant for the implementation.
1-46: Verify integration with model compression functionality.
Let's verify that:
- The model compression implementation exists
- The command-line integration is properly set up
✅ Verification successful
Model compression functionality is properly integrated
The verification shows that:
- Model compression is well-implemented across both PyTorch and TensorFlow backends:
  - PyTorch descriptors (se_a, se_r, se_t, se_atten) have `enable_compression` methods
  - Comprehensive test coverage exists with multiple compression test files
- Command-line integration is complete:
- The compress command is properly integrated in main.py
- The implementation handles both .pth and .pt file formats
- Parameters (stride, extrapolate, check_frequency) are correctly passed through
The test implementation demonstrates proper compression verification with numerical accuracy checks, confirming the feature works as expected.
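For orientation, a minimal sketch of the flow verified above, reusing the signature from the docstring suggestion earlier in this review. The `torch.jit` load/save calls and the model-side `enable_compression` hook are assumptions about the shape of the code, not the PR's actual implementation:

```python
import torch

def enable_compression(
    input_file: str,
    output: str,
    stride: float = 0.01,
    extrapolate: int = 5,
    check_frequency: int = -1,
):
    # Accept only the PyTorch formats; mirrors the ValueError discussed above.
    if not input_file.endswith((".pth", ".pt")):
        raise ValueError("PyTorch backend only supports converting .pth or .pt file")
    model = torch.jit.load(input_file)  # assumed loading mechanism
    # Assumed model-side hook that tabulates the embedding networks.
    model.enable_compression(stride, extrapolate, check_frequency)
    torch.jit.save(model, output)
```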
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
```bash
#!/bin/bash
# Check if model compression is implemented
echo "Checking for model compression implementation..."
ast-grep --pattern 'def enable_compression($$$)' || echo "Warning: No enable_compression method found in models"

# Check command-line integration
echo "Checking command-line integration..."
rg -l "compress" "deepmd/pt/entrypoints/main.py" || echo "Warning: compress command might not be integrated in main.py"

# Check for compression tests
echo "Checking for compression tests..."
fd -g "*test*compress*.py" || echo "Warning: No compression tests found"
```
Length of output: 1148
Script:
```bash
#!/bin/bash
# Let's check the implementation of enable_compression in model classes
echo "Checking model compression implementation details..."
rg -A 5 "def enable_compression"

# Check main.py for command line integration details
echo "Checking command line integration details..."
rg -A 10 "compress" "deepmd/pt/entrypoints/main.py"

# Check one of the PyTorch compression tests
echo "Checking PyTorch compression test implementation..."
cat "source/tests/pt/model/test_compressed_descriptor_se_a.py"
```
Length of output: 12511
deepmd/pt/entrypoints/main.py (1)
41-43: LGTM!
The import follows the established pattern and is properly grouped with other entrypoint imports.
deepmd/main.py (1)
427-434: LGTM! Clear and helpful command description.
The updated help message and examples now correctly reflect the multi-backend support for the compress command.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (8)
source/tests/pt/model_compression/input.json (2)
8-27: Document the reasoning behind descriptor parameters.

The descriptor configuration contains several parameters that would benefit from additional documentation:

- The `sel` values [46,92] appear to be magic numbers
- The `neuron` array [4,8,17,17] configuration needs explanation
- The significance of `rcut=6.00` and `rcut_smth=0.50` should be documented
- The choice of `axis_neuron=16` needs clarification

Consider adding comments explaining the reasoning and impact of these parameter choices; one possible shape is sketched after this list.
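As one possible shape for such comments, a hypothetical annotated version of the descriptor block is shown below; the `_comment_*` keys and their wording are illustrative, not content from the file:

```json
{
    "descriptor": {
        "type": "se_e2_a",
        "_comment_sel": "max neighbor counts per element type within rcut",
        "sel": [46, 92],
        "_comment_rcut": "cutoff radius in Angstrom; smoothing begins at rcut_smth",
        "rcut_smth": 0.50,
        "rcut": 6.00,
        "_comment_neuron": "embedding-net layer sizes, kept small for a fast test",
        "neuron": [4, 8, 17, 17],
        "_comment_axis": "columns of the embedding matrix passed to the fitting net",
        "axis_neuron": 16
    }
}
```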
1-85: Improve configuration documentation structure.

The configuration file would benefit from the following improvements:

- Replace generic `_commentN` fields with descriptive names (e.g., `_comment_model`, `_comment_training`)
- Add a header comment describing the purpose and usage of this configuration file
- Include documentation about expected value ranges and impacts for key parameters

Example improvement for the header:

```diff
 {
+    "_comment_file": "Configuration for model compression testing with se_e2_a descriptor type.",
+    "_comment_usage": "Used by the enable_compression function for .pth and .pb model compression",
     "_comment1": " model parameters",
```

source/tests/pt/common.py (1)
23-24: Add docstring to document the j_loader function.

While the implementation is clean, adding a docstring would improve maintainability by documenting:
- Purpose of the function
- Parameter description
- Return value description
- Usage example
Here's a suggested improvement:
```diff
 def j_loader(filename):
+    """Load JSON data from a file relative to the tests directory.
+
+    Parameters
+    ----------
+    filename : str
+        Name of the file to load, relative to the tests directory
+
+    Returns
+    -------
+    dict
+        Loaded JSON data
+
+    Example
+    -------
+    >>> data = j_loader("test_data.json")
+    """
     return dp_j_loader(tests_path / filename)
```

source/tests/pt/test_model_compression_se_a.py (5)
15-15: Remove commented-out import statement

The import statement on line 15 is commented out and appears to be unnecessary. Consider removing it to clean up the code.
Apply this diff:
```diff
-# from deepmd.tf.entrypoints.compress import compress
```
22-25: Simplify conditional assignment using a ternary operator

For better readability and conciseness, replace the `if`-`else` block with a ternary operator.

Apply this diff:

```diff
-    if GLOBAL_NP_FLOAT_PRECISION == np.float32:
-        default_places = 4
-    else:
-        default_places = 10
+    default_places = 4 if GLOBAL_NP_FLOAT_PRECISION == np.float32 else 10
```

🧰 Tools
🪛 Ruff

22-25: Use ternary operator `default_places = 4 if GLOBAL_NP_FLOAT_PRECISION == np.float32 else 10` instead of `if`-`else` block (SIM108)

22-22: Yoda condition detected; rewrite as `np.float32 == GLOBAL_NP_FLOAT_PRECISION` (SIM300)
28-33: Use `shutil.rmtree` for directory removal

The `_file_delete` function uses `os.rmdir`, which only removes empty directories. If the directories may contain files, consider using `shutil.rmtree` for recursive deletion.

Apply this diff:

```diff
+import shutil
+
 def _file_delete(file):
     if os.path.isdir(file):
-        os.rmdir(file)
+        shutil.rmtree(file)
     elif os.path.isfile(file):
         os.remove(file)
```
80-87: Refactor `global` statement for readability

Using backslashes for line continuation is discouraged. Refactor the `global` statement to improve readability.

Apply this diff:

```diff
-    global \
-        INPUT, \
-        FROZEN_MODEL, \
-        COMPRESSED_MODEL, \
-        INPUT_ET, \
-        FROZEN_MODEL_ET, \
-        COMPRESSED_MODEL_ET
+    global INPUT, FROZEN_MODEL, COMPRESSED_MODEL, \
+        INPUT_ET, FROZEN_MODEL_ET, COMPRESSED_MODEL_ET
```
427-427: Remove unused variable `nframes`

The variable `nframes` is assigned but never used. Removing it will eliminate unnecessary code.

Apply this diff:

```diff
-    nframes = 1
```

🧰 Tools
🪛 Ruff

427-427: Local variable `nframes` is assigned to but never used; remove assignment to unused variable `nframes` (F841)
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (5)
source/tests/pt/common.py (3 hunks)
source/tests/pt/model_compression/data/type.raw (1 hunks)
source/tests/pt/model_compression/data/type_map.raw (1 hunks)
source/tests/pt/model_compression/input.json (1 hunks)
source/tests/pt/test_model_compression_se_a.py (1 hunks)
✅ Files skipped from review due to trivial changes (2)
- source/tests/pt/model_compression/data/type.raw
- source/tests/pt/model_compression/data/type_map.raw
🧰 Additional context used
🪛 Ruff
source/tests/pt/test_model_compression_se_a.py
22-25: Use ternary operator `default_places = 4 if GLOBAL_NP_FLOAT_PRECISION == np.float32 else 10` instead of `if`-`else` block (SIM108)

22-22: Yoda condition detected; rewrite as `np.float32 == GLOBAL_NP_FLOAT_PRECISION` (SIM300)

427-427: Local variable `nframes` is assigned to but never used; remove assignment to unused variable `nframes` (F841)
🔇 Additional comments (6)
source/tests/pt/model_compression/input.json (2)
49-58: Validate loss preference transitions.
The loss configuration shows significant transitions in preferences:
- Force preference transitions from 1000 to 1
- Energy preference transitions from 0.02 to 1
- Volume preference is disabled (0)
Please ensure these transitions have been validated for model stability and convergence.
Consider documenting the rationale for these specific preference values and transitions in the comments.
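Concretely, the transitions above correspond to a loss block of roughly this form (a sketch assuming the conventional `start_pref_*`/`limit_pref_*` key names; the actual file contents are not reproduced here):

```json
{
    "loss": {
        "start_pref_e": 0.02,
        "limit_pref_e": 1,
        "start_pref_f": 1000,
        "limit_pref_f": 1,
        "start_pref_v": 0,
        "limit_pref_v": 0
    }
}
```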
41-47: Verify learning rate decay parameters.
The learning rate configuration shows a significant drop from start_lr (0.001) to stop_lr (3.51e-8). Please verify:
- Whether this steep decay over 5000 steps is intentional
- If these values have been tested for model stability
✅ Verification successful
The previous search was too specific. Let's broaden it to find any learning rate configurations and documentation to better understand the typical patterns used in this codebase.
The learning rate configuration is consistent with project standards
The search results show that this exact learning rate configuration (start_lr: 0.001, stop_lr: 3.51e-8, decay_steps: 5000) is widely used across multiple test and example configurations in the codebase, particularly in water-related models. The documentation in `doc/troubleshooting/precision.md` confirms this is a recommended configuration pattern, stating "The learning rate from the examples is a good choice to start."
The steep decay is intentional and follows the project's exponential decay formula documented in `doc/train/training-advanced.md`:

```
lr(t) = start_lr * decay_rate ^ (t / decay_steps)
```
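As a quick sanity check on these numbers, the implied per-interval decay rate can be computed directly. The total step count below is an assumption for illustration, not a value from the reviewed config:

```python
# Solve lr(total_steps) = stop_lr for decay_rate in
# lr(t) = start_lr * decay_rate ** (t / decay_steps).
start_lr, stop_lr, decay_steps = 1.0e-3, 3.51e-8, 5000
total_steps = 1_000_000  # assumed training length

n_intervals = total_steps / decay_steps                # 200 decay intervals
decay_rate = (stop_lr / start_lr) ** (1 / n_intervals)

print(f"decay_rate = {decay_rate:.4f}")  # ~0.9500, a conventional choice
```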
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
```bash
#!/bin/bash
# Search for similar learning rate configurations in other test files
rg -g "*.json" --type json "start_lr.*0\.001.*stop_lr.*3\.51e-8" -C 5
```
Length of output: 71
Script:
```bash
#!/bin/bash
# Search for any learning rate configurations in json files to compare patterns
rg -g "*.json" --type json "\"learning_rate\"" -A 8

# Also check for any documentation or comments about learning rate configuration
rg -g "*.md" -g "*.txt" "learning.?rate|lr" -i
```
Length of output: 48087
source/tests/pt/common.py (2)
2-2: LGTM! Clean import organization and path handling.
The addition of pathlib and the setup of tests_path using Path.absolute() follows best practices for cross-platform path handling.
Also applies to: 11-11, 20-20
23-24: Verify j_loader usage in compression tests.
Since this function is intended to support the new model compression functionality, let's verify its usage in the test files.
✅ Verification successful
j_loader is correctly integrated with compression functionality
The verification confirms that j_loader is properly used in the model compression tests:
- It loads the compression configuration from "model_compression/input.json"
- The loaded configuration contains all necessary parameters for model compression (descriptor, fitting_net, etc.)
- The function is used in both regular and exclude-types compression test scenarios
- The loaded data is properly modified and used to initialize the compression test environment
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check how j_loader is used in compression-related test files
# Search for j_loader usage in test files
echo "Searching for j_loader usage in test files..."
rg "j_loader" "source/tests/pt/"
# Search specifically in compression test files
echo "Searching in compression-specific test files..."
fd "test.*compress.*\.py" "source/tests/pt/" --exec rg "j_loader" {}
Length of output: 1024
Script:
```bash
#!/bin/bash
# Let's examine the content of the input.json and how j_loader is used in the compression test

# Check the content of input.json
echo "Content of input.json:"
cat "source/tests/pt/model_compression/input.json" || echo "File not found"

# Check the context of j_loader usage in compression test
echo -e "\nContext of j_loader usage in compression test:"
rg -B 5 -A 5 "j_loader" "source/tests/pt/test_model_compression_se_a.py"
```
Length of output: 3212
source/tests/pt/test_model_compression_se_a.py (2)
154-156: Ensure appropriate precision in assertions

When using `np.testing.assert_almost_equal`, verify that `default_places` provides sufficient precision for your tests, especially if `GLOBAL_NP_FLOAT_PRECISION` affects numerical accuracy.
403-410: Verify the import path for the `DP` calculator

You're importing `DP` from `deepmd.tf.calculator`, but since the tests are for the PyTorch implementation, ensure this is the correct module.

Run the following script to check if `DP` is available in `deepmd.pt.calculator`. If the import is successful, consider updating the import statement:

```diff
-from deepmd.tf.calculator import (
+from deepmd.pt.calculator import (
     DP,
 )
```
It seems that
We set the default device to a virtual one. It should never be used.
But it seems that this case is not touched before. I'll take a look.
Because we just need to test whether
Isn't EnergyModel a subclass of BaseModel?
Sorry, I don't seem to have updated to the latest version of the code.
I am satisfied with the current state of this PR.
Wait for #4326 to be merged.
OK, I'm troubled by the current error.
I think we need to update dpa1.pth
Signed-off-by: Jinzhe Zeng <[email protected]>
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (2)
source/tests/pt/model/test_unused_params.py (1)
89-90: Enhance test documentation and verification for non-contributing parameters.

While the change to expect 2 non-contributing parameters aligns with the addition of compression functionality, the test would benefit from:
- Better documentation explaining which specific parameters are expected to be non-contributing
- Additional assertions to verify the identity of these parameters
Consider adding assertions like:
```diff
 # 2 for compression
 self.assertEqual(len(non_contributing), 2)
+# Verify the specific compression-related parameters that should be non-contributing
+non_contributing_names = {param.name for param in non_contributing}
+expected_names = {'param1_name', 'param2_name'}  # Replace with actual parameter names
+self.assertEqual(non_contributing_names, expected_names,
+                 "Unexpected non-contributing parameters found")
```

source/tests/pt/model/test_descriptor_dpa1.py (1)
248-252: LGTM! Consider adding more detailed comments.

The changes appropriately handle backward compatibility for older state dictionaries by manually adding compression-related states (a sketch of this pattern follows the list below).
Consider expanding the comment to explain:
- Why this manual modification is necessary
- What versions of state dictionaries require this handling
- When this compatibility code can be removed
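A rough sketch of the pattern under review, with hypothetical key names and shapes (the real keys are the `compress_info`/`compress_data` entries discussed later in this review):

```python
import torch

# Hypothetical back-compat shim: checkpoints written before compression
# support lack the compression buffers, so zero-filled entries are
# injected before calling load_state_dict().
state_dict = torch.load("old_model.pth", map_location="cpu")  # path illustrative
for key, shape in [
    ("descriptor.compress_info.0", (6,)),
    ("descriptor.compress_data.0", (0,)),
]:
    state_dict.setdefault(key, torch.zeros(shape))
# model.load_state_dict(state_dict)  # then load as usual
```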
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
source/tests/pt/model/test_descriptor_dpa1.py (2 hunks)
source/tests/pt/model/test_descriptor_dpa2.py (1 hunks)
source/tests/pt/model/test_unused_params.py (1 hunks)
🔇 Additional comments (2)
source/tests/pt/model/test_descriptor_dpa2.py (1)
197-198: Verify compression data handling requirements.
The code marks compression-related keys as processed without setting their values. This could lead to undefined behavior if these keys are required for the compression functionality.
Let's verify the compression key requirements:
Consider one of these approaches:
- Initialize these keys with appropriate default values if they're required
- Add documentation explaining why these keys can be safely marked as processed without values
- Add assertions to verify these keys aren't required in the current context
source/tests/pt/model/test_descriptor_dpa1.py (1)
378-379: LGTM! Consider adding key existence verification.
The changes correctly mark compression-related keys as present in the record list, maintaining consistency with the state dictionary modifications.
Let's verify that these compression-related keys are actually present in the target dictionary:
✅ Verification successful
Based on the search results, I can now generate the final response:
Compression-related keys are properly defined and used consistently
The changes correctly mark compression-related keys as present in the record list. The verification shows that:
- `compress_data` and `compress_info` are properly defined as `ParameterList` attributes in the SE descriptor classes (see the sketch after this list)
- These keys are consistently used across the codebase in both DPA1 and DPA2 test descriptors
- The state dictionary handling in `test_descriptor_dpa1.py` properly manages these compression attributes
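For readers unfamiliar with how such keys surface in a checkpoint, a tiny self-contained sketch follows; the module and buffer contents are hypothetical, only the attribute names come from the findings above:

```python
import torch

class TinyDescriptor(torch.nn.Module):
    """Stand-in for an SE descriptor carrying compression buffers."""

    def __init__(self):
        super().__init__()
        # Non-trainable ParameterList attributes named like the real ones.
        self.compress_info = torch.nn.ParameterList(
            [torch.nn.Parameter(torch.zeros(6), requires_grad=False)]
        )
        self.compress_data = torch.nn.ParameterList(
            [torch.nn.Parameter(torch.zeros(0), requires_grad=False)]
        )

print(list(TinyDescriptor().state_dict()))
# ['compress_info.0', 'compress_data.0']
```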
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
```bash
#!/bin/bash
# Description: Search for compression-related attributes in the codebase
# to ensure they are properly defined in the relevant classes.

# Test: Look for compression attribute definitions
rg -A 5 'compress_(info|data)'

# Test: Look for compression-related state dict keys
rg -A 5 'compress.*["\'].*state_dict'
```
Length of output: 10885
Signed-off-by: Jinzhe Zeng <[email protected]>
Summary by CodeRabbit
New Features

- Introduced a new command option "compress" to trigger model compression.
- Updated documentation for the "compress" command to clarify usage with different backends.

Bug Fixes

Tests