-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
update: test #75
update: test #75
Conversation
WalkthroughThis pull request introduces several changes across multiple files. The Changes
Possibly related PRs
Poem
Warning Rate limit exceeded@zprobot has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 18 minutes and 11 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Outside diff range and nitpick comments (7)
tests/common.py (1)
1-4
: Consider adding directory existence validationThe path constants are well-defined, but it would be beneficial to validate that these directories exist at runtime to fail fast with a clear error message.
from pathlib import Path TEST_DATA_ROOT = Path(__file__).parent / "example" DATA_ROOT = Path(__file__).parent.parent / "data" + +# Validate directory existence +if not TEST_DATA_ROOT.is_dir(): + raise FileNotFoundError(f"Test data directory not found: {TEST_DATA_ROOT}") +if not DATA_ROOT.is_dir(): + raise FileNotFoundError(f"Data directory not found: {DATA_ROOT}")tests/test_ibaqpy.py (3)
10-17
: Standardize path handling approachWhile most paths use the
datafile()
function,remove_ids
uses string concatenation withDATA_ROOT
. Consider usingdatafile()
consistently for all test data paths.- "remove_ids": str(DATA_ROOT) + "/contaminants_ids.tsv", + "remove_ids": datafile("contaminants_ids.tsv"),🧰 Tools
🪛 GitHub Check: CodeFactor
[notice] 10-10: tests/test_ibaqpy.py#L10
Multiple spaces after ':'. (E241)
24-24
: Remove debug print statementThe
print(__package__)
statement appears to be a debugging leftover and should be removed.- print(__package__)
10-10
: Fix spacing after colonRemove extra space after the colon to maintain consistent formatting.
- "parquet": datafile("feature.parquet"), + "parquet": datafile("feature.parquet"),🧰 Tools
🪛 GitHub Check: CodeFactor
[notice] 10-10: tests/test_ibaqpy.py#L10
Multiple spaces after ':'. (E241)ibaqpy/ibaq/peptide_normalization.py (3)
339-345
: Enhance error handling with more specific error messagesThe error handling could be more informative by including the actual error details in the exit message.
- exit( - "Some errors occurred when parsing pg_accessions column in feature parquet!" - ) + exit( + f"Error parsing pg_accessions column in feature parquet: {str(e)}" + )
Line range hint
4-4
: Add tests for the peptide normalization functionalityThe TODO comment indicates missing tests. Given the critical nature of this code in proteomics data analysis, comprehensive tests are essential.
Would you like me to help:
- Generate a test suite structure?
- Create a GitHub issue to track this task?
Confirmed: Inconsistent label detection logic between duplicate methods
The codebase contains two implementations of
get_label
with different behaviors:
- Standalone function (line 97): Sets
label = "LFQ"
iflen(labels) == 1
without checking the content- Feature class method (line 361): Sets
label = "LFQ"
only iflen(labels) == 1
AND contains "LABEL FREE" (case insensitive)The standalone function is called in two places:
- Line 3: Direct call to
get_label(labels)
- Line 374: Through
Feature.experimental_inference()
which calls the standalone versionThis inconsistency could cause incorrect label type detection depending on which method is used.
Consider:
- Removing the standalone function and using the Feature class's static method everywhere
- Or updating both implementations to use the same logic for label detection
🔗 Analysis chain
Line range hint
64-65
: Inconsistency between duplicate label detection methodsThere are two implementations of the
get_label
function with different logic:
- The standalone function doesn't check for "LABEL FREE"
- The Feature class's static method includes this check
This inconsistency could lead to different behavior depending on which method is called.
Consider:
- Removing the standalone function and using the Feature class's static method
- Or updating both implementations to match
Let's verify the usage:
Also applies to: 374-375
Let me verify the implementations of both
get_label
functions to confirm the inconsistency.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Find all calls to get_label rg "get_label\(" --context 2Length of output: 1487
Script:
#!/bin/bash # Get the implementation of both get_label functions ast-grep --pattern 'def get_label(labels: list) -> (str, dict): $$$'Length of output: 5556
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (4)
tests/PXD003947/PXD003947-feature.parquet
is excluded by!**/*.parquet
tests/PXD003947/PXD003947.sdrf.tsv
is excluded by!**/*.tsv
tests/example/PXD017834-TMT.sdrf.tsv
is excluded by!**/*.tsv
tests/example/feature.parquet
is excluded by!**/*.parquet
📒 Files selected for processing (5)
.gitignore
(1 hunks)ibaqpy/ibaq/peptide_normalization.py
(1 hunks)ibaqpy/ibaq/utils.py
(1 hunks)tests/common.py
(1 hunks)tests/test_ibaqpy.py
(1 hunks)
✅ Files skipped from review due to trivial changes (2)
- .gitignore
- ibaqpy/ibaq/utils.py
🧰 Additional context used
🪛 GitHub Check: CodeFactor
tests/test_ibaqpy.py
[notice] 10-10: tests/test_ibaqpy.py#L10
Multiple spaces after ':'. (E241)
🔇 Additional comments (3)
tests/common.py (1)
6-9
: Verify the path handling logic across the codebase
The function removes 'example/' prefix from paths, which might indicate inconsistent path handling across the test suite.
✅ Verification successful
The previous results show only a few examples of path usage. Let's get a more comprehensive view of how paths are handled across the test suite to verify if the path handling is consistent.
Path handling is consistent across the test suite
The path handling in the datafile
function is consistent with its usage across the codebase:
- The function is used exclusively in test files for accessing test data
- The
example/
prefix removal is intentional asTEST_DATA_ROOT
already includes the "example" directory - All test data file references through
datafile()
follow a consistent pattern, either with or without the "example/" prefix - The main application code uses proper path handling with
Path
,click.Path
, and standard path operations
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check how test data paths are constructed across the test suite
# to ensure consistent usage of the datafile function
# Look for direct string paths containing 'example/'
echo "Files containing hardcoded 'example/' paths:"
rg "example/" --type py
# Look for test data path construction patterns
echo -e "\nOther test data path construction patterns:"
rg "Path\(.*test.*data" --type py
Length of output: 615
Script:
#!/bin/bash
# Let's check all usages of datafile function and path construction patterns
echo "All datafile function usages:"
rg "datafile\(" -A 2 --type py
echo -e "\nAll path-related operations in test files:"
rg "path|Path" --type py -A 2 | grep -v "from pathlib"
Length of output: 6929
tests/test_ibaqpy.py (1)
5-5
: LGTM: Clean import of new path utilities
The addition of path utilities from common module aligns well with the PR's objective of improving test data path management.
ibaqpy/ibaq/peptide_normalization.py (1)
330-331
: Verify the column name change impact
The SQL query now uses "pg_accessions" instead of "protein_accessions". Let's verify this change doesn't break existing data.
✅ Verification successful
Column name change is consistent with codebase design
The change from "protein_accessions" to "pg_accessions" is intentional and properly handled:
- "pg_accessions" is the standardized column name in parquet files as defined in
ibaqpy/ibaq/ibaqpy_commons.py
- The mapping between "pg_accessions" and "PROTEIN_NAME" is explicitly defined in
parquet_map
- The usage of "protein_accessions" in
compute_ibaq.py
is for a different purpose (FASTA processing) and doesn't interact with the parquet database operations
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check if any files still reference the old column name
rg -l "protein_accessions"
# Check the schema of existing parquet files
fd -e parquet -x python3 -c "import pyarrow.parquet as pq; print('Schema for {}:'.format('{}'), pq.read_schema('{}'))"
Length of output: 232
Script:
#!/bin/bash
# Let's check the content of the file that still uses protein_accessions
rg "protein_accessions" ibaqpy/ibaq/compute_ibaq.py -A 2 -B 2
# Also check for pg_accessions usage across the codebase
rg "pg_accessions" -A 2 -B 2
Length of output: 3124
tests/common.py
Outdated
def datafile(path: str): | ||
path = str(path).removeprefix("/").removeprefix("example/") | ||
print(TEST_DATA_ROOT) | ||
return str(TEST_DATA_ROOT / path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove debug print statement
The print statement appears to be debugging code that shouldn't be in production.
def datafile(path: str):
path = str(path).removeprefix("/").removeprefix("example/")
- print(TEST_DATA_ROOT)
return str(TEST_DATA_ROOT / path)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def datafile(path: str): | |
path = str(path).removeprefix("/").removeprefix("example/") | |
print(TEST_DATA_ROOT) | |
return str(TEST_DATA_ROOT / path) | |
def datafile(path: str): | |
path = str(path).removeprefix("/").removeprefix("example/") | |
return str(TEST_DATA_ROOT / path) |
🛠️ Refactor suggestion
Enhance function documentation and type hints
The function would benefit from proper documentation and complete type hints.
-def datafile(path: str):
+def datafile(path: str) -> str:
+ """
+ Constructs a test data file path relative to TEST_DATA_ROOT.
+
+ Args:
+ path: A path string, optionally starting with '/' or 'example/'
+
+ Returns:
+ Absolute path to the test data file as a string
+
+ Raises:
+ ValueError: If the path is empty or None
+ """
+ if not path:
+ raise ValueError("Path cannot be empty or None")
path = str(path).removeprefix("/").removeprefix("example/")
return str(TEST_DATA_ROOT / path)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def datafile(path: str): | |
path = str(path).removeprefix("/").removeprefix("example/") | |
print(TEST_DATA_ROOT) | |
return str(TEST_DATA_ROOT / path) | |
def datafile(path: str) -> str: | |
""" | |
Constructs a test data file path relative to TEST_DATA_ROOT. | |
Args: | |
path: A path string, optionally starting with '/' or 'example/' | |
Returns: | |
Absolute path to the test data file as a string | |
Raises: | |
ValueError: If the path is empty or None | |
""" | |
if not path: | |
raise ValueError("Path cannot be empty or None") | |
path = str(path).removeprefix("/").removeprefix("example/") | |
return str(TEST_DATA_ROOT / path) |
tests/test_ibaqpy.py
Outdated
"fasta": datafile("Homo-sapiens-uniprot-reviewed-contaminants-decoy-202210.fasta"), | ||
"peptides": datafile("example/PXD017834-peptides-norm.csv"), | ||
"enzyme": "Trypsin", | ||
"normalize": True, | ||
"min_aa": 7, | ||
"max_aa": 30, | ||
"output": __package__ + "PXD003947/PXD003947-ibaq-norm.csv", | ||
"output": datafile("example/PXD017834-ibaq-norm.csv"), | ||
"verbose": True, | ||
"qc_report": __package__ + "PXD003947/IBAQ-QCprofile.pdf", | ||
"qc_report": datafile("example/IBAQ-QCprofile.pdf"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codebase verification
Missing example data files required for tests
The verification shows that only the FASTA file exists at tests/example/Homo-sapiens-uniprot-reviewed-contaminants-decoy-202210.fasta
. The following required test files are missing:
example/PXD017834-peptides-norm.csv
example/PXD017834-ibaq-norm.csv
example/IBAQ-QCprofile.pdf
🔗 Analysis chain
Verify existence of example data files
The test references several files in the example directory. Let's verify these files exist in the test data directory.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check if all referenced test files exist
files=(
"Homo-sapiens-uniprot-reviewed-contaminants-decoy-202210.fasta"
"example/PXD017834-peptides-norm.csv"
"example/PXD017834-ibaq-norm.csv"
"example/IBAQ-QCprofile.pdf"
)
for file in "${files[@]}"; do
fd -t f "$file"
done
Length of output: 559
f_table["pg_accessions"] = f_table["pg_accessions"].apply( | ||
lambda x: x[0] | ||
) | ||
except Exception as e: | ||
print(e) | ||
exit( | ||
"Some errors occurred when parsing protein_accessions column in feature parquet!" | ||
"Some errors occurred when parsing pg_accessions column in feature parquet!" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Standardize error handling across the codebase
The error handling is inconsistent:
- Some errors use print + exit
- Others raise exceptions (e.g., FileExistsError)
This inconsistency makes error handling unpredictable for users of this code.
Consider standardizing error handling by:
- Using custom exceptions
- Letting callers handle the errors
Example:
+class PeptideNormalizationError(Exception):
+ """Base exception for peptide normalization errors"""
+ pass
+
+class ParsingError(PeptideNormalizationError):
+ """Raised when parsing fails"""
+ pass
- exit(
- f"Some errors occurred when parsing pg_accessions column in feature parquet!"
- )
+ raise ParsingError(f"Failed to parse pg_accessions column: {str(e)}")
Also applies to: 466-469, 566-567
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (4)
.github/workflows/python-app.yml (1)
39-39
: Consider modernizing the package installation approachWhile adding package installation before tests is correct, there are several improvements we could make to align with current best practices:
- Use
pip install .
instead ofsetup.py install
(which is deprecated)- Add dependency caching to speed up workflows
- Consider publishing test results
Here's a suggested improvement:
- python setup.py install - python -m unittest + # Cache dependencies + - uses: actions/cache@v3 + with: + path: ~/.cache/pip + key: ${{ runner.os }}-pip-${{ hashFiles('**/requirements.txt') }} + restore-keys: | + ${{ runner.os }}-pip- + + # Install package + pip install -e . + + # Run tests with result publishing + python -m unittest --verbose 2>&1 | tee test_output.txt + + # Optional: Publish test results + - name: Publish Test Results + uses: EnricoMi/publish-unit-test-result-action@v2 + if: always() + with: + files: test_output.txt.github/workflows/python-package.yml (3)
40-40
: Consider using modern package installation approachWhile adding the installation step is correct,
python setup.py install
is deprecated. Consider using the more modern approach:- python setup.py install + pip install -e .
49-49
: Add newline at end of fileAdd a newline at the end of the file to comply with YAML best practices.
--remove_low_frequency_peptides --output tests/example/PXD017834-peptides-norm.csv --skip_normalization --nmethod median --pnmethod max_min --log2 --save_parquet +
🧰 Tools
🪛 yamllint (1.29.0-1)
[error] 49-49: no new line character at the end of file
(new-line-at-end-of-file)
Line range hint
15-17
: Consider workflow improvements for better efficiency and compatibilityThe current workflow configuration could benefit from several improvements:
- Add more Python versions (e.g., 3.9, 3.10) to ensure compatibility
- Implement dependency caching to speed up CI runs
- Add timeout settings to prevent hung jobs
Example improvement:
strategy: fail-fast: false matrix: - python-version: ["3.8"] + python-version: ["3.8", "3.9", "3.10"] + timeout-minutes: 10 + - name: Cache pip packages + uses: actions/cache@v3 + with: + path: ~/.cache/pip + key: ${{ runner.os }}-pip-${{ hashFiles('**/requirements.txt') }} + restore-keys: | + ${{ runner.os }}-pip-🧰 Tools
🪛 yamllint (1.29.0-1)
[error] 49-49: no new line character at the end of file
(new-line-at-end-of-file)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
.github/workflows/python-app.yml
(1 hunks).github/workflows/python-package.yml
(1 hunks)
🧰 Additional context used
🪛 yamllint (1.29.0-1)
.github/workflows/python-package.yml
[error] 49-49: no new line character at the end of file
(new-line-at-end-of-file)
🔇 Additional comments (1)
.github/workflows/python-package.yml (1)
47-49
: Verify test data paths and consider safer cleanup
Let's verify the existence of these test files and consider using Python's test cleanup mechanisms.
Consider using pytest fixtures or unittest's setUp/tearDown methods for test cleanup instead of shell commands. This would be more portable and safer:
# In tests/test_ibaqpy.py
def setUp(self):
# Clean up any existing output files
for file in ['PXD017834-ibaq-norm.csv', 'PXD017834-peptides-norm.csv']:
path = os.path.join('tests', 'example', file)
if os.path.exists(path):
os.remove(path)
✅ Verification successful
Test file cleanup is appropriate and paths are verified
The cleanup commands are correctly removing generated test output files:
tests/example/
directory existstests/example/feature.parquet
andtests/example/PXD017834-TMT.sdrf.tsv
are input filesPXD017834-peptides-norm.csv
andPXD017834-ibaq-norm.csv
are output files generated during test execution- The cleanup ensures consistent test results by removing previous outputs
While using test fixtures would be cleaner, the current cleanup approach is functional and safe for these specific files.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the existence of test data paths and related files
# Check if the test directory structure exists
fd -t d "example" tests/
# Check for related test files
fd -t f "PXD017834" tests/example/
Length of output: 111
Script:
#!/bin/bash
# Let's check if there are any test files that handle the cleanup
rg -l "setUp|tearDown|fixture" tests/
Length of output: 39
Script:
#!/bin/bash
# Check the content of the test files to see how they handle cleanup
ast-grep --pattern 'def test_$_($$$): $$$'
Length of output: 44
Script:
#!/bin/bash
# Check if these files are generated during tests
rg -l "PXD017834-(ibaq|peptides)-norm\.csv" tests/
Length of output: 73
Script:
#!/bin/bash
# Let's check the content of test_ibaqpy.py to understand how these files are used
rg -A 5 "PXD017834-(ibaq|peptides)-norm\.csv" tests/test_ibaqpy.py
Length of output: 711
Script:
#!/bin/bash
# Check if these are output files from test execution
fd -t f "feature.parquet" tests/example/
Length of output: 70
🧰 Tools
🪛 yamllint (1.29.0-1)
[error] 49-49: no new line character at the end of file
(new-line-at-end-of-file)
Summary by CodeRabbit
Release Notes
New Features
.gitignore
to include additional ignored files and directories.Improvements
pycombat
function for improved utility.Bug Fixes