-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
default to conda over container #182
Conversation
📝 WalkthroughWalkthroughThis pull request introduces a comprehensive update to the Harpy project's command-line interface, replacing the Changes
Possibly related PRs
Suggested Labels
✨ Finishing Touches
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 13
🔭 Outside diff range comments (1)
harpy/impute.py (1)
Line range hint
1-1
: Systematic inconsistency with PR objective across all files.The PR title suggests defaulting to conda over container, but the implementation across all files shows:
- Help texts suggesting container as an alternative to conda
- Logic defaulting to conda instead of container
This systematic inconsistency needs to be addressed by either:
- Updating the PR title to match the current implementation, or
- Reversing the logic in all files to match the PR title
🧹 Nitpick comments (3)
harpy/sv.py (1)
85-85
: Consider refactoring duplicate SDM logicThe SDM logic is duplicated across functions:
sdm = "conda" if not container else "conda apptainer"Consider extracting this into a helper function to improve maintainability.
Also applies to: 184-184
harpy/align.py (1)
97-97
: Consider refactoring duplicate SDM logicThe SDM logic is duplicated across multiple functions:
sdm = "conda" if not container else "conda apptainer"Consider extracting this into a helper function to improve maintainability.
Also applies to: 188-188, 298-298
harpy/simulate.py (1)
157-157
: Consider refactoring duplicate SDM logicThe SDM logic is duplicated across multiple functions:
sdm = "conda" if not container else "conda apptainer"Consider extracting this into a helper function to improve maintainability.
Example refactor:
# Add to _misc.py or create a new module +def get_software_deployment_method(container: bool) -> str: + return "conda" if not container else "conda apptainer" # In each function, replace the duplicate logic with: -sdm = "conda" if not container else "conda apptainer" +sdm = get_software_deployment_method(container)Also applies to: 265-265, 381-381, 497-497, 603-603
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
.github/workflows/tests.yml
(17 hunks)harpy/align.py
(9 hunks)harpy/assembly.py
(3 hunks)harpy/deconvolve.py
(3 hunks)harpy/demultiplex.py
(4 hunks)harpy/impute.py
(3 hunks)harpy/metassembly.py
(3 hunks)harpy/phase.py
(4 hunks)harpy/preflight.py
(5 hunks)harpy/qc.py
(3 hunks)harpy/simulate.py
(15 hunks)harpy/snp.py
(6 hunks)harpy/sv.py
(6 hunks)
✅ Files skipped from review due to trivial changes (1)
- .github/workflows/tests.yml
⏰ Context from checks skipped due to timeout of 90000ms (17)
- GitHub Check: align strobe
- GitHub Check: simulate variants
- GitHub Check: simulate linkedreads
- GitHub Check: qc
- GitHub Check: preflight
- GitHub Check: phase
- GitHub Check: sv naibr
- GitHub Check: mpileup
- GitHub Check: sv leviathan
- GitHub Check: impute
- GitHub Check: harpy extras
- GitHub Check: freebayes
- GitHub Check: align EMA
- GitHub Check: demux gen1
- GitHub Check: metassembly
- GitHub Check: align BWA
- GitHub Check: deconvolve
🔇 Additional comments (4)
harpy/snp.py (1)
36-36
: LGTM: Command-line interface changes are consistentThe changes to replace
--conda
with--container
in the docstring and function signatures are consistent throughout the file.Also applies to: 46-46, 60-60, 66-66, 165-165, 172-172
harpy/sv.py (1)
40-40
: LGTM: Command-line interface changes are consistentThe changes to replace
--conda
with--container
in the docstring and function signatures are consistent throughout the file.Also applies to: 50-50, 64-64, 72-72, 157-157, 165-165
harpy/align.py (1)
42-42
: LGTM: Command-line interface changes are consistentThe changes to replace
--conda
with--container
in the docstring and function signatures are consistent throughout the file.Also applies to: 52-52, 62-62, 76-76, 84-84, 167-167, 173-173, 275-275, 282-282
harpy/simulate.py (1)
49-49
: LGTM: Command-line interface changes are consistentThe changes to replace
--conda
with--container
in the docstring and function signatures are consistent throughout the file.Also applies to: 67-67, 85-85, 103-103, 121-121, 138-138, 144-144, 232-232, 239-239, 358-358, 365-365, 468-468, 475-475, 581-581, 588-588
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
harpy/bin/concatenate_bam.py (1)
Line range hint
1-156
: Consider enhancing logging and validation.While the core functionality is solid, consider these improvements:
- Add logging to track the concatenation process, number of records processed, and MI/BX tag statistics
- Include input filenames in error messages for easier debugging
- Validate that input BAM files have compatible RG headers before concatenation
Would you like me to provide example implementations for these improvements?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
harpy/bin/concatenate_bam.py
(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Build/Cache Container
header['RG'][0]['ID'] = "concat" | ||
header['RG'][0]['SM'] = "concat" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reconsider using static read group identifiers.
Using static "concat" values for RG ID and SM fields could cause issues:
- Loss of sample traceability and origin information
- Potential conflicts if multiple concatenated BAM files need to be merged later
- Impact on downstream tools that rely on unique RG headers for sample identification
- Reduced ability to perform sample-specific analysis and quality control
Consider maintaining unique identifiers or implementing a more descriptive naming scheme that preserves sample identity information.
Example approach:
-header['RG'][0]['ID'] = "concat"
-header['RG'][0]['SM'] = "concat"
+# Generate unique identifier combining sample names
+sample_names = [Path(bam).stem for bam in aln_list]
+merged_id = f"merged_{'-'.join(sample_names)}"
+header['RG'][0]['ID'] = merged_id
+header['RG'][0]['SM'] = merged_id
Committable suggestion skipped: line range outside the PR's diff.
Summary by CodeRabbit
New Features
--conda
option with--container
across multiple Harpy modules.Improvements
Changes