feat: support for samples w/ and w/o primers #344
Walkthrough
The changes in this pull request involve significant modifications to the Snakemake workflow, particularly in the
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (1)
workflow/rules/common.smk (1)
263-267: Ensure get_sample_datatype returns expected values
The get_aligner function defaults to 'bwa' when the sample datatype is not 'rna'. If get_sample_datatype can return unexpected or invalid datatypes, consider handling such cases explicitly or raising an error to prevent unintended behavior.
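One way to act on this suggestion is to replace the implicit default with an explicit lookup that fails on unknown datatypes. The following is a minimal sketch, not the workflow's actual code: the function name select_aligner, the mapping ALIGNER_BY_DATATYPE, and the choice of "star" for RNA are assumptions made for illustration; the review only states that non-'rna' samples fall back to 'bwa'.

```python
# Hypothetical mapping: the review only confirms that non-'rna' samples use 'bwa'.
# The 'star' entry is an assumption for illustration.
ALIGNER_BY_DATATYPE = {"dna": "bwa", "rna": "star"}


def select_aligner(datatype):
    """Return the aligner for a datatype, failing loudly on unknown values."""
    try:
        return ALIGNER_BY_DATATYPE[datatype]
    except KeyError:
        raise ValueError(
            f"Unsupported datatype {datatype!r}; expected one of "
            f"{sorted(ALIGNER_BY_DATATYPE)}"
        ) from None


print(select_aligner("dna"))
print(select_aligner("rna"))
```

With this shape, a typo in the samplesheet surfaces as an error at DAG construction time rather than silently selecting the DNA aligner.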
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
- workflow/rules/common.smk (4 hunks)
- workflow/rules/mapping.smk (0 hunks)
💤 Files with no reviewable changes (1)
- workflow/rules/mapping.smk
🔇 Additional comments (4)
workflow/rules/common.smk (4)
Line range hint 422-426: Function get_markduplicates_input logic is appropriate
The function correctly determines the input path for duplicate marking based on whether the sample has UMIs and the aligner used.
431-441: Logic in get_recalibrate_quality_input function is acceptable
The function appropriately selects the input file for quality recalibration based on the sample datatype and whether certain features are activated.
443-448
: Function get_consensus_input
correctly handles primer trimming
The function properly checks if primer trimming is activated and if the sample has primers, to determine the appropriate consensus input.
451-459: Function get_trimming_input logic is sound
The function correctly determines the trimming input based on whether duplicate removal is activated and uses the appropriate aligner.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (1)
config/README.md (1)
49-49: Consider enhancing the primer trimming documentation.
While the new behavior is clearly documented, consider adding these clarifications to help users:
- Explicitly state that primer trimming must still be enabled in config.yaml even when panels are specified
- Add a note about the impact on existing workflows (e.g., "Note: Existing workflows that rely on primer trimming should ensure their samples have panels specified")
- Consider adding an example showing the relationship between panel specification and primer trimming behavior
Here's a suggested expansion of the documentation:
 If a panel is not provided for a sample, trimming will not be performed on that sample.
+Note: Primer trimming must still be explicitly enabled in config.yaml, even when panels are specified.
+
+Example:
+| sample_name | panel      | Trimming Behavior                 |
+|-------------|------------|-----------------------------------|
+| sample1     | panel1.bed | Trimming performed if enabled     |
+| sample2     | NA         | No trimming, regardless of config |
+
+Note: If you're updating an existing workflow that relies on primer trimming,
+ensure that all relevant samples have panels specified in samples.tsv.
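The per-sample behavior in the suggested table can be sketched with pandas. This is an illustrative example under assumptions, not the workflow's code: the helper has_panel and the in-memory samplesheet are hypothetical, and the missing panel is represented as None (when a TSV is loaded with pandas.read_csv, a literal NA is parsed as missing by default).

```python
import pandas as pd

# Hypothetical samplesheet mirroring the example table above.
samples = pd.DataFrame(
    {
        "sample_name": ["sample1", "sample2"],
        "panel": ["panel1.bed", None],
    }
)


def has_panel(samples, sample_name):
    """True if the sample has a non-missing entry in the 'panel' column."""
    if "panel" not in samples.columns:
        return False
    rows = samples.loc[samples["sample_name"] == sample_name, "panel"]
    return bool(rows.notna().any())


for name in samples["sample_name"]:
    print(name, "trim" if has_panel(samples, name) else "skip trimming")
```

A samplesheet without any panel column is treated the same as one where every panel entry is missing.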
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
- config/README.md (1 hunks)
🔇 Additional comments (1)
config/README.md (1)
49-49: LGTM! Clear documentation of the new behavior.
The added line clearly explains that primer trimming will be skipped for samples without a panel, which aligns with the PR's objective of supporting samples with and without primers.
Minor things, nice work! I vote for removing the activate mechanism for primer trimming in the config then.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (1)
workflow/rules/common.smk (1)
1376-1376: Remove extra whitespace.
There's unnecessary whitespace on this line.
- +
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (10)
- .test/config-chm-eval/config.yaml (0 hunks)
- .test/config-giab/config.yaml (0 hunks)
- .test/config-no-candidate-filtering/config.yaml (0 hunks)
- .test/config-simple/config.yaml (0 hunks)
- .test/config-sra/config.yaml (0 hunks)
- .test/config-target-regions/config.yaml (0 hunks)
- .test/config-target-regions/config_multiple_beds.yaml (0 hunks)
- .test/config_primers/config.yaml (0 hunks)
- config/config.yaml (0 hunks)
- workflow/rules/common.smk (3 hunks)
💤 Files with no reviewable changes (9)
- .test/config_primers/config.yaml
- config/config.yaml
- .test/config-giab/config.yaml
- .test/config-target-regions/config.yaml
- .test/config-target-regions/config_multiple_beds.yaml
- .test/config-chm-eval/config.yaml
- .test/config-sra/config.yaml
- .test/config-no-candidate-filtering/config.yaml
- .test/config-simple/config.yaml
🔇 Additional comments (4)
workflow/rules/common.smk (4)
263-267: LGTM: Clear and correct aligner selection logic.
The function correctly determines the appropriate aligner based on the sample datatype, following standard bioinformatics practices.
417-422: LGTM: Proper handling of UMI and non-UMI cases.
The function correctly determines the input BAM file path based on UMI presence and aligner type.
424-452: LGTM: Well-structured input determination chain.
The chain of functions properly handles the determination of input files based on various conditions (RNA/DNA, primers, duplicates, etc.). The implementation is clean and follows a clear responsibility chain.
1371-1390: LGTM: Comprehensive primer detection logic.
The function correctly implements the PR's objective of supporting samples with and without primers by:
- Checking global primer configuration
- Checking sample-specific panel configuration
- Validating paired-end requirement
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (3)
config/config.yaml (1)
33-39: LGTM! Consider enhancing the documentation.
The primer configuration changes look good. The empty default values for primers_fa1 and primers_fa2 align well with the PR's objective to support samples with and without primers.
Consider adding a comment explaining that these paths can be:
- Left empty for samples without primers
- Set globally for all samples
- Overridden per sample via the optional primer TSV file
workflow/rules/common.smk (2)
424-452: LGTM! Consider adding error handling for invalid file extensions.
The chain of functions is well-structured and correctly handles the various input scenarios. The addition of the bai parameter improves flexibility.
Consider adding validation for the ext parameter in get_trimming_input:
 def get_trimming_input(wildcards, bai=False):
     ext = "bai" if bai else "bam"
+    if ext not in ["bam", "bai"]:
+        raise ValueError(f"Invalid extension: {ext}. Must be 'bam' or 'bai'.")
     if is_activated("remove_duplicates"):
         return "results/dedup/{{sample}}.{ext}".format(ext=ext)
     else:
         aligner = get_aligner(wildcards)
         return "results/mapped/{aligner}/{{sample}}.{ext}".format(
             aligner=aligner, ext=ext
         )
424-452: Consider using f-strings for string formatting.
For better readability and maintainability, consider using f-strings instead of .format().
Example changes:
-        return "results/dedup/{{sample}}.{ext}".format(ext=ext)
+        return f"results/dedup/{{sample}}.{ext}"
-        return "results/mapped/{aligner}/{{sample}}.{ext}".format(
-            aligner=aligner, ext=ext
-        )
+        return f"results/mapped/{aligner}/{{sample}}.{ext}"
Also applies to: 1371-1384
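The doubled braces matter in both spellings: they produce a literal {sample}, which Snakemake later treats as a wildcard placeholder. A short check, using illustrative values for ext and aligner, suggests the two forms yield the same string:

```python
# Illustrative values; in the workflow these come from the function arguments.
ext = "bam"
aligner = "bwa"

via_format = "results/mapped/{aligner}/{{sample}}.{ext}".format(
    aligner=aligner, ext=ext
)
via_fstring = f"results/mapped/{aligner}/{{sample}}.{ext}"

# Both keep a literal {sample} for Snakemake to fill in later.
assert via_format == via_fstring
print(via_fstring)  # results/mapped/bwa/{sample}.bam
```

Since the output is identical, the change is purely a readability choice.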
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
- .test/config-giab/config.yaml (1 hunks)
- config/config.yaml (1 hunks)
- workflow/rules/common.smk (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- .test/config-giab/config.yaml
🔇 Additional comments (3)
workflow/rules/common.smk (3)
263-267: LGTM! Good extraction of aligner selection logic.
Clean implementation that improves maintainability by centralizing the aligner selection logic.
417-422: LGTM! Good use of the new aligner selection function.
The function correctly uses the new get_aligner function while maintaining the existing UMI annotation logic.
1371-1384: Add safety check for 'panel' column access.
The function implementation looks good overall, but there's a potential issue with accessing the 'panel' column.
Apply this diff to handle the missing 'panel' column:
 def sample_has_primers(wildcards):
     sample_name = wildcards.sample
     if config["primers"]["trimming"].get("primers_fa1") or (
-        "panel" in samples.columns
-        and samples.loc[samples["sample_name"] == sample_name, "panel"].notna().any()
+        samples.get("panel") is not None
+        and samples.loc[samples["sample_name"] == sample_name, "panel"].notna().any()
     ):
         if not is_paired_end(sample_name):
             raise WorkflowError(
                 f"Primer trimming is only available for paired-end data. Sample '{sample_name}' is not paired-end."
             )
         return True
     return False
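For a pandas DataFrame, the two guards in this diff appear to behave the same way: a membership test on the columns and DataFrame.get with its default of None both detect a missing column before it is indexed. A minimal sketch, assuming samples is a pandas DataFrame (the helper column_checks and the toy frames are hypothetical):

```python
import pandas as pd

with_panel = pd.DataFrame({"sample_name": ["a"], "panel": ["panel1.bed"]})
without_panel = pd.DataFrame({"sample_name": ["a"]})


def column_checks(df):
    """Return (membership check, DataFrame.get check) for the 'panel' column."""
    by_membership = "panel" in df.columns
    by_get = df.get("panel") is not None
    return by_membership, by_get


for df in (with_panel, without_panel):
    print(column_checks(df))
```

Under that assumption, the suggested change is a stylistic alternative rather than a fix for a missing safety check.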
Just updated the workflow to not rely on primer trimming activation in the config anymore.
In some experiments, there are samples with and without primers. However, this case was not previously supported because the workflow always expected a panel containing primers to be provided in the samples.tsv file. To address this, the workflow now dynamically checks if a panel is provided for primer trimming. Nevertheless, primer trimming must still be activated in the configuration file.
We should consider whether the primer trimming section in the configuration file is still necessary, or if trimming can be inferred from the presence of a panel in the samplesheet. However, we should retain the primer section for setting a custom library length and specifying the paths to the primer files.
Summary by CodeRabbit
New Features
delly and freebayes for variant calling in multiple configurations.
Bug Fixes
Chores
merge_untrimmed_fastqs rule.
apply_bqsr and bam_index rules.