feat: support for samples w/ and w/o primers #344

FelixMoelder · 2024-12-03T08:24:44Z

In some experiments, there are samples with and without primers. However, this case was not previously supported because the workflow always expects a panel containing primers to be provided in the samples.tsv file. To address this, the workflow now dynamically checks if a panel is provided for primer trimming. Nevertheless, primer trimming must still be activated in the configuration file.

We should consider whether the primer trimming section in the configuration file is still necessary, or if trimming can be inferred from the presence of a panel in the samplesheet. However, we should retain the primer section for setting a custom library length and specifying the paths to the primer files.

Summary by CodeRabbit

New Features
- Introduced a function to check for primer presence in sample data.
- Enhanced logic for handling sample data and file outputs.
- Added new requirements for sample and unit sheets in configuration.
- Enabled primer trimming and mutational burden estimation in configuration files.
- Activated both delly and freebayes for variant calling in multiple configurations.
Bug Fixes
- Improved error handling for sample and unit combinations.
Chores
- Removed environment specification from the merge_untrimmed_fastqs rule.
- Added execution order dependency between apply_bqsr and bam_index rules.
- Updated comments for clarity throughout configuration files.

coderabbitai · 2024-12-03T08:24:53Z

Walkthrough

The changes in this pull request involve significant modifications to the Snakemake workflow, particularly in the workflow/rules/common.smk and workflow/rules/mapping.smk files. Key updates include the introduction of new functions for handling sample data, restructuring existing functions for improved clarity, and refining error handling. Additionally, the conda directive was removed from the merge_untrimmed_fastqs rule, and a rule order was established between apply_bqsr and bam_index, ensuring proper execution sequencing.

Changes

File Path	Change Summary
workflow/rules/common.smk	- Added `sample_has_primers` function. - Updated `get_recalibrate_quality_input`, `get_trimming_input`, `get_aligner`, and `get_consensus_input` function signatures. - Enhanced error handling in `get_raw_reads` function. - Streamlined output file path generation in `get_final_output`.
workflow/rules/mapping.smk	- Removed `conda` directive from `merge_untrimmed_fastqs` rule. - Added `ruleorder: apply_bqsr > bam_index` to enforce execution order.
config/README.md	- Added new requirements for `panel`, `mutational_burden_events`, `umi_read`, and `umi_read_structure` columns in sample and unit sheets.
.test/config-chm-eval/config.yaml	- Removed `activate` key in `primers` section. - Activated `delly` and `freebayes` in `calling` section.
.test/config-giab/config.yaml	- Removed `activate` key in `primers` section; paths for primer files set to empty. - No other changes.
.test/config-no-candidate-filtering/config.yaml	- Removed `activate` key in `primers` section. - Added `with_aux` key in `filter` subsection. - Updated `final_calls` in `vep` subsection.
.test/config-simple/config.yaml	- Set `primers.trimming.activate` to `false`. - Activated `mutational_burden` and `report` sections.
.test/config-sra/config.yaml	- Removed `activate` key in `primers` section. - Activated `mutational_burden`.
.test/config-target-regions/config.yaml	- Changed `primers.trimming.activate` to `true`. - Activated `annotations.vcfs`.
.test/config-target-regions/config_multiple_beds.yaml	- Removed `activate` key in `primers` section. - Activated `mutational_burden`.
.test/config_primers/config.yaml	- Removed `activate` key in `primers.trimming`.
config/config.yaml	- Set `primers.trimming.activate` and `mutational_burden.activate` to `true`. - Updated paths for `primers_fa1` and `primers_fa2` to empty strings.

Possibly related PRs

fix: fix excluding events from the report #332: The changes in this PR involve substantial modifications to the workflow/rules/common.smk file, particularly in the get_final_output function, which is related to the restructuring of functions and enhancements to data handling logic in the main PR. Both PRs focus on improving output generation and managing sample data effectively.

Suggested reviewers

johanneskoester
dlaehnemann

Poem

In the workflow's dance, we refine and align,
Functions and rules, all crafted in time.
With samples and paths, we tidy the way,
A hop and a skip, new logic at play!
Errors now caught, like carrots in sight,
Our Snakemake's ready, all set for the flight! 🐇✨

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (1)

workflow/rules/common.smk (1)

263-267: Ensure get_sample_datatype returns expected values

The get_aligner function defaults to 'bwa' when the sample datatype is not 'rna'. If get_sample_datatype can return unexpected or invalid datatypes, consider handling such cases explicitly or raising an error to prevent unintended behavior.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 185b2fe and 710b001.

📒 Files selected for processing (2)

workflow/rules/common.smk (4 hunks)
workflow/rules/mapping.smk (0 hunks)

💤 Files with no reviewable changes (1)

workflow/rules/mapping.smk

🔇 Additional comments (4)

workflow/rules/common.smk (4)

Line range hint 422-426: Function get_markduplicates_input logic is appropriate

The function correctly determines the input path for duplicate marking based on whether the sample has UMIs and the aligner used.

431-441: Logic in get_recalibrate_quality_input function is acceptable

The function appropriately selects the input file for quality recalibration based on the sample datatype and whether certain features are activated.

443-448: Function get_consensus_input correctly handles primer trimming

The function properly checks if primer trimming is activated and if the sample has primers, to determine the appropriate consensus input.

451-459: Function get_trimming_input logic is sound

The function correctly determines the trimming input based on whether duplicate removal is activated and uses the appropriate aligner.

workflow/rules/common.smk

coderabbitai

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)

config/README.md (1)
49-49: Consider enhancing the primer trimming documentation.

While the new behavior is clearly documented, consider adding these clarifications to help users:

Explicitly state that primer trimming must still be enabled in config.yaml even when panels are specified

Add a note about the impact on existing workflows (e.g., "Note: Existing workflows that rely on primer trimming should ensure their samples have panels specified")

Consider adding an example showing the relationship between panel specification and primer trimming behavior

Here's a suggested expansion of the documentation:
 If a panel is not provided for a sample, trimming will not be performed on that sample. 
+Note: Primer trimming must still be explicitly enabled in config.yaml, even when panels are specified.
+
+Example:
+| sample_name | panel      | Trimming Behavior                |
+|-------------|------------|----------------------------------|
+| sample1     | panel1.bed | Trimming performed if enabled    |
+| sample2     | NA         | No trimming, regardless of config|
+
+Note: If you're updating an existing workflow that relies on primer trimming,
+ensure that all relevant samples have panels specified in samples.tsv.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 710b001 and 8bccfa4.

📒 Files selected for processing (1)

config/README.md (1 hunks)

🔇 Additional comments (1)

config/README.md (1)

49-49: LGTM! Clear documentation of the new behavior.

The added line clearly explains that primer trimming will be skipped for samples without a panel, which aligns with the PR's objective of supporting samples with and without primers.

johanneskoester

Minor things, nice work! I vote for removing the activate mechanism for primer trimming in the config then.

workflow/rules/common.smk

coderabbitai

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)

workflow/rules/common.smk (1)
1376-1376: Remove extra whitespace.

There's unnecessary whitespace on this line.
-   
+

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 8bccfa4 and 55f616c.

📒 Files selected for processing (10)

.test/config-chm-eval/config.yaml (0 hunks)
.test/config-giab/config.yaml (0 hunks)
.test/config-no-candidate-filtering/config.yaml (0 hunks)
.test/config-simple/config.yaml (0 hunks)
.test/config-sra/config.yaml (0 hunks)
.test/config-target-regions/config.yaml (0 hunks)
.test/config-target-regions/config_multiple_beds.yaml (0 hunks)
.test/config_primers/config.yaml (0 hunks)
config/config.yaml (0 hunks)
workflow/rules/common.smk (3 hunks)

💤 Files with no reviewable changes (9)

.test/config_primers/config.yaml
config/config.yaml
.test/config-giab/config.yaml
.test/config-target-regions/config.yaml
.test/config-target-regions/config_multiple_beds.yaml
.test/config-chm-eval/config.yaml
.test/config-sra/config.yaml
.test/config-no-candidate-filtering/config.yaml
.test/config-simple/config.yaml

🔇 Additional comments (4)

workflow/rules/common.smk (4)

263-267: LGTM: Clear and correct aligner selection logic.

The function correctly determines the appropriate aligner based on the sample datatype, following standard bioinformatics practices.

417-422: LGTM: Proper handling of UMI and non-UMI cases.

The function correctly determines the input BAM file path based on UMI presence and aligner type.

424-452: LGTM: Well-structured input determination chain.

The chain of functions properly handles the determination of input files based on various conditions (RNA/DNA, primers, duplicates, etc.). The implementation is clean and follows a clear responsibility chain.

1371-1390: LGTM: Comprehensive primer detection logic.

The function correctly implements the PR's objective of supporting samples with and without primers by:

Checking global primer configuration
Checking sample-specific panel configuration
Validating paired-end requirement

coderabbitai

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (3)

config/config.yaml (1)

33-39: LGTM! Consider enhancing the documentation.

The primer configuration changes look good. The empty default values for primers_fa1 and primers_fa2 align well with the PR's objective to support samples with and without primers.

Consider adding a comment explaining that these paths can be:

Left empty for samples without primers

Set globally for all samples

Overridden per sample via the optional primer TSV file
workflow/rules/common.smk (2)
424-452: LGTM! Consider adding error handling for invalid file extensions.

The chain of functions is well-structured and correctly handles the various input scenarios. The addition of the bai parameter improves flexibility.

Consider adding validation for the ext parameter in get_trimming_input:
 def get_trimming_input(wildcards, bai=False):
     ext = "bai" if bai else "bam"
+    if ext not in ["bam", "bai"]:
+        raise ValueError(f"Invalid extension: {ext}. Must be 'bam' or 'bai'.")
     if is_activated("remove_duplicates"):
         return "results/dedup/{{sample}}.{ext}".format(ext=ext)
     else:
         aligner = get_aligner(wildcards)
         return "results/mapped/{aligner}/{{sample}}.{ext}".format(
             aligner=aligner, ext=ext
         )
424-452: Consider using f-strings for string formatting.

For better readability and maintainability, consider using f-strings instead of .format().

Example changes:
-    return "results/dedup/{{sample}}.{ext}".format(ext=ext)
+    return f"results/dedup/{{sample}}.{ext}"

-    return "results/mapped/{aligner}/{{sample}}.{ext}".format(
-        aligner=aligner, ext=ext
-    )
+    return f"results/mapped/{aligner}/{{sample}}.{ext}"
Also applies to: 1371-1384

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 55f616c and 893e0e1.

📒 Files selected for processing (3)

.test/config-giab/config.yaml (1 hunks)
config/config.yaml (1 hunks)
workflow/rules/common.smk (3 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

.test/config-giab/config.yaml

🔇 Additional comments (3)

workflow/rules/common.smk (3)

263-267: LGTM! Good extraction of aligner selection logic.

Clean implementation that improves maintainability by centralizing the aligner selection logic.

417-422: LGTM! Good use of the new aligner selection function.

The function correctly uses the new get_aligner function while maintaining the existing UMI annotation logic.

1371-1384: ⚠️ Potential issue

Add safety check for 'panel' column access.

The function implementation looks good overall, but there's a potential issue with accessing the 'panel' column.

Apply this diff to handle the missing 'panel' column:

 def sample_has_primers(wildcards):
     sample_name = wildcards.sample
 
     if config["primers"]["trimming"].get("primers_fa1") or (
-        "panel" in samples.columns
-        and samples.loc[samples["sample_name"] == sample_name, "panel"].notna().any()
+        samples.get("panel") is not None
+        and samples.loc[samples["sample_name"] == sample_name, "panel"].notna().any()
     ):
         if not is_paired_end(sample_name):
             raise WorkflowError(
                 f"Primer trimming is only available for paired-end data. Sample '{sample_name}' is not paired-end."
             )
         return True
     return False

FelixMoelder · 2024-12-03T11:18:41Z

Just updated the workflow to not rely on primer trimming activation in the config anymore.
The updated config now does not allow placeholder strings for primers_fa1/fa2 anymore as those would be interpreted as global primers without explicit activation. This is a breaking change which is why we should treat this PR as a major release.

feat: support sample w/ and w/o primers

710b001

FelixMoelder requested a review from johanneskoester December 3, 2024 08:24

coderabbitai bot reviewed Dec 3, 2024

View reviewed changes

workflow/rules/common.smk Outdated Show resolved Hide resolved

update readme

8bccfa4

coderabbitai bot reviewed Dec 3, 2024

View reviewed changes

johanneskoester requested changes Dec 3, 2024

View reviewed changes

workflow/rules/common.smk Outdated Show resolved Hide resolved

derive primer trimming by samplesheet

55f616c

coderabbitai bot reviewed Dec 3, 2024

View reviewed changes

FelixMoelder added 3 commits December 3, 2024 10:23

fmt

8c51bf1

minor improvements

338b01e

update test

893e0e1

coderabbitai bot reviewed Dec 3, 2024

View reviewed changes

johanneskoester approved these changes Dec 3, 2024

View reviewed changes

FelixMoelder merged commit 7c24fb8 into master Dec 3, 2024
9 checks passed

FelixMoelder deleted the feat/support_primerless_samples branch December 3, 2024 12:38

github-actions bot mentioned this pull request Dec 3, 2024

chore(master): release 5.11.0 #335

Open

coderabbitai bot mentioned this pull request Dec 3, 2024

feat: add option to allow mapping to pangenome #274

Merged

coderabbitai bot mentioned this pull request Dec 12, 2024

feat: allow setting of custom alignment properties #346

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: support for samples w/ and w/o primers #344

feat: support for samples w/ and w/o primers #344

FelixMoelder commented Dec 3, 2024 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Dec 3, 2024 •

edited

Loading

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (`.coderabbit.yaml`)

Documentation and Community

coderabbitai bot left a comment

coderabbitai bot left a comment

johanneskoester left a comment

coderabbitai bot left a comment

coderabbitai bot left a comment

FelixMoelder commented Dec 3, 2024 •

edited

Loading

feat: support for samples w/ and w/o primers #344

feat: support for samples w/ and w/o primers #344

Conversation

FelixMoelder commented Dec 3, 2024 • edited by coderabbitai bot Loading

Summary by CodeRabbit

coderabbitai bot commented Dec 3, 2024 • edited Loading

Walkthrough

Changes

Possibly related PRs

Suggested reviewers

Poem

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

johanneskoester left a comment

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

FelixMoelder commented Dec 3, 2024 • edited Loading

FelixMoelder commented Dec 3, 2024 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Dec 3, 2024 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)

FelixMoelder commented Dec 3, 2024 •

edited

Loading