Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: allow setting of custom alignment properties #346

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

FelixMoelder
Copy link
Contributor

@FelixMoelder FelixMoelder commented Dec 12, 2024

This PR introduces the definition of custom alignment properties estimated by varlociraptor.
In case custom alignment properties are enabled in the config file property names can be assigned to samples in the samplesheet via a user-defined column. The corresponding alignment properties will be looked up from a tsv sheet containing property names and paths to json-files. In case some sample does not contain a property name (cell entry is NaN) the alignment properties will be estimate by varlociraptor for that sample.

Summary by CodeRabbit

Summary by CodeRabbit

  • New Features

    • Introduced a new section for custom alignment properties in multiple configuration files, allowing for enhanced configurability.
    • Added functions for dynamic retrieval of alignment properties based on configuration settings.
  • Bug Fixes

    • Adjusted input handling for alignment properties in the workflow rules to improve flexibility.
  • Documentation

    • Updated JSON schema to include new custom alignment properties, ensuring clarity in configuration requirements.

Copy link

coderabbitai bot commented Dec 12, 2024

Warning

Rate limit exceeded

@FelixMoelder has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 0 minutes and 53 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between df4d185 and 1f1939f.

📒 Files selected for processing (1)
  • config/config.yaml (1 hunks)

Walkthrough

The pull request introduces a new section called custom_alignment_properties across multiple configuration files. This section includes three parameters: activate, column, and tsv, all of which are designed to manage alignment properties. The activate parameter is set to false, indicating that the feature is currently disabled. Additionally, changes are made to the workflow files to dynamically retrieve alignment properties based on the new configuration, enhancing the flexibility of the alignment process.

Changes

File Path Change Summary
.test/config-chm-eval/config.yaml Added custom_alignment_properties: activate: false, column: "panel", tsv: ""
.test/config-giab/config.yaml Added custom_alignment_properties: activate: false, column: "panel", tsv: ""
.test/config-no-candidate-filtering/config.yaml Added custom_alignment_properties: activate: false, column: "panel", tsv: ""
.test/config-simple/config.yaml Added custom_alignment_properties: activate: false, column: "panel", tsv: ""
.test/config-sra/config.yaml Added custom_alignment_properties: activate: false, column: "panel", tsv: ""
.test/config-target-regions/config.yaml Added custom_alignment_properties: activate: false, column: "panel", tsv: ""
.test/config-target-regions/config_multiple_beds.yaml Added custom_alignment_properties: activate: false, column: "panel", tsv: ""
.test/config_primers/config.yaml Added custom_alignment_properties: activate: false, column: "panel", tsv: ""
config/config.yaml Added custom_alignment_properties: activate: false, column: "panel", tsv: "config/alignment_properties.tsv"
workflow/rules/calling.smk Updated alignment_props input to use get_alignment_props() instead of a static path.
workflow/rules/common.smk Added functions is_activated(xpath) and get_alignment_props(), and variable custom_alignment_props.
workflow/schemas/config.schema.yaml Added custom_alignment_properties object with activate, column, and tsv to the schema. Updated required properties.

Possibly related PRs

  • feat: add option to allow mapping to pangenome #274: The main PR and this PR both introduce a new section titled custom_alignment_properties in configuration files, with identical parameters (activate, column, and tsv), indicating a direct relationship in terms of configuration structure and purpose.

Suggested reviewers

  • johanneskoester

🐇 In the fields where bunnies play,
New configs hop in, brightening the day!
With properties set, though not yet awake,
A path to alignments, we joyfully make.
So let’s dance and cheer, for changes are here!
🥕✨


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (7)
.test/config-sra/config.yaml (2)

117-120: LGTM! Consider adding documentation for the new feature.

The new custom_alignment_properties section is well-structured with sensible defaults. However, documentation about its purpose and usage would be helpful.

Consider adding comments explaining:

  • The purpose of this feature
  • The expected format of the TSV file
  • The significance of the "panel" column
🧰 Tools
🪛 yamllint (1.35.1)

[error] 120-120: no new line character at the end of file

(new-line-at-end-of-file)


115-120: Fix formatting issues.

There are two formatting issues to address:

  • Remove trailing spaces on line 116
  • Add a newline at the end of the file
  generate_excel: true
- 
+ 
custom_alignment_properties:
  activate: false
  column: "panel"
  tsv: "config_alignment_properties.tsv"
+
🧰 Tools
🪛 yamllint (1.35.1)

[error] 116-116: trailing spaces

(trailing-spaces)


[error] 120-120: no new line character at the end of file

(new-line-at-end-of-file)

.test/config_primers/config.yaml (1)

120-123: LGTM! Configuration is consistent.

The custom_alignment_properties section matches the structure seen in other configuration files, maintaining consistency across the codebase.

Add a newline at the end of the file:

  activate: false
  column: "panel"
  tsv: "config_alignment_properties.tsv"
+
🧰 Tools
🪛 yamllint (1.35.1)

[error] 123-123: no new line character at the end of file

(new-line-at-end-of-file)

.test/config-giab/config.yaml (1)

161-164: Configuration structure looks good, fix formatting

The custom_alignment_properties configuration is well-structured with sensible defaults. The activate: false ensures backward compatibility.

Add a newline at the end of the file to fix the YAML formatting:

   column: "panel"
   tsv: "config_alignment_properties.tsv"
+
🧰 Tools
🪛 yamllint (1.35.1)

[error] 164-164: no new line character at the end of file

(new-line-at-end-of-file)

workflow/schemas/config.schema.yaml (1)

350-359: Schema structure looks good, consider validation improvements

The schema correctly defines the custom_alignment_properties structure. Consider adding:

  1. Pattern validation for the tsv path
  2. Enum or pattern validation for the column field if there are expected values

Add validation patterns to the schema:

       tsv:
         type: string
+        pattern: "^[^/].*\\.tsv$"  # Must end with .tsv and not start with /
       column:
         type: string
+        pattern: "^[a-zA-Z_][a-zA-Z0-9_]*$"  # Valid column name pattern
config/config.yaml (1)

293-293: Add newline at end of file.

Add a newline character at the end of the file to follow YAML best practices.

 custom_alignment_properties:
   activate: false
   column: "panel"
-  tsv: "config_alignment_properties.tsv"
+  tsv: "config_alignment_properties.tsv"
+
🧰 Tools
🪛 yamllint (1.35.1)

[error] 293-293: no new line character at the end of file

(new-line-at-end-of-file)

.test/config-chm-eval/config.yaml (1)

191-191: Add newline at end of file.

Add a newline character at the end of the file to follow YAML best practices.

 custom_alignment_properties:
   activate: false
   column: "panel"
-  tsv: "config_alignment_properties.tsv"
+  tsv: "config_alignment_properties.tsv"
+
🧰 Tools
🪛 yamllint (1.35.1)

[error] 191-191: no new line character at the end of file

(new-line-at-end-of-file)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d8f55ea and 4cb3720.

📒 Files selected for processing (12)
  • .test/config-chm-eval/config.yaml (1 hunks)
  • .test/config-giab/config.yaml (1 hunks)
  • .test/config-no-candidate-filtering/config.yaml (1 hunks)
  • .test/config-simple/config.yaml (1 hunks)
  • .test/config-sra/config.yaml (1 hunks)
  • .test/config-target-regions/config.yaml (1 hunks)
  • .test/config-target-regions/config_multiple_beds.yaml (1 hunks)
  • .test/config_primers/config.yaml (1 hunks)
  • config/config.yaml (1 hunks)
  • workflow/rules/calling.smk (1 hunks)
  • workflow/rules/common.smk (2 hunks)
  • workflow/schemas/config.schema.yaml (2 hunks)
🧰 Additional context used
🪛 yamllint (1.35.1)
.test/config-simple/config.yaml

[error] 134-134: no new line character at the end of file

(new-line-at-end-of-file)

.test/config-target-regions/config.yaml

[error] 133-133: no new line character at the end of file

(new-line-at-end-of-file)

.test/config-no-candidate-filtering/config.yaml

[error] 130-130: no new line character at the end of file

(new-line-at-end-of-file)

.test/config-giab/config.yaml

[error] 164-164: no new line character at the end of file

(new-line-at-end-of-file)

.test/config-target-regions/config_multiple_beds.yaml

[error] 131-131: no new line character at the end of file

(new-line-at-end-of-file)

config/config.yaml

[error] 293-293: no new line character at the end of file

(new-line-at-end-of-file)

.test/config_primers/config.yaml

[error] 123-123: no new line character at the end of file

(new-line-at-end-of-file)

.test/config-sra/config.yaml

[error] 116-116: trailing spaces

(trailing-spaces)


[error] 120-120: no new line character at the end of file

(new-line-at-end-of-file)

.test/config-chm-eval/config.yaml

[error] 191-191: no new line character at the end of file

(new-line-at-end-of-file)

🔇 Additional comments (8)
.test/config-target-regions/config.yaml (1)

130-133: LGTM! Verify interaction with target regions.

The configuration is consistent with other files. Since this is a target regions configuration, ensure that the custom alignment properties work correctly with the target regions functionality.

Let's verify the interaction between custom alignment properties and target regions:

Add a newline at the end of the file:

  activate: false
  column: "panel"
  tsv: "config_alignment_properties.tsv"
+
✅ Verification successful

Custom alignment properties are properly integrated with target regions

Based on the codebase analysis, the custom alignment properties feature is well-integrated into the variant calling pipeline and works independently from target regions functionality:

  • The alignment properties are used by varlociraptor for variant calling through get_alignment_props() function
  • Target regions are handled separately in the get_target_regions rule
  • The two features serve different purposes and don't interfere with each other:
    • Target regions define areas of interest for variant calling
    • Custom alignment properties provide sample-specific alignment characteristics for variant calling
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for potential interactions between custom alignment properties and target regions

# Look for references to target regions in alignment-related code
rg -l 'target.*region.*align|align.*target.*region' --type py

# Check if the alignment properties TSV contains any target region related columns
if [ -f "config_alignment_properties.tsv" ]; then
    head -n 1 config_alignment_properties.tsv
fi

Length of output: 108


Script:

#!/bin/bash
# Let's try a broader search to understand the relationship between target regions and alignment

# Search for files containing both "target region" and "alignment" in any context
rg -l "target.*region" --type py | xargs rg -l "align"

# Look for alignment-related configuration or utility files
fd "align" -t f

# Check for any target region configuration files
fd "target.*region" -t f

# Look for alignment-related functions or classes
ast-grep --pattern 'class $_ {
  $$$
  align($$$) {
    $$$
  }
  $$$
}'

ast-grep --pattern 'def align($$$) {
  $$$
}'

Length of output: 388


Script:

#!/bin/bash
# Let's examine the content of files that contain both target regions and alignment

# Check workflow/rules/common.smk for alignment and target region interactions
rg "align|target.*region" -C 5 workflow/rules/common.smk

# Check workflow/rules/mapping.smk for alignment and target region interactions
rg "align|target.*region" -C 5 workflow/rules/mapping.smk

# Look for any configuration files that might define how alignment and target regions interact
fd "config" -t f -e yaml -e yml | xargs rg "align|target.*region" -C 2

Length of output: 13107

🧰 Tools
🪛 yamllint (1.35.1)

[error] 133-133: no new line character at the end of file

(new-line-at-end-of-file)

.test/config-target-regions/config_multiple_beds.yaml (1)

128-131: Add documentation for the new configuration section

Please add comments explaining:

  • The purpose of the custom_alignment_properties feature
  • The expected format and content of the TSV file
  • The significance of the panel column

Verify the TSV file existence and format

The configuration references an external file config_alignment_properties.tsv. Please ensure:

  1. The file exists in the expected location
  2. The file contains the required panel column

Add newline at end of file

Add a newline character at the end of the file to comply with YAML best practices.

   tsv: "config_alignment_properties.tsv"
+
🧰 Tools
🪛 yamllint (1.35.1)

[error] 131-131: no new line character at the end of file

(new-line-at-end-of-file)

.test/config-no-candidate-filtering/config.yaml (1)

127-130: Configuration is consistent with other files

The custom_alignment_properties section matches the structure in other configuration files. Please apply the same documentation and validation improvements mentioned in the previous file's review.

🧰 Tools
🪛 yamllint (1.35.1)

[error] 130-130: no new line character at the end of file

(new-line-at-end-of-file)

.test/config-simple/config.yaml (1)

131-134: Configuration is consistent with other files

The custom_alignment_properties section maintains consistency across all configuration files.

Consider documenting feature in workflow documentation

Since this appears to be a global feature being added across multiple configuration files, consider:

  1. Adding documentation in the workflow's README or documentation
  2. Including example TSV files in the test configurations
  3. Documenting any migration steps for existing workflows
🧰 Tools
🪛 yamllint (1.35.1)

[error] 134-134: no new line character at the end of file

(new-line-at-end-of-file)

workflow/rules/calling.smk (1)

50-50: Verify get_alignment_props() implementation and backward compatibility

The change from static path to dynamic function makes alignment properties more configurable, but we should ensure:

  1. The get_alignment_props() function exists and returns the correct path format
  2. Existing workflows aren't broken by this change
✅ Verification successful

Function implementation is correct and maintains backward compatibility

The get_alignment_props() function is properly implemented with:

  • Default behavior returning the same path format as before: results/alignment-properties/{group}/{sample}.json
  • Optional custom alignment properties support through configuration
  • Proper error handling for duplicate sample specifications
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify get_alignment_props function implementation and usage

# Check if get_alignment_props is defined in common.smk
echo "Checking for get_alignment_props definition:"
rg -A 5 "def get_alignment_props" workflow/rules/common.smk

# Check return format matches expected path pattern
echo "Checking alignment properties path pattern usage:"
rg -A 2 "results/alignment-properties/\{group\}/\{sample\}\.json"

Length of output: 1046


Script:

#!/bin/bash
# Let's verify the complete implementation and configuration handling

# Check the complete get_alignment_props function implementation
echo "Complete get_alignment_props implementation:"
rg -B 2 -A 15 "def get_alignment_props" workflow/rules/common.smk

# Check for custom_alignment_properties configuration usage
echo -e "\nCustom alignment properties configuration:"
rg "custom_alignment_properties" workflow/rules/common.smk

# Check for extract_unique_sample_column_value function
echo -e "\nRelated extract_unique_sample_column_value function:"
rg -A 5 "def extract_unique_sample_column_value" workflow/rules/common.smk

Length of output: 1631

config/config.yaml (1)

285-293: LGTM! Well-documented configuration section.

The new custom_alignment_properties section is clearly documented and follows the existing configuration structure. The comments effectively explain the purpose and usage of each parameter.

🧰 Tools
🪛 yamllint (1.35.1)

[error] 293-293: no new line character at the end of file

(new-line-at-end-of-file)

workflow/rules/common.smk (1)

114-127: LGTM! Well-structured DataFrame initialization.

The initialization of custom_alignment_props follows the established pattern in the codebase and properly handles the optional configuration case.

.test/config-chm-eval/config.yaml (1)

188-191: LGTM! Consistent configuration structure.

The custom_alignment_properties section maintains consistency with the main configuration file.

🧰 Tools
🪛 yamllint (1.35.1)

[error] 191-191: no new line character at the end of file

(new-line-at-end-of-file)

workflow/schemas/config.schema.yaml Show resolved Hide resolved
workflow/rules/common.smk Outdated Show resolved Hide resolved
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (10)
.test/config-sra/config.yaml (1)

116-116: Fix YAML formatting issues

There are two formatting issues:

  1. Trailing spaces on line 116
  2. Missing newline at end of file

Apply this diff to fix the formatting:

-  generate_excel: true  
+  generate_excel: true
   
 custom_alignment_properties:
   activate: false
   column: "panel"
-  tsv: ""
\ No newline at end of file
+  tsv: ""
+

Also applies to: 120-120

🧰 Tools
🪛 yamllint (1.35.1)

[error] 116-116: trailing spaces

(trailing-spaces)

.test/config_primers/config.yaml (1)

123-123: Fix missing newline at end of file

Add a newline character at the end of the file.

Apply this diff:

-  tsv: ""
\ No newline at end of file
+  tsv: ""
+
🧰 Tools
🪛 yamllint (1.35.1)

[error] 123-123: no new line character at the end of file

(new-line-at-end-of-file)

.test/config-target-regions/config.yaml (2)

133-133: Fix missing newline at end of file

Add a newline character at the end of the file.

Apply this diff:

-  tsv: ""
\ No newline at end of file
+  tsv: ""
+
🧰 Tools
🪛 yamllint (1.35.1)

[error] 133-133: no new line character at the end of file

(new-line-at-end-of-file)


130-133: Consider adding test cases for the new feature

Since this is a test configuration file, consider adding test cases that:

  1. Test the feature when activated with valid TSV
  2. Test error handling with invalid TSV path
  3. Test error handling with missing column
  4. Test with empty TSV file

Would you like help creating these test cases?

🧰 Tools
🪛 yamllint (1.35.1)

[error] 133-133: no new line character at the end of file

(new-line-at-end-of-file)

.test/config-target-regions/config_multiple_beds.yaml (2)

128-131: Add documentation for the new custom_alignment_properties section

Please add comments explaining:

  • The purpose of this new feature
  • The meaning and expected values for each parameter
  • Example usage scenarios

Example documentation format:

# Custom alignment properties configuration
# Allows specifying per-sample alignment properties via a TSV file
custom_alignment_properties:
  # Whether to use custom alignment properties (default: false)
  activate: false
  # Column name in the TSV file that identifies the sample
  column: "panel"
  # Path to TSV file containing alignment properties
  tsv: ""
🧰 Tools
🪛 yamllint (1.35.1)

[error] 131-131: no new line character at the end of file

(new-line-at-end-of-file)


131-131: Add newline at end of file

Add a newline character at the end of the file to comply with YAML best practices.

🧰 Tools
🪛 yamllint (1.35.1)

[error] 131-131: no new line character at the end of file

(new-line-at-end-of-file)

.test/config-no-candidate-filtering/config.yaml (1)

127-130: Consider documenting the test configuration template

Since this configuration appears in multiple test files with identical structure and values, consider:

  1. Adding a comment explaining this is a test configuration
  2. Creating a base template file that can be extended by test-specific configs

Example approach:

# Base template in .test/templates/base_config.yaml
custom_alignment_properties: &default_alignment_props
  activate: false
  column: "panel"
  tsv: ""

# In test-specific configs
custom_alignment_properties:
  <<: *default_alignment_props  # Inherit from base template
  # Override specific values if needed
🧰 Tools
🪛 yamllint (1.35.1)

[error] 130-130: no new line character at the end of file

(new-line-at-end-of-file)

config/config.yaml (2)

286-289: Documentation enhancement needed for TSV format

While the documentation explains the purpose well, it would be helpful to include:

  • The expected column headers in the TSV file
  • An example of the TSV content
  • The format of the alignment properties JSON files

Would you like me to help draft the enhanced documentation?


290-293: Add newline at end of file

The YAML file is missing a newline at the end, which is a POSIX requirement.

 custom_alignment_properties:
   activate: false
   column: "panel"
-  tsv: "config/alignment_properties.tsv"
+  tsv: "config/alignment_properties.tsv"
+
🧰 Tools
🪛 yamllint (1.35.1)

[error] 293-293: no new line character at the end of file

(new-line-at-end-of-file)

.test/config-chm-eval/config.yaml (1)

187-191: Add newline at end of file

The YAML file is missing a newline at the end.

 custom_alignment_properties:
   activate: false
   column: "panel"
-  tsv: ""
+  tsv: ""
+
🧰 Tools
🪛 yamllint (1.35.1)

[error] 191-191: no new line character at the end of file

(new-line-at-end-of-file)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4cb3720 and 765d099.

⛔ Files ignored due to path filters (1)
  • config/alignment_properties.tsv is excluded by !**/*.tsv
📒 Files selected for processing (11)
  • .test/config-chm-eval/config.yaml (1 hunks)
  • .test/config-giab/config.yaml (1 hunks)
  • .test/config-no-candidate-filtering/config.yaml (1 hunks)
  • .test/config-simple/config.yaml (1 hunks)
  • .test/config-sra/config.yaml (1 hunks)
  • .test/config-target-regions/config.yaml (1 hunks)
  • .test/config-target-regions/config_multiple_beds.yaml (1 hunks)
  • .test/config_primers/config.yaml (1 hunks)
  • config/config.yaml (1 hunks)
  • workflow/rules/calling.smk (1 hunks)
  • workflow/rules/common.smk (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • workflow/rules/calling.smk
🧰 Additional context used
🪛 yamllint (1.35.1)
.test/config-chm-eval/config.yaml

[error] 191-191: no new line character at the end of file

(new-line-at-end-of-file)

.test/config-giab/config.yaml

[error] 164-164: no new line character at the end of file

(new-line-at-end-of-file)

.test/config-no-candidate-filtering/config.yaml

[error] 130-130: no new line character at the end of file

(new-line-at-end-of-file)

.test/config-simple/config.yaml

[error] 134-134: no new line character at the end of file

(new-line-at-end-of-file)

.test/config-sra/config.yaml

[error] 116-116: trailing spaces

(trailing-spaces)


[error] 120-120: no new line character at the end of file

(new-line-at-end-of-file)

.test/config-target-regions/config.yaml

[error] 133-133: no new line character at the end of file

(new-line-at-end-of-file)

.test/config-target-regions/config_multiple_beds.yaml

[error] 131-131: no new line character at the end of file

(new-line-at-end-of-file)

.test/config_primers/config.yaml

[error] 123-123: no new line character at the end of file

(new-line-at-end-of-file)

config/config.yaml

[error] 293-293: no new line character at the end of file

(new-line-at-end-of-file)

🔇 Additional comments (8)
.test/config-sra/config.yaml (1)

117-120: Documentation needed for new configuration section

The new custom_alignment_properties section introduces functionality for custom alignment properties, but lacks documentation explaining:

  • Purpose and use cases for this feature
  • Required format of the TSV file
  • Expected columns in the TSV file
  • Impact of the panel column setting

Would you like me to help create a documentation template for this feature?

🧰 Tools
🪛 yamllint (1.35.1)

[error] 120-120: no new line character at the end of file

(new-line-at-end-of-file)

.test/config_primers/config.yaml (1)

120-123: Documentation and validation needed

Same concerns about documentation and validation as in config-sra/config.yaml.

🧰 Tools
🪛 yamllint (1.35.1)

[error] 123-123: no new line character at the end of file

(new-line-at-end-of-file)

.test/config-target-regions/config.yaml (1)

130-133: Documentation and validation needed

Same concerns about documentation and validation as in config-sra/config.yaml.

🧰 Tools
🪛 yamllint (1.35.1)

[error] 133-133: no new line character at the end of file

(new-line-at-end-of-file)

.test/config-simple/config.yaml (1)

131-134: Verify implementation details for custom alignment properties

The configuration structure is consistent across files, but we should verify:

  1. The expected format of the TSV file
  2. How these properties are validated and used
✅ Verification successful

Let's gather more information about how these properties are used in the workflow file.


Custom alignment properties implementation is properly validated and used

The implementation shows proper handling of the custom alignment properties:

  • TSV file is read with proper validation (dtype checking, tab separator)
  • Required columns "name" and "path" are enforced
  • Fallback path is provided when custom properties are not activated
  • Column value extraction is handled safely with null checks

The configuration structure aligns well with its implementation:

  • activate: Controls the feature via is_activated()
  • column: Used to extract sample column value
  • tsv: Path to TSV file with proper format (name, path columns)
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for usage of custom alignment properties
echo "=== Searching for TSV parsing and validation ==="
rg -l "custom_alignment_properties"

echo "=== Checking for schema validation ==="
fd "config.schema.yaml" | xargs cat

echo "=== Looking for related test files ==="
fd -e tsv "alignment_properties"

Length of output: 411


Script:

#!/bin/bash
# Check the implementation in common.smk
echo "=== Examining custom alignment properties usage ==="
rg -A 10 "custom_alignment_properties" workflow/rules/common.smk

# Look for any TSV-related code
echo "=== Searching for TSV handling ==="
rg -l "\.tsv"

# Check for any alignment-related functions
echo "=== Looking for alignment-related code ==="
rg -A 5 "alignment" workflow/rules/common.smk

Length of output: 2679

🧰 Tools
🪛 yamllint (1.35.1)

[error] 134-134: no new line character at the end of file

(new-line-at-end-of-file)

.test/config-chm-eval/config.yaml (1)

188-191: Verify if empty TSV path is intended for tests

The empty TSV path might cause issues if the feature is activated. Please verify if this is intentional for testing purposes.

✅ Verification successful

Empty TSV path in test configuration appears safe

The empty TSV path in the test configuration is safe because:

  1. The feature is explicitly deactivated (activate: false)
  2. Our attempts to find any test cases that activate this feature yielded no results
  3. This is a test configuration file, where empty values are commonly used as placeholders
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if any test cases activate the feature with empty TSV
rg -l "custom_alignment_properties:(\n|\s)+activate:\s*true" .test/

Length of output: 261


Script:

#!/bin/bash
# Check if any test cases activate the feature with empty TSV using multiline mode
rg -U "custom_alignment_properties:\s*\n\s*activate:\s*true" .test/

# Also check the main config for comparison
rg -U "custom_alignment_properties:\s*\n\s*activate:\s*true" config/

Length of output: 140


Script:

#!/bin/bash
# Check all test configurations for custom_alignment_properties section
rg -A 4 "custom_alignment_properties:" .test/

Length of output: 45


Script:

#!/bin/bash
# List all yaml files in test directory
fd -e yaml -e yml . .test/

# Show content of any yaml files found
fd -e yaml -e yml . .test/ -x cat {}

Length of output: 69

🧰 Tools
🪛 yamllint (1.35.1)

[error] 191-191: no new line character at the end of file

(new-line-at-end-of-file)

.test/config-giab/config.yaml (1)

161-164: Same issues as in other test config

This file has the same issues as identified in .test/config-chm-eval/config.yaml:

  1. Empty TSV path needs verification
  2. Missing newline at end of file
🧰 Tools
🪛 yamllint (1.35.1)

[error] 164-164: no new line character at the end of file

(new-line-at-end-of-file)

workflow/rules/common.smk (2)

115-120: LGTM! Well-structured utility function for feature flag checks.

The implementation is clean, safe, and follows best practices:

  • Uses dict.get() for safe key access
  • Handles nested config traversal correctly
  • Clear and descriptive function name

1612-1623: 🛠️ Refactor suggestion

Remove debug print and add error handling.

The function implementation looks good but has a few areas for improvement:

  1. Add error handling for missing configuration
  2. Validate alignment property column exists
 def get_alignment_props():
     def inner(wildcards):
         if is_activated("custom_alignment_properties"):
             alignment_prop_column = config["custom_alignment_properties"]["column"]
+            if not alignment_prop_column:
+                raise ValueError("custom_alignment_properties.column must be set")
             prop_name = extract_unique_sample_column_value(
                 wildcards.sample, alignment_prop_column
             )
             if pd.notna(prop_name):
+                if prop_name not in custom_alignment_props.index:
+                    raise ValueError(f"Property '{prop_name}' not found in alignment properties TSV")
                 return custom_alignment_props.loc[prop_name, "path"]
         return "results/alignment-properties/{group}/{sample}.json"
     return inner

.test/config-sra/config.yaml Show resolved Hide resolved
workflow/rules/common.smk Show resolved Hide resolved
workflow/rules/common.smk Outdated Show resolved Hide resolved
custom_alignment_props = (
(
pd.read_csv(
config["custom_alignment_properties"]["tsv"],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this fail if there is no global tsv?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but I would expect the user to define one in case this option has been activated.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
workflow/rules/common.smk (1)

122-135: Consider adding validation for the TSV structure.

While the configuration schema ensures file presence, consider validating the structure of the alignment properties TSV file.

Consider adding validation for:

  1. Required columns
  2. Data types
  3. File format
 custom_alignment_props = (
     (
         pd.read_csv(
             config["custom_alignment_properties"]["tsv"],
             sep="\t",
             dtype={"name": str, "path": str},
             comment="#",
         )
+        .assign(path=lambda df: df["path"].apply(os.path.expanduser))
         .set_index(["name"], drop=False)
         .sort_index()
     )
     if is_activated("custom_alignment_properties")
     else None
 )
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 765d099 and df4d185.

📒 Files selected for processing (4)
  • workflow/rules/calling.smk (1 hunks)
  • workflow/rules/common.smk (2 hunks)
  • workflow/rules/mapping.smk (1 hunks)
  • workflow/rules/ref.smk (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • workflow/rules/calling.smk
🧰 Additional context used
📓 Learnings (1)
workflow/rules/common.smk (1)
Learnt from: johanneskoester
PR: snakemake-workflows/dna-seq-varlociraptor#346
File: workflow/rules/common.smk:122-135
Timestamp: 2024-12-13T08:46:33.627Z
Learning: In `workflow/rules/common.smk`, explicit error handling for reading TSV files (e.g., using `pd.read_csv`) is unnecessary when the files are specified in the configuration, as the configuration schema ensures their presence and correctness.
🔇 Additional comments (6)
workflow/rules/ref.smk (3)

167-167: Consider the implications of making contig files permanent.

The removal of temp() directive for contig files combined with caching could improve performance for repeated runs but increases storage requirements.

Run this script to check if these files are referenced elsewhere:

Also applies to: 172-172


153-153: Consider the implications of making pangenome files permanent.

The removal of temp() directive means the VCF file will be kept permanently. While this improves reproducibility and allows for manual inspection, it may increase disk space usage.

Run this script to check the typical size of these files:

Also applies to: 158-158

✅ Verification successful

Keeping pangenome VCF files is justified despite storage costs

The pangenome VCF file contains reference haplotype data that may be needed for result interpretation and validation. While these files can be large, re-downloading them would be time-consuming and bandwidth-intensive, especially since they are used in multiple downstream steps (renaming and indexing).

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check size of pangenome VCF files
find . -name "*.vcf.gz" -type f -exec du -h {} \;

Length of output: 54


Script:

#!/bin/bash
# Check for Snakefile and documentation
echo "=== Checking for Snakefile and docs ==="
fd -e md
fd Snakefile

echo "=== Checking ref.smk context ==="
cat workflow/rules/ref.smk

Length of output: 5439


184-184: Consider the implications of making renamed VCF files permanent.

The removal of temp() directive for renamed VCF files combined with caching suggests these files are needed for downstream analysis or debugging.

Run this script to verify the usage:

Also applies to: 187-187

✅ Verification successful

The permanent storage of renamed VCF files is justified

The renamed VCF files serve as crucial intermediate files in the workflow:

  • Used conditionally by the vg autoindex process when chromosome renaming is configured
  • Part of the workflow's caching strategy for optimization
  • Valuable for debugging chromosome renaming issues
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check references to renamed VCF files
rg "renamed.vcf.gz" --type snakemake

Length of output: 70


Script:

#!/bin/bash
# Search for renamed VCF files references
rg "renamed.vcf.gz"

# Search for potential rule definitions in Snakemake files
fd -e smk -e snakefile -I Snakefile --exec cat {}

# Look for caching configurations
rg -i "cache|temp.*directive"

Length of output: 1529


Script:

#!/bin/bash
# Get context around renamed.vcf.gz references
rg -B 5 -A 5 "renamed.vcf.gz"

# Look for potential input patterns that might match renamed VCF files
rg "input.*vcf" workflow/rules/

Length of output: 1167

workflow/rules/mapping.smk (1)

32-32: Verify system capacity for increased thread count.

The thread count increase from 8 to 64 is significant. While this can improve mapping performance, it may:

  1. Exceed available system resources
  2. Impact concurrent job execution
  3. Not scale linearly with thread count

Consider monitoring system performance with this increased thread count. Run this script to check other thread usage in the workflow:

✅ Verification successful

Thread allocation for vg/giraffe is appropriate

The 64-thread allocation for vg/giraffe aligns with the workflow's resource management strategy:

  • Other computationally intensive operations also receive high thread counts
  • Most standard operations (BWA, samtools) use 8 threads
  • Lighter operations use 4 threads or less
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check thread usage across workflow
rg "threads:" --type snakemake -A 1

Length of output: 69


Script:

#!/bin/bash
# Find Snakemake files and search for thread configurations
fd -e smk -e snakefile . | xargs rg "threads:" -A 2 -B 2

Length of output: 6253

workflow/rules/common.smk (2)

115-120: LGTM! Well-structured utility function.

The is_activated() function provides a clean way to check feature activation in nested configurations.


1612-1620: Consider simplifying the function structure.

The function could be simplified by removing the nested structure, as suggested in the past review comments.

Consider this structure:

-def get_alignment_props(wildcards):
-    if is_activated("custom_alignment_properties"):
-        alignment_prop_column = config["custom_alignment_properties"]["column"]
-        prop_name = extract_unique_sample_column_value(
-            wildcards.sample, alignment_prop_column
-        )
-        if pd.notna(prop_name):
-            return custom_alignment_props.loc[prop_name, "path"]
-    return f"results/alignment-properties/{wildcards.group}/{wildcards.sample}.json"
+def get_alignment_props(wildcards):
+    if not is_activated("custom_alignment_properties"):
+        return f"results/alignment-properties/{wildcards.group}/{wildcards.sample}.json"
+    
+    alignment_prop_column = config["custom_alignment_properties"]["column"]
+    prop_name = extract_unique_sample_column_value(
+        wildcards.sample, alignment_prop_column
+    )
+    return (
+        custom_alignment_props.loc[prop_name, "path"]
+        if pd.notna(prop_name)
+        else f"results/alignment-properties/{wildcards.group}/{wildcards.sample}.json"
+    )

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants