Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optional alignment subworkflow #83

Merged
merged 32 commits into from
Oct 18, 2023
Merged

Optional alignment subworkflow #83

merged 32 commits into from
Oct 18, 2023

Conversation

priyanka-surana
Copy link
Contributor

@priyanka-surana priyanka-surana commented Sep 17, 2023

Closes #79

This PR adds an optional alignment subworkflow. It replicates the one currently in BTK. It is not meant to be high quality alignments. Quick sub-optimal alignments are good enough for BTK and hence this approach with minimap is implemented. Please add suggestions for arguments to improve alignment with Minimap without introducing more extensive aligners like BWAmem.

I would especially like your input on bin/samplesheet.py, subworkflows/local/coverage_stats.nf, subworkflows/local/minimap_alignment.nf and workflows/blobtoolkit.nf

@github-actions
Copy link

github-actions bot commented Sep 17, 2023

nf-core lint overall result: Passed ✅ ⚠️

Posted for pipeline commit 378ab70

+| ✅ 129 tests passed       |+
#| ❔  18 tests were ignored |#
!| ❗   4 tests had warnings |!

❗ Test warnings:

  • files_exist - File not found: conf/igenomes.config
  • files_exist - File not found: .github/workflows/awstest.yml
  • files_exist - File not found: .github/workflows/awsfulltest.yml
  • nextflow_config - Config manifest.version should end in dev: '0.2.0'

❔ Tests ignored:

  • files_exist - File is ignored: assets/nf-core-blobtoolkit_logo_light.png
  • files_exist - File is ignored: docs/images/nf-core-blobtoolkit_logo_light.png
  • files_exist - File is ignored: docs/images/nf-core-blobtoolkit_logo_dark.png
  • files_exist - File is ignored: .github/ISSUE_TEMPLATE/config.yml
  • nextflow_config - Config variable ignored: manifest.name
  • nextflow_config - Config variable ignored: manifest.homePage
  • files_unchanged - File ignored due to lint config: LICENSE or LICENSE.md or LICENCE or LICENCE.md
  • files_unchanged - File ignored due to lint config: .github/ISSUE_TEMPLATE/bug_report.yml
  • files_unchanged - File does not exist: .github/ISSUE_TEMPLATE/config.yml
  • files_unchanged - File ignored due to lint config: .github/workflows/linting.yml
  • files_unchanged - File ignored due to lint config: assets/sendmail_template.txt
  • files_unchanged - File does not exist: assets/nf-core-blobtoolkit_logo_light.png
  • files_unchanged - File does not exist: docs/images/nf-core-blobtoolkit_logo_light.png
  • files_unchanged - File does not exist: docs/images/nf-core-blobtoolkit_logo_dark.png
  • files_unchanged - File ignored due to lint config: lib/NfcoreTemplate.groovy
  • files_unchanged - File ignored due to lint config: .gitignore or .prettierignore or pyproject.toml
  • actions_ci - actions_ci
  • actions_awstest - 'awstest.yml' workflow not found: /home/runner/work/blobtoolkit/blobtoolkit/.github/workflows/awstest.yml

✅ Tests passed:

Run details

  • nf-core/tools version 2.8
  • Run at 2023-10-18 09:19:24

@priyanka-surana priyanka-surana self-assigned this Sep 17, 2023
@priyanka-surana priyanka-surana added the enhancement Improvement of the existing features label Sep 17, 2023
@priyanka-surana priyanka-surana added this to the 0.2.0 milestone Sep 17, 2023
@priyanka-surana priyanka-surana linked an issue Sep 18, 2023 that may be closed by this pull request
This was referenced Sep 18, 2023
Copy link
Member

@muffato muffato left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. You're missing align = true in conf/test_raw.config
  2. I think it's confusing that the input channel in subworkflows/local/coverage_stats.nf has different shapes depending on a parameter. I would suggest that you move the SAMTOOLS_VIEW that creates the CSI index to workflows/blobtoolkit.nf (in the else of if (params.align)) so that ch_aligned is always the BAM + CSI
  3. There might be a problem with bin/samplesheet.py: by allowing *.bam there might be a clash during SAMTOOLS_VIEW because the input and output files could be named the same ? Maybe the else I was referring to in my previous point could check the file extension. If a BAM, then samtools index. If a CRAM, then samtools view ?
  4. You need to add pacbio_clr as an allowed datatype in bin/samplesheet.py

@priyanka-surana
Copy link
Contributor Author

  1. Added
  2. Fixed in an alternative way. Removed index from minimap alignment subworkflow.
  3. SAMTOOLS_VIEW would output BAM back without any changes and with index so should not be a problem but I have added the index step.
  4. Added

These tests ran on the farm without errors using nf-core_2.8

nextflow run . -profile test,singularity -ansi-log false
nextflow run . -profile test_raw,singularity -ansi-log false
bsub -R 'select[mem>4000] rusage[mem=4000] span[hosts=1]' -n 2 -M 4000 -J btk_test_full -o btk.%J.o -e btk.%J.e "nextflow run . -profile test_full,singularity,sanger -ansi-log false"

Copy link
Collaborator

@alxndrdiaz alxndrdiaz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the Snakemake version of the minimap2 alignment the input are reads in FASTQ format while in this case CRAM/BAM files are transformed to FASTA and aligned again using minimap2: What is the advantage for an user in doing so instead of simply using the input alignment file? Only wanted to clarify why this is useful.

The following test worked as expected with Nextflow 22.10.1:

nextflow run . -profile test_raw,singularity -ansi-log false

@priyanka-surana
Copy link
Contributor Author

In the Snakemake version of the minimap2 alignment the input are reads in FASTQ format while in this case CRAM/BAM files are transformed to FASTA and aligned again using minimap2: What is the advantage for an user in doing so instead of simply using the input alignment file? Only wanted to clarify why this is useful.

The following test worked as expected with Nextflow 22.10.1:

nextflow run . -profile test_raw,singularity -ansi-log false

Unaligned files can also be in CRAM, BAM format. So if the user provides that and the align flag, now we can align it here and run the rest of the steps. Earlier, the user was required to provide aligned files. This is a step in making this pipeline independent. I can look at adding support for FASTQ as well, but after we sort out the config file.

@priyanka-surana priyanka-surana merged commit bb01d13 into release Oct 18, 2023
6 checks passed
@priyanka-surana priyanka-surana deleted the alignment branch October 18, 2023 09:42
@priyanka-surana priyanka-surana restored the alignment branch December 16, 2023 18:52
@priyanka-surana priyanka-surana deleted the alignment branch December 16, 2023 18:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Improvement of the existing features
Projects
Development

Successfully merging this pull request may close these issues.

Add optional read mapping subworkflow
4 participants