Pacbio align #63

gq1 · 2023-11-30T12:38:52Z

PR checklist

This comment contains a description of changes (with reason).
If you've fixed a bug or added code that should be tested, add tests!
If you've added a new tool - have you followed the pipeline conventions in the contribution docs
Make sure your code lints (nf-core lint).
Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
Usage Documentation in docs/usage.md is updated.
Output Documentation in docs/output.md is updated.
CHANGELOG.md is updated.
README.md is updated (including new tool citations and authors/contributors).

…093b313e29ed68480d81d796cc1609536518ee5a and install all the related modules.

github-actions · 2023-11-30T12:40:26Z

`nf-core lint` overall result: Passed ✅

Posted for pipeline commit 45a7ba4

✅ 126 tests passed
❔ 26 tests were ignored

❔ Tests ignored:

files_exist - File is ignored: CODE_OF_CONDUCT.md
files_exist - File is ignored: assets/nf-core-variantcalling_logo_light.png
files_exist - File is ignored: docs/images/nf-core-variantcalling_logo_light.png
files_exist - File is ignored: docs/images/nf-core-variantcalling_logo_dark.png
files_exist - File is ignored: .github/ISSUE_TEMPLATE/config.yml
files_exist - File is ignored: .github/workflows/awstest.yml
files_exist - File is ignored: .github/workflows/awsfulltest.yml
files_exist - File is ignored: assets/multiqc_config.yml
files_exist - File is ignored: conf/igenomes.config
nextflow_config - Config variable ignored: manifest.name
nextflow_config - Config variable ignored: manifest.homePage
files_unchanged - File ignored due to lint config: LICENSE or LICENSE.md or LICENCE or LICENCE.md
files_unchanged - File ignored due to lint config: .github/CONTRIBUTING.md
files_unchanged - File ignored due to lint config: .github/ISSUE_TEMPLATE/bug_report.yml
files_unchanged - File does not exist: .github/ISSUE_TEMPLATE/config.yml
files_unchanged - File ignored due to lint config: .github/PULL_REQUEST_TEMPLATE.md
files_unchanged - File ignored due to lint config: .github/workflows/branch.yml
files_unchanged - File ignored due to lint config: .github/workflows/linting.yml
files_unchanged - File ignored due to lint config: assets/email_template.html
files_unchanged - File ignored due to lint config: assets/sendmail_template.txt
files_unchanged - File does not exist: assets/nf-core-variantcalling_logo_light.png
files_unchanged - File does not exist: docs/images/nf-core-variantcalling_logo_light.png
files_unchanged - File does not exist: docs/images/nf-core-variantcalling_logo_dark.png
files_unchanged - File ignored due to lint config: lib/NfcoreTemplate.groovy
actions_awstest - 'awstest.yml' workflow not found: /home/runner/work/variantcalling/variantcalling/.github/workflows/awstest.yml
multiqc_config - 'assets/multiqc_config.yml' not found

✅ Tests passed:

files_exist - File found: .gitattributes
files_exist - File found: .gitignore
files_exist - File found: .nf-core.yml
files_exist - File found: .editorconfig
files_exist - File found: .prettierignore
files_exist - File found: .prettierrc.yml
files_exist - File found: CHANGELOG.md
files_exist - File found: CITATIONS.md
files_exist - File found: LICENSE or LICENSE.md or LICENCE or LICENCE.md
files_exist - File found: nextflow_schema.json
files_exist - File found: nextflow.config
files_exist - File found: README.md
files_exist - File found: .github/.dockstore.yml
files_exist - File found: .github/CONTRIBUTING.md
files_exist - File found: .github/ISSUE_TEMPLATE/bug_report.yml
files_exist - File found: .github/ISSUE_TEMPLATE/feature_request.yml
files_exist - File found: .github/PULL_REQUEST_TEMPLATE.md
files_exist - File found: .github/workflows/branch.yml
files_exist - File found: .github/workflows/ci.yml
files_exist - File found: .github/workflows/linting_comment.yml
files_exist - File found: .github/workflows/linting.yml
files_exist - File found: assets/email_template.html
files_exist - File found: assets/email_template.txt
files_exist - File found: assets/sendmail_template.txt
files_exist - File found: conf/modules.config
files_exist - File found: conf/test.config
files_exist - File found: conf/test_full.config
files_exist - File found: docs/output.md
files_exist - File found: docs/README.md
files_exist - File found: docs/README.md
files_exist - File found: docs/usage.md
files_exist - File found: lib/nfcore_external_java_deps.jar
files_exist - File found: lib/NfcoreSchema.groovy
files_exist - File found: lib/NfcoreTemplate.groovy
files_exist - File found: lib/Utils.groovy
files_exist - File found: lib/WorkflowMain.groovy
files_exist - File found: main.nf
files_exist - File found: conf/base.config
files_exist - File found: lib/WorkflowVariantcalling.groovy
files_exist - File found: modules.json
files_exist - File found: pyproject.toml
files_exist - File not found check: Singularity
files_exist - File not found check: parameters.settings.json
files_exist - File not found check: .nf-core.yaml
files_exist - File not found check: bin/markdown_to_html.r
files_exist - File not found check: conf/aws.config
files_exist - File not found check: .github/workflows/push_dockerhub.yml
files_exist - File not found check: .github/ISSUE_TEMPLATE/bug_report.md
files_exist - File not found check: .github/ISSUE_TEMPLATE/feature_request.md
files_exist - File not found check: docs/images/nf-core-variantcalling_logo.png
files_exist - File not found check: .markdownlint.yml
files_exist - File not found check: .yamllint.yml
files_exist - File not found check: lib/Checks.groovy
files_exist - File not found check: lib/Completion.groovy
files_exist - File not found check: lib/Workflow.groovy
files_exist - File not found check: .travis.yml
nextflow_config - Config variable found: manifest.nextflowVersion
nextflow_config - Config variable found: manifest.description
nextflow_config - Config variable found: manifest.version
nextflow_config - Config variable found: timeline.enabled
nextflow_config - Config variable found: trace.enabled
nextflow_config - Config variable found: report.enabled
nextflow_config - Config variable found: dag.enabled
nextflow_config - Config variable found: process.cpus
nextflow_config - Config variable found: process.memory
nextflow_config - Config variable found: process.time
nextflow_config - Config variable found: params.outdir
nextflow_config - Config variable found: params.input
nextflow_config - Config variable found: params.show_hidden_params
nextflow_config - Config variable found: params.schema_ignore_params
nextflow_config - Config variable found: manifest.mainScript
nextflow_config - Config variable found: timeline.file
nextflow_config - Config variable found: trace.file
nextflow_config - Config variable found: report.file
nextflow_config - Config variable found: dag.file
nextflow_config - Config variable (correctly) not found: params.nf_required_version
nextflow_config - Config variable (correctly) not found: params.container
nextflow_config - Config variable (correctly) not found: params.singleEnd
nextflow_config - Config variable (correctly) not found: params.igenomesIgnore
nextflow_config - Config variable (correctly) not found: params.name
nextflow_config - Config variable (correctly) not found: params.enable_conda
nextflow_config - Config timeline.enabled had correct value: true
nextflow_config - Config report.enabled had correct value: true
nextflow_config - Config trace.enabled had correct value: true
nextflow_config - Config dag.enabled had correct value: true
nextflow_config - Config dag.file ended with .html
nextflow_config - Config variable manifest.nextflowVersion started with >= or !>=
nextflow_config - Config manifest.version ends in dev: '1.1.0-dev'
nextflow_config - Config params.custom_config_version is set to master
nextflow_config - Config params.custom_config_base is set to https://raw.githubusercontent.com/nf-core/configs/master
nextflow_config - Lines for loading custom profiles found
files_unchanged - .gitattributes matches the template
files_unchanged - .prettierrc.yml matches the template
files_unchanged - .github/.dockstore.yml matches the template
files_unchanged - .github/ISSUE_TEMPLATE/feature_request.yml matches the template
files_unchanged - .github/workflows/linting_comment.yml matches the template
files_unchanged - assets/email_template.txt matches the template
files_unchanged - docs/README.md matches the template
files_unchanged - lib/nfcore_external_java_deps.jar matches the template
files_unchanged - lib/NfcoreSchema.groovy matches the template
files_unchanged - .gitignore matches the template
files_unchanged - .prettierignore matches the template
files_unchanged - pyproject.toml matches the template
actions_ci - '.github/workflows/ci.yml' is triggered on expected events
actions_ci - '.github/workflows/ci.yml' checks minimum NF version
readme - README Nextflow minimum version badge matched config. Badge: 22.10.1, Config: 22.10.1
readme - README Zenodo placeholder was replaced with DOI.
pipeline_todos - No TODO strings found
pipeline_name_conventions - Name adheres to nf-core convention
template_strings - Did not find any Jinja template strings (147 files)
schema_lint - Schema lint passed
schema_lint - Schema title + description lint passed
schema_lint - Input mimetype lint passed: 'text/csv'
schema_params - Schema matched params returned from nextflow config
system_exit - No System.exit calls found
actions_schema_validation - Workflow validation passed: branch.yml
actions_schema_validation - Workflow validation passed: clean-up.yml
actions_schema_validation - Workflow validation passed: sanger_test_full.yml
actions_schema_validation - Workflow validation passed: linting.yml
actions_schema_validation - Workflow validation passed: fix-linting.yml
actions_schema_validation - Workflow validation passed: ci.yml
actions_schema_validation - Workflow validation passed: linting_comment.yml
actions_schema_validation - Workflow validation passed: sanger_test.yml
merge_markers - No merge markers found in pipeline files
modules_json - Only installed modules found in modules.json
modules_structure - modules directory structure is correct 'modules/nf-core/TOOL/SUBTOOL'

Run details

nf-core/tools version 2.8
Run at 2023-12-07 16:24:48

gq1 · 2023-12-01T08:53:46Z

The profile test_full_aln finished on the farm:

Completed at: 30-Nov-2023 21:33:10
Duration    : 9h 4m 3s
CPU hours   : 242.2
Succeeded   : 76

gq1 · 2023-12-04T08:55:35Z

#48

…file being combined.

muffato

Some alignment files have .fasta in the name, like GCA_947369205.1_OX376310.1_CANBKR010000003.1.fasta.pacbio.icCanRufa1_combined.cram. I guess it's because .baseName only removes the .gz extension. Can you do something like fai.baseName - '.fasta' (as in subworkflows/local/input_filter_split.nf) ?
Not from this PR, but since we're talking about the naming convention, I would prefer not adding _combined
Bump maxRetries to 5 in conf/base.config, until the resource requirements are backported from the readmapping pipeline
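
The `.fasta` leak described in the first point can be reproduced outside Nextflow. The sketch below uses plain Bash suffix stripping as a stand-in for `.baseName`: removing only the last extension of a gzipped FASTA leaves `.fasta` in the name, while stripping `.gz` and then `.fasta`/`.fa` gives the clean prefix. The function names and the input file name (assumed here to be the cram prefix plus `.gz`) are illustrative, not the pipeline's code.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, for illustration only.

# Removes just the final extension (the behaviour attributed to .baseName).
strip_last_ext() {
    local name="${1##*/}"   # drop any leading directory
    printf '%s\n' "${name%.*}"
}

# Removes an optional .gz, then an optional .fasta or .fa suffix.
strip_fasta_ext() {
    local name="${1##*/}"
    name="${name%.gz}"
    name="${name%.fasta}"
    name="${name%.fa}"
    printf '%s\n' "$name"
}

# Assumed input name, inferred from the cram file quoted in the review.
strip_last_ext  "GCA_947369205.1_OX376310.1_CANBKR010000003.1.fasta.gz"
strip_fasta_ext "GCA_947369205.1_OX376310.1_CANBKR010000003.1.fasta.gz"
```

The first call keeps the trailing `.fasta`; the second does not, which is the prefix the output file names should be built from.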

Thinking about the pipeline doing either the alignment sub-workflow or "samtools merge". There is actually a "samtools merge" in the alignment sub-workflow that does pretty much the same thing: merge multiple aligned files from the same sample. Could there be just one "samtools merge" that works in both cases ? Also, in the alignment sub-workflow, "samtools merge" is followed by a bunch of samtools commands to extract some statistics and convert to CRAM. Here are my thoughts:

For resource optimisation, we'll very likely have to run "samtools flagstat" on every input file to tune CPU/memory according to the read count
It's maybe not a bad thing to extract all those stats from aligned files given as inputs. It may help with debugging and making sure the files are proper
Is CRAM conversion happening twice ? There seems to be one in convert_stats.nf and one in input_filter_split.nf. Maybe we can remove the first one ?
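
As a rough illustration of the first point, the total read count can be taken from the "in total" line of `samtools flagstat` output, which reports QC-passed and QC-failed reads as `N + M in total ...`. The helper name below is hypothetical, and how the count would map to CPU/memory requests is left open because the thread does not specify it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `samtools flagstat` text on stdin and print the
# total number of reads (QC-passed + QC-failed) from the "in total" line.
total_reads() {
    awk '/in total/ { printf "%.0f\n", $1 + $3; exit }'
}

# Usage (assumes samtools is installed and sample.cram exists):
#   samtools flagstat sample.cram | total_reads
```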

In summary, and I'm happy to discuss that over a call if that's easier, here is what I would propose:

Stop the alignment sub-workflow after minimap.
Run samtools merge at this stage in both alignment and pre-aligned modes.
The "convert stats" sub-workflow could be just "stats" and run samtools stats/flagstat/idxstats on all aligned files before input_filter_split.
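
The proposed ordering can be sketched as a dry-run command plan for one sample: merge once, whether the inputs were just aligned or supplied pre-aligned, then collect statistics before filtering. The function name, file names and output suffixes are placeholders; only the samtools subcommands come from the discussion.

```shell
#!/usr/bin/env bash
# Hypothetical dry run: prints the commands instead of executing them.
# Usage: plan_merge_and_stats SAMPLE in1.bam [in2.bam ...]
plan_merge_and_stats() {
    local sample="$1"; shift
    local merged="${sample}.merged.bam"
    # One merge step shared by the alignment and pre-aligned modes.
    echo "samtools merge -o ${merged} $*"
    # Statistics on the merged file, before input_filter_split.
    echo "samtools stats ${merged} > ${sample}.stats"
    echo "samtools flagstat ${merged} > ${sample}.flagstat"
    echo "samtools idxstats ${merged} > ${sample}.idxstats"
}

plan_merge_and_stats icCanRufa1 run1.bam run2.bam
```

Printing the plan rather than running it keeps the sketch independent of any particular samtools installation or input data.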

nextflow_schema.json

wrong default value Co-authored-by: Matthieu Muffato <[email protected]>

gq1 · 2023-12-07T16:03:41Z

> Some alignment files have .fasta in the name, like GCA_947369205.1_OX376310.1_CANBKR010000003.1.fasta.pacbio.icCanRufa1_combined.cram. I guess it's because .baseName only removes the .gz extension. Can you do something like fai.baseName - '.fasta' (as in subworkflows/local/input_filter_split.nf) ?

> Not from this PR, but since we're talking about the naming convention, I would prefer not adding _combined

> Bump maxRetries to 5 in conf/base.config, until the resource requirements are backported from the readmapping pipeline

These have all been updated now.

gq1 · 2023-12-07T16:14:10Z

For now we have imported 3 subworkflows from the readmapping pipeline with very few changes.
If we can push these subworkflows to nf-core in the future, we can import them directly.
That is why I would rather not change much for now: keeping these subworkflows the same as in the other pipelines (3 now?) is probably easier to maintain.

> Thinking about the pipeline doing either the alignment sub-workflow or "samtools merge". There is actually a "samtools merge" in the alignment sub-workflow that does pretty much the same thing: merge multiple aligned files from the same sample. Could there be just one "samtools merge" that works in both cases ? Also, in the alignment sub-workflow, "samtools merge" is followed by a bunch of samtools commands to extract some statistics and convert to CRAM. Here are my thoughts:

> For resource optimisation, we'll very likely have to run "samtools flagstat" on every input file to tune CPU/memory according to the read count

> It's maybe not a bad thing to extract all those stats from aligned files given as inputs. It may help with debugging and making sure the files are proper

I suppose aligned input BAM/CRAM files will normally come with their stats files.

> Is CRAM conversion happening twice ? There seems to be one in convert_stats.nf and one in input_filter_split.nf. Maybe we can remove the first one ?

The first process only converts the aligned reads to CRAM, while the second one also filters the reads.
We can skip the first one if we don't want to keep the original aligned reads.

gq1 · 2023-12-08T08:47:55Z

The latest results from nextflow run variantcalling -profile test_full_align,sanger,singularity --align are here:

/nfs/users/nfs_g/gq2/lustre123_gq2/git/results

muffato · 2023-12-08T16:08:49Z

We discussed this this morning.

Contrary to the blobtoolkit pipeline, the alignments done here are the best we can produce, so they're worth exposing in the results directory as CRAM. Let's keep that SAMTOOLS_VIEW
Sub-workflows will be reviewed and deposited in nf-core next year. In the meantime, let's keep the ones you copied from readmapping

muffato · 2023-12-08T22:39:26Z

> 3. The "convert stats" sub-workflow could be just "stats" and run samtools stats/flagstat/idxstats on all aligned files before input_filter_split.

FYI. Priyanka had the same idea for the read-mapping pipeline: sanger-tol/readmapping#63

gq1 added 25 commits November 17, 2023 12:37

copy align_pacbio subworkflow from readmapping pipleline dev branch, …

45de788

…093b313e29ed68480d81d796cc1609536518ee5a and install all the related modules.

add align option for pacbio_align subworkflow

a5b1d96

re-patch samtools view

4c8367e

nf-core modules update vcftools

b4f582d

update the way calling module or subworkflow

f2cefaf

add a new test_align profile

27e5d9e

make samtools convert output as bam

59892b0

use compressed fasta file

eef4d82

extra modules configs

9347f34

remove -b flag for samtools view

2d70af5

add RG group to meta data in sample checking

f7d5faf

update channel with meta

ce47ad9

publish aligned cram files with stats

80e83c7

Remove samtools sort after minimap_align and before merging.

8e2ccc8

add conditions pacbio align modules configs

2815232

pass the aligned reads to variant calling

7125e18

remove unused module config

adcda19

no need to sort merge after aligment

6b54545

add two test sample files for alignments

e03a4ee

convert fasta channel to value channel

1540017

put _T1 back to distinguish the same samles

8463d43

add combined in the aligned bam/cram file name if sample being combined

50df394

add a full test with aligment

add2bcb

add one alignment test in the simple ci test

4285d4c

black check

ef82d49

gq1 added 2 commits November 30, 2023 13:16

EditorConfig linting

6bf3a19

make the sanger farm test with alignment

006eced

gq1 requested review from priyanka-surana and muffato November 30, 2023 13:28

Update module 'samtools/fasta

9a0003a

remove the combined in the output file name if the same sample input …

4db4cbc

…file being combined.

muffato linked an issue Dec 7, 2023 that may be closed by this pull request

Add optional read mapping subworkflow #48

Closed

muffato assigned gq1 Dec 7, 2023

update file name for VCf output files

625f649

muffato reviewed Dec 7, 2023

nextflow_schema.json (outdated, resolved)

gq1 and others added 3 commits December 7, 2023 15:33

make sure fasta or fa being removed from the fasta file name

d348591

Update nextflow_schema.json

7d20b1d

wrong default value Co-authored-by: Matthieu Muffato <[email protected]>

change maxRetries to 5.

45a7ba4

muffato approved these changes Dec 8, 2023

gq1 merged commit 5f528f7 into dev Dec 8, 2023
6 checks passed

muffato deleted the pacbio_align branch December 8, 2023 22:38
