Input Data to the FastMatch Pipeline #2

sgsutcliffe · 2024-12-10T21:18:43Z

Description:
We want to provide a query and reference sample selection to the FastMatch pipeline along with a selection of relevant metadata fields and parameters, so that users can obtain matched distances and context while avoiding unnecessary referencing of the entire database.

Acceptance Criteria:

The FastMatch pipeline accepts one or more query samples for running comparisons on
The FastMatch pipeline accepts one or more reference samples for running comparisons on
The FastMatch pipeline accepts the parameters set by the user within IRIDA Next:
1. Comparison score threshold value

PR checklist

This comment contains a description of changes (with reason).
If you've fixed a bug or added code that should be tested, add tests!
Make sure your code lints (nf-core lint).
Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).

github-actions · 2024-12-10T21:20:13Z

`nf-core pipelines lint` overall result: Passed ✅ ⚠️

Posted for pipeline commit a70d030

+| ✅ 144 tests passed       |+
#| ❔  28 tests were ignored |#
!| ❗   4 tests had warnings |!

❗ Test warnings:

files_exist - File not found: conf/igenomes_ignored.config
nextflow_config - nf-validation has been detected in the pipeline. Please migrate to nf-schema: https://nextflow-io.github.io/nf-schema/latest/migration_guide/
nextflow_config - Config manifest.version should end in dev: 0.1.0
schema_lint - Schema $id should be https://raw.githubusercontent.com/phac-nml/fastmatchirida/master/nextflow_schema.json
Found https://raw.githubusercontent.com/phac-nml/fastmatchirida/main/nextflow_schema.json

❔ Tests ignored:

files_exist - File is ignored: assets/nf-core-fastmatchirida_logo_light.png
files_exist - File is ignored: docs/images/nf-core-fastmatchirida_logo_light.png
files_exist - File is ignored: docs/images/nf-core-fastmatchirida_logo_dark.png
files_exist - File is ignored: .github/workflows/awstest.yml
files_exist - File is ignored: .github/workflows/awsfulltest.yml
files_exist - File is ignored: lib/Utils.groovy
files_exist - File is ignored: lib/WorkflowMain.groovy
files_exist - File is ignored: lib/NfcoreTemplate.groovy
files_exist - File is ignored: lib/Workflowfastmatchirida.groovy
nextflow_config - Config variable ignored: manifest.name
nextflow_config - Config variable ignored: manifest.homePage
nextflow_config - Config variable ignored: params.max_cpus
files_unchanged - File ignored due to lint config: LICENSE or LICENSE.md or LICENCE or LICENCE.md
files_unchanged - File ignored due to lint config: .github/CONTRIBUTING.md
files_unchanged - File ignored due to lint config: .github/ISSUE_TEMPLATE/bug_report.yml
files_unchanged - File ignored due to lint config: .github/PULL_REQUEST_TEMPLATE.md
files_unchanged - File ignored due to lint config: .github/workflows/branch.yml
files_unchanged - File ignored due to lint config: assets/email_template.html
files_unchanged - File ignored due to lint config: assets/email_template.txt
files_unchanged - File ignored due to lint config: assets/sendmail_template.txt
files_unchanged - File does not exist: assets/nf-core-fastmatchirida_logo_light.png
files_unchanged - File does not exist: docs/images/nf-core-fastmatchirida_logo_light.png
files_unchanged - File does not exist: docs/images/nf-core-fastmatchirida_logo_dark.png
files_unchanged - File ignored due to lint config: docs/README.md
files_unchanged - File ignored due to lint config: .gitignore or .prettierignore
actions_awstest - 'awstest.yml' workflow not found: /home/runner/work/fastmatchirida/fastmatchirida/.github/workflows/awstest.yml
actions_awsfulltest - actions_awsfulltest
pipeline_name_conventions - pipeline_name_conventions

✅ Tests passed:

files_exist - File found: .gitattributes
files_exist - File found: .gitignore
files_exist - File found: .nf-core.yml
files_exist - File found: .editorconfig
files_exist - File found: .prettierignore
files_exist - File found: .prettierrc.yml
files_exist - File found: CHANGELOG.md
files_exist - File found: CITATIONS.md
files_exist - File found: CODE_OF_CONDUCT.md
files_exist - File found: LICENSE or LICENSE.md or LICENCE or LICENCE.md
files_exist - File found: nextflow_schema.json
files_exist - File found: nextflow.config
files_exist - File found: README.md
files_exist - File found: .github/.dockstore.yml
files_exist - File found: .github/CONTRIBUTING.md
files_exist - File found: .github/ISSUE_TEMPLATE/bug_report.yml
files_exist - File found: .github/ISSUE_TEMPLATE/config.yml
files_exist - File found: .github/ISSUE_TEMPLATE/feature_request.yml
files_exist - File found: .github/PULL_REQUEST_TEMPLATE.md
files_exist - File found: .github/workflows/branch.yml
files_exist - File found: .github/workflows/ci.yml
files_exist - File found: .github/workflows/linting_comment.yml
files_exist - File found: .github/workflows/linting.yml
files_exist - File found: assets/email_template.html
files_exist - File found: assets/email_template.txt
files_exist - File found: assets/sendmail_template.txt
files_exist - File found: conf/modules.config
files_exist - File found: conf/test.config
files_exist - File found: conf/test_full.config
files_exist - File found: docs/output.md
files_exist - File found: docs/README.md
files_exist - File found: docs/README.md
files_exist - File found: docs/usage.md
files_exist - File found: main.nf
files_exist - File found: assets/multiqc_config.yml
files_exist - File found: conf/base.config
files_exist - File found: conf/igenomes.config
files_exist - File found: modules.json
files_exist - File not found check: .github/ISSUE_TEMPLATE/bug_report.md
files_exist - File not found check: .github/ISSUE_TEMPLATE/feature_request.md
files_exist - File not found check: .github/workflows/push_dockerhub.yml
files_exist - File not found check: .markdownlint.yml
files_exist - File not found check: .nf-core.yaml
files_exist - File not found check: .yamllint.yml
files_exist - File not found check: bin/markdown_to_html.r
files_exist - File not found check: conf/aws.config
files_exist - File not found check: docs/images/nf-core-fastmatchirida_logo.png
files_exist - File not found check: lib/Checks.groovy
files_exist - File not found check: lib/Completion.groovy
files_exist - File not found check: lib/Workflow.groovy
files_exist - File not found check: lib/WorkflowFastmatchirida.groovy
files_exist - File not found check: parameters.settings.json
files_exist - File not found check: pipeline_template.yml
files_exist - File not found check: Singularity
files_exist - File not found check: lib/nfcore_external_java_deps.jar
files_exist - File not found check: .travis.yml
nextflow_config - Found nf-validation plugin
nextflow_config - Config variable found: manifest.nextflowVersion
nextflow_config - Config variable found: manifest.description
nextflow_config - Config variable found: manifest.version
nextflow_config - Config variable found: timeline.enabled
nextflow_config - Config variable found: trace.enabled
nextflow_config - Config variable found: report.enabled
nextflow_config - Config variable found: dag.enabled
nextflow_config - Config variable found: process.cpus
nextflow_config - Config variable found: process.memory
nextflow_config - Config variable found: process.time
nextflow_config - Config variable found: params.outdir
nextflow_config - Config variable found: params.input
nextflow_config - Config variable found: manifest.mainScript
nextflow_config - Config variable found: timeline.file
nextflow_config - Config variable found: trace.file
nextflow_config - Config variable found: report.file
nextflow_config - Config variable found: dag.file
nextflow_config - Config variable (correctly) not found: params.nf_required_version
nextflow_config - Config variable (correctly) not found: params.container
nextflow_config - Config variable (correctly) not found: params.singleEnd
nextflow_config - Config variable (correctly) not found: params.igenomesIgnore
nextflow_config - Config variable (correctly) not found: params.name
nextflow_config - Config variable (correctly) not found: params.enable_conda
nextflow_config - Config timeline.enabled had correct value: true
nextflow_config - Config report.enabled had correct value: true
nextflow_config - Config trace.enabled had correct value: true
nextflow_config - Config dag.enabled had correct value: true
nextflow_config - Config dag.file ended with .html
nextflow_config - Config variable manifest.nextflowVersion started with >= or !>=
nextflow_config - nextflow.config contains configuration profile test
nextflow_config - Config default value correct: params.metadata_1_header= metadata_1
nextflow_config - Config default value correct: params.metadata_2_header= metadata_2
nextflow_config - Config default value correct: params.metadata_3_header= metadata_3
nextflow_config - Config default value correct: params.metadata_4_header= metadata_4
nextflow_config - Config default value correct: params.metadata_5_header= metadata_5
nextflow_config - Config default value correct: params.metadata_6_header= metadata_6
nextflow_config - Config default value correct: params.metadata_7_header= metadata_7
nextflow_config - Config default value correct: params.metadata_8_header= metadata_8
nextflow_config - Config default value correct: params.threshold= 1.0
nextflow_config - Config default value correct: params.pd_outfmt= matrix
nextflow_config - Config default value correct: params.pd_distm= hamming
nextflow_config - Config default value correct: params.pd_missing_threshold= 1.0
nextflow_config - Config default value correct: params.pd_sample_quality_threshold= 1.0
nextflow_config - Config default value correct: params.pd_file_type= text
nextflow_config - Config default value correct: params.max_cpus= 4
nextflow_config - Config default value correct: params.max_memory= 2.GB
nextflow_config - Config default value correct: params.max_time= 1.h
nextflow_config - Config default value correct: params.publish_dir_mode= copy
nextflow_config - Config default value correct: params.validate_params= true
files_unchanged - .gitattributes matches the template
files_unchanged - .prettierrc.yml matches the template
files_unchanged - .github/.dockstore.yml matches the template
files_unchanged - .github/ISSUE_TEMPLATE/feature_request.yml matches the template
files_unchanged - .github/workflows/linting_comment.yml matches the template
files_unchanged - .github/workflows/linting.yml matches the template
actions_ci - '.github/workflows/ci.yml' is triggered on expected events
actions_ci - '.github/workflows/ci.yml' checks minimum NF version
readme - README Zenodo placeholder was replaced with DOI.
pipeline_todos - No TODO strings found
plugin_includes - No wrong validation plugin imports have been found
template_strings - Did not find any Jinja template strings (0 files)
schema_lint - Schema lint passed
schema_lint - Input mimetype lint passed: 'text/csv'
schema_params - Schema matched params returned from nextflow config
system_exit - No System.exit calls found
actions_schema_validation - Workflow validation passed: linting.yml
actions_schema_validation - Workflow validation passed: branch.yml
actions_schema_validation - Workflow validation passed: ci.yml
actions_schema_validation - Workflow validation passed: linting_comment.yml
merge_markers - No merge markers found in pipeline files
modules_json - Only installed modules found in modules.json
multiqc_config - assets/multiqc_config.yml found and not ignored.
multiqc_config - assets/multiqc_config.yml contains report_section_order
multiqc_config - assets/multiqc_config.yml contains export_plots
multiqc_config - assets/multiqc_config.yml contains report_comment
multiqc_config - assets/multiqc_config.yml follows the ordering scheme of the minimally required plugins.
multiqc_config - assets/multiqc_config.yml contains 'export_plots: true'.
modules_structure - modules directory structure is correct 'modules/nf-core/TOOL/SUBTOOL'
base_config - conf/base.config found and not ignored.
base_config - CUSTOM_DUMPSOFTWAREVERSIONS found in conf/base.config and Nextflow scripts.
modules_config - conf/modules.config found and not ignored.
modules_config - INPUT_ASSURE found in conf/modules.config and Nextflow scripts.
modules_config - LOCIDEX_MERGE found in conf/modules.config and Nextflow scripts.
modules_config - PROFILE_DISTS found in conf/modules.config and Nextflow scripts.
modules_config - CUSTOM_DUMPSOFTWAREVERSIONS found in conf/modules.config and Nextflow scripts.
nfcore_yml - Repository type in .nf-core.yml is valid: pipeline
nfcore_yml - nf-core version in .nf-core.yml is set to the latest version: 3.0.1

Run details

nf-core/tools version 3.0.1
Run at 2024-12-12 16:09:15

apetkau

Thanks so much @sgsutcliffe. This is amazing (and fast) work 😄

I have some in-line comments below.

apetkau · 2024-12-11T20:37:01Z

nextflow_schema.json

+            "default": "",
+            "properties": {
+                "threshold": {
+                    "type": "number",


Could you set a "minimum" threshold of 0 here: https://nextflow-io.github.io/nf-validation/nextflow_schema/nextflow_schema_specification/#minimum-maximum

Good idea! 8de39dd
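
As an illustration of what the requested constraint does, here is a minimal Python sketch; it is not the pipeline's code. The schema fragment mirrors the `threshold` property from the diff above with a `minimum` of 0 added, and the helper `check_threshold` is hypothetical: it only mimics the two rules (`type: number` and `minimum`) that a JSON Schema validator would enforce for this property.

```python
# Hypothetical fragment: the "threshold" property from the diff above,
# with the requested "minimum" keyword added.
THRESHOLD_SCHEMA = {"type": "number", "minimum": 0}


def check_threshold(value, schema=THRESHOLD_SCHEMA):
    """Return a list of error strings; an empty list means the value is accepted.

    Only the "type": "number" and "minimum" keywords are handled here.
    """
    errors = []
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        errors.append(f"{value!r} is not a number")
        return errors
    minimum = schema.get("minimum")
    if minimum is not None and value < minimum:
        errors.append(f"{value!r} is less than the minimum of {minimum}")
    return errors


if __name__ == "__main__":
    for candidate in (1.0, 0, -0.5, "1"):
        print(candidate, check_threshold(candidate))
```

With the constraint in the schema, a negative threshold is rejected at parameter validation, before any workflow logic runs.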

apetkau · 2024-12-11T20:44:00Z

assets/schema_input.json

@@ -25,6 +25,13 @@
                "pattern": "^\\S+\\.mlst(\\.subtyping)?\\.json(\\.gz)?$",
                "errorMessage": "MLST JSON file from locidex report, cannot contain spaces and must have the extension: '.mlst.json', '.mlst.json.gz', '.mlst.subtyping.json', or 'mlst.subtyping.json.gz'"
            },
+            "fastmatch_category": {


I think the specific behaviour of this column may need further discussion later on, but we can leave it as-is for this PR (and likely for this sprint, to get feedback from others).

Specifically, on the nextflow side, the data in this column is being moved into the meta object for each sample. However, we cannot use the keyword "meta" in this schema JSON file, since that is used by IRIDA Next to load data from the metadata table in IRIDA Next.

I think it would make most sense to actually use the "meta" keyword in this JSON file, but maybe change the behaviour of IRIDA Next somehow, or allow loading either a metadata column or user-entered values to set query/reference samples.

However, as this is a more complex use case it requires further discussion. So this is good as-is now. I just wanted to make a note here about this.

Based on my question, the issue was sort of raised. Might be worth a formal discussion, I agree. I did not like my workaround.
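
To make the current workaround concrete, here is a rough Python sketch of the behaviour described in this PR: each sample row carries a `fastmatch_category` value that ends up in the sample's meta map, and blank entries fall back to `reference`. The column names `sample` and `mlst_alleles`, the allowed values `query` and `reference`, and the function name `load_samples` are assumptions for illustration; the pipeline itself does this in Nextflow.

```python
import csv
import io

# Assumed category values; blank entries default to "reference".
ALLOWED = {"query", "reference"}


def load_samples(csv_text):
    """Parse a samplesheet and return one meta dict per sample."""
    samples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # A missing or empty cell is treated as blank.
        category = (row.get("fastmatch_category") or "").strip().lower()
        if not category:
            category = "reference"
        if category not in ALLOWED:
            raise ValueError(
                f"Unknown fastmatch_category {category!r} for sample {row['sample']}"
            )
        samples.append({"id": row["sample"], "fastmatch_category": category})
    return samples


# Hypothetical samplesheet: s2 leaves the category blank.
SHEET = """sample,mlst_alleles,fastmatch_category
s1,s1.mlst.json,query
s2,s2.mlst.json,
s3,s3.mlst.json,reference
"""

if __name__ == "__main__":
    for meta in load_samples(SHEET):
        print(meta)
```

In this sketch the category lives in an ordinary samplesheet column rather than under a "meta" key, which is the compromise discussed above.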

apetkau · 2024-12-11T20:52:14Z

workflows/fastmatchirida.nf

-                    + " Please either set '--pd_distm scaled' or remove fractions from distance thresholds.")
-        }
-    } else if (params.pd_distm == 'scaled') {
-        if (gm_thresholds_list.any { it != null && (it as Float < 0.0 || it as Float > 100.0) }) {


The purpose of these if-else statements in the gasclustering pipeline was to have some additional error checking on distance threshold values depending on the distance unit selected by the user. That is:

If scaled is selected, the threshold should be >= 0 and <= 100 (the threshold is a percent value).

The complexity of the if/else statements here was because we were passing a list of comma-separated thresholds as a string (e.g., "1,2,3").

For fastmatching, I think it would make sense to keep some of these checks, but they can be simplified (since the threshold is passed as a number instead of a string that needs to be parsed). Specifically:

If hamming is selected, the threshold is >= 0 (this would already be supported by adding the constraint in the nextflow_schema.json file from another of my comments).

If scaled is selected, the threshold is >= 0 and <= 100 (since it's a percentage value).

What do you think?

If I understand correctly, profile_dists will still output a percentage or an integer depending on pd_distm, so the threshold cut-off will vary? I will implement this; I think it makes sense. It could be useful if a user forgets to check whether scaled or hamming is selected.

Operational! a70d030
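
The simplified checks proposed in this thread can be sketched as follows. This is a Python illustration of the logic only, assuming a single numeric threshold and the two `pd_distm` values discussed here; the function name `validate_threshold` and the error wording are hypothetical, and the actual check in the pipeline is written in Nextflow.

```python
def validate_threshold(threshold, pd_distm):
    """Return the threshold if it is in range for the distance method.

    hamming: threshold must be >= 0.
    scaled:  threshold is a percentage, so it must be >= 0 and <= 100.
    Raises ValueError otherwise.
    """
    if pd_distm == "hamming":
        if threshold < 0:
            raise ValueError(
                f"threshold {threshold} must be >= 0 when pd_distm is 'hamming'"
            )
    elif pd_distm == "scaled":
        if not 0 <= threshold <= 100:
            raise ValueError(
                f"threshold {threshold} must be between 0 and 100 when pd_distm is 'scaled'"
            )
    else:
        raise ValueError(f"Unsupported pd_distm: {pd_distm!r}")
    return threshold


if __name__ == "__main__":
    print(validate_threshold(10, "hamming"))
    print(validate_threshold(99.5, "scaled"))
```

Because the threshold arrives as a number rather than a comma-separated string, no parsing step is needed before the range check.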

emarinier

Thanks Steven. Aaron covered a lot of things. I have just one small note, no further suggestions on changes.

emarinier · 2024-12-12T16:06:26Z

nextflow.config

@@ -43,6 +43,9 @@ params {
    validationShowHiddenParams       = false
    validate_params                  = true

+    // FastMatch
+    threshold = 1.0


This isn't a big thing, but I was thinking about how my output script is handling this and I was assuming it was an integer (hamming distances). We'll have to remember to accommodate both integers and floats with this.

I think that if scaled is provided, it can be a float. Am I correct @apetkau?

sgsutcliffe added 3 commits December 10, 2024 14:14

Limit iridanext output to fastmatch files

346311e

First attempt to modify IRIDA UI

1efe47c

Modified the UI to include query reference

aa4e0bd

sgsutcliffe added 4 commits December 10, 2024 16:21

Forgot to check prettier

dc91ff4

Modification to UI

6448399

Convert blank column entries of fastmatch_category to reference

b12fb5c

Make reference the default if left blank and fix drop down menu in IR…

fa831f7

…IDIA

sgsutcliffe requested review from apetkau and emarinier December 11, 2024 20:25

apetkau requested changes Dec 11, 2024

sgsutcliffe added 2 commits December 12, 2024 09:13

Set minimum for threshold to 0

8de39dd

Check scaled values in range between 0-100

a70d030

emarinier approved these changes Dec 12, 2024

sgsutcliffe requested a review from apetkau December 12, 2024 16:11

sgsutcliffe merged commit 25f22dd into dev Dec 12, 2024
4 checks passed
