diff --git a/.editorconfig b/.editorconfig index 44cb1a23..c1af9c8a 100644 --- a/.editorconfig +++ b/.editorconfig @@ -49,3 +49,4 @@ indent_style = unset # ignore perl [*.{pl,pm}] indent_size = unset +indent_style = unset diff --git a/.github/CONTRIBUTING.md b/.github/CONTRIBUTING.md index 79b1aa06..e40146fe 100644 --- a/.github/CONTRIBUTING.md +++ b/.github/CONTRIBUTING.md @@ -16,7 +16,7 @@ If you'd like to write some code for plant-food-research-open/assemblyqc, the st 1. Check that there isn't already an issue about your idea in the [plant-food-research-open/assemblyqc issues](https://github.com/plant-food-research-open/assemblyqc/issues) to avoid duplicating work. If there isn't one already, please create one so that others know you're working on this 2. [Fork](https://help.github.com/en/github/getting-started-with-github/fork-a-repo) the [plant-food-research-open/assemblyqc repository](https://github.com/plant-food-research-open/assemblyqc) to your GitHub account 3. Make the necessary changes / additions within your forked repository following [Pipeline conventions](#pipeline-contribution-conventions) -4. Use `nf-core schema build` and add any new parameters to the pipeline JSON schema (requires [nf-core tools](https://github.com/nf-core/tools) >= 1.10). +4. Use `nf-core pipelines schema build` and add any new parameters to the pipeline JSON schema (requires [nf-core tools](https://github.com/nf-core/tools) >= 1.10). 5. Submit a Pull Request against the `dev` branch and wait for the code to be reviewed and merged If you're not used to this workflow with git, you can start with some [docs from GitHub](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests) or even their [excellent `git` resources](https://try.github.io/). @@ -37,7 +37,7 @@ There are typically two types of tests that run: ### Lint tests `nf-core` has a [set of guidelines](https://nf-co.re/developers/guidelines) which all pipelines must adhere to. -To enforce these and ensure that all pipelines stay in sync, we have developed a helper tool which runs checks on the pipeline code. This is in the [nf-core/tools repository](https://github.com/nf-core/tools) and once installed can be run locally with the `nf-core lint ` command. +To enforce these and ensure that all pipelines stay in sync, we have developed a helper tool which runs checks on the pipeline code. This is in the [nf-core/tools repository](https://github.com/nf-core/tools) and once installed can be run locally with the `nf-core pipelines lint ` command. If any failures or warnings are encountered, please follow the listed URL for more documentation. @@ -68,21 +68,21 @@ If you wish to contribute a new step, please use the following coding standards: 2. Write the process block (see below). 3. Define the output channel if needed (see below). 4. Add any new parameters to `nextflow.config` with a default (see below). -5. Add any new parameters to `nextflow_schema.json` with help text (via the `nf-core schema build` tool). +5. Add any new parameters to `nextflow_schema.json` with help text (via the `nf-core pipelines schema build` tool). 6. Add sanity checks and validation for all relevant parameters. 7. Perform local tests to validate that the new code works as expected. 8. If applicable, add a new test command in `.github/workflow/ci.yml`. -9. Add a description of the output files and if relevant any appropriate images from the MultiQC report to `docs/output.md`. +9. 
Add a description of the output files and if relevant any appropriate images from the report to `docs/output.md`. ### Default values Parameters should be initialised / defined with default values in `nextflow.config` under the `params` scope. -Once there, use `nf-core schema build` to add to `nextflow_schema.json`. +Once there, use `nf-core pipelines schema build` to add to `nextflow_schema.json`. ### Default processes resource requirements -Sensible defaults for process resource requirements (CPUs / memory / time) for a process should be defined in `conf/base.config`. These should generally be specified generic with `withLabel:` selectors so they can be shared across multiple processes/steps of the pipeline. A nf-core standard set of labels that should be followed where possible can be seen in the [nf-core pipeline template](https://github.com/nf-core/tools/blob/master/nf_core/pipeline-template/conf/base.config), which has the default process as a single core-process, and then different levels of multi-core configurations for increasingly large memory requirements defined with standardised labels. +Sensible defaults for process resource requirements (CPUs / memory / time) for a process should be defined in `conf/base.config`. These should generally be specified generic with `withLabel:` selectors so they can be shared across multiple processes/steps of the pipeline. A nf-core standard set of labels that should be followed where possible can be seen in the [nf-core pipeline template](https://github.com/nf-core/tools/blob/main/nf_core/pipeline-template/conf/base.config), which has the default process as a single core-process, and then different levels of multi-core configurations for increasingly large memory requirements defined with standardised labels. The process resources can be passed on to the tool dynamically within the process with the `${task.cpus}` and `${task.memory}` variables in the `script:` block. @@ -95,7 +95,7 @@ Please use the following naming schemes, to make it easy to understand what is g ### Nextflow version bumping -If you are using a new feature from core Nextflow, you may bump the minimum required version of nextflow in the pipeline with: `nf-core bump-version --nextflow . [min-nf-version]` +If you are using a new feature from core Nextflow, you may bump the minimum required version of nextflow in the pipeline with: `nf-core pipelines bump-version --nextflow . [min-nf-version]` ### Images and figures diff --git a/.github/ISSUE_TEMPLATE/bug_report.yml b/.github/ISSUE_TEMPLATE/bug_report.yml index af436aa6..e3135746 100644 --- a/.github/ISSUE_TEMPLATE/bug_report.yml +++ b/.github/ISSUE_TEMPLATE/bug_report.yml @@ -9,46 +9,34 @@ body: description: A clear and concise description of what the bug is. validations: required: true + - type: textarea id: command_used attributes: label: Command used and terminal output - description: Steps to reproduce the behaviour. Please paste the command you used - to launch the pipeline and the output from your terminal. + description: Steps to reproduce the behaviour. Please paste the command you used to launch the pipeline and the output from your terminal. render: console - placeholder: "$ nextflow run ... - + placeholder: | + $ nextflow run ... Some output where something broke - " - type: textarea id: files attributes: label: Relevant files - description: "Please drag and drop the relevant files here. Create a `.zip` archive - if the extension is not allowed. 
- - Your verbose log file `.nextflow.log` is often useful _(this is a hidden file - in the directory where you launched the pipeline)_ as well as custom Nextflow - configuration files. + description: | + Please drag and drop the relevant files here. Create a `.zip` archive if the extension is not allowed. + Your verbose log file `.nextflow.log` is often useful _(this is a hidden file in the directory where you launched the pipeline)_ as well as custom Nextflow configuration files. - " - type: textarea id: system attributes: label: System information - description: "* Nextflow version _(eg. 23.04.0)_ - + description: | + * Nextflow version _(eg. 23.04.0)_ * Hardware _(eg. HPC, Desktop, Cloud)_ - * Executor _(eg. slurm, local, awsbatch)_ - - * Container engine: _(e.g. Docker, Singularity, Conda, Podman, Shifter, Charliecloud, - or Apptainer)_ - + * Container engine: _(e.g. Docker, Singularity, Conda, Podman, Shifter, Charliecloud, or Apptainer)_ * OS _(eg. CentOS Linux, macOS, Linux Mint)_ - * Version of plant-food-research-open/assemblyqc _(eg. 1.1, 1.5, 1.8.2)_ - - " diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md index 0cd29f73..3104f8b2 100644 --- a/.github/PULL_REQUEST_TEMPLATE.md +++ b/.github/PULL_REQUEST_TEMPLATE.md @@ -16,7 +16,7 @@ Learn more about contributing: [CONTRIBUTING.md](https://github.com/plant-food-r - [ ] This comment contains a description of changes (with reason). - [ ] If you've fixed a bug or added code that should be tested, add tests! - [ ] If you've added a new tool - have you followed the pipeline conventions in the [contribution docs](https://github.com/plant-food-research-open/assemblyqc/tree/main/.github/CONTRIBUTING.md) -- [ ] Make sure your code lints (`nf-core lint`). +- [ ] Make sure your code lints (`nf-core pipelines lint`). - [ ] Ensure the test suite passes (`nextflow run . -profile test,docker --outdir `). - [ ] Check for unexpected warnings in debug mode (`nextflow run . -profile debug,test,docker --outdir `). - [ ] Usage Documentation in `docs/usage.md` is updated. 
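The PR checklist above corresponds to a short local routine. A minimal sketch of those pre-submission checks, assuming nf-core tools (>= 1.10), Nextflow and Docker are available locally and `<OUTDIR>` is a placeholder:

```bash
# Regenerate nextflow_schema.json after adding parameters to nextflow.config
nf-core pipelines schema build

# Lint the pipeline against the nf-core guidelines
nf-core pipelines lint

# Run the minimal test profile, then repeat in debug mode to surface unexpected warnings
nextflow run . -profile test,docker --outdir <OUTDIR>
nextflow run . -profile debug,test,docker --outdir <OUTDIR>
```

Running these before pushing keeps the CI checks from failing on easily caught issues.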
diff --git a/.github/include.yaml b/.github/include.yaml new file mode 100644 index 00000000..5d850e13 --- /dev/null +++ b/.github/include.yaml @@ -0,0 +1,10 @@ +".": + - ./.github/workflows/** + - ./nf-test.config +tests: + - ./assets/* + - ./bin/* + - ./conf/* + - ./main.nf + - ./nextflow_schema.json + - ./nextflow.config diff --git a/.github/version_checks.sh b/.github/version_checks.sh index 23cf619c..3acfa8d6 100755 --- a/.github/version_checks.sh +++ b/.github/version_checks.sh @@ -1,5 +1,7 @@ #!/usr/bin/env bash +set -euo pipefail + config_version=$(sed -n "/^\s*version\s*=\s*'/s/version//p" nextflow.config | tr -d "=[:space:]'") cff_version=$(sed -n '/^version: /s/version: //p' CITATION.cff | tr -d '[:space:]') @@ -12,3 +14,8 @@ fi head -10 CHANGELOG.md | grep "## v$config_version - " >/dev/null \ || (echo 'Failed to match CHANGELOG version'; exit 1) + +# Check .nf-core.yml version + +tail -5 .nf-core.yml | grep "version: $config_version" >/dev/null \ + || (echo 'Failed to match .nf-core.yml version'; exit 1) diff --git a/.github/workflows/branch.yml b/.github/workflows/branch.yml index c5f090a7..81a9eaac 100644 --- a/.github/workflows/branch.yml +++ b/.github/workflows/branch.yml @@ -11,7 +11,7 @@ jobs: steps: # PRs to the nf-core repo main branch are only ok if coming from the nf-core repo `dev` or any `patch` branches - name: Check PRs - if: github.repository == 'plant-food-research-open/assemblyqc' + if: github.repository == 'Plant-Food-Research-Open/assemblyqc' run: | { [[ ${{ github.event.pull_request.head.repo.full_name }} == Plant-Food-Research-Open/assemblyqc ]] && [[ $GITHUB_HEAD_REF == "dev" ]]; } || [[ $GITHUB_HEAD_REF == "patch" ]] diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 9c0a86ca..0b4c607d 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -1,88 +1,100 @@ name: nf-core CI # This workflow runs the pipeline with the minimal test dataset to check that it completes without any syntax errors on: - push: - branches: - - dev pull_request: - release: - types: [published] env: NXF_ANSI_LOG: false + NFT_WORKDIR: "~" + NFT_DIFF: "pdiff" + NFT_DIFF_ARGS: "--line-numbers --expand-tabs=2" concurrency: group: "${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}" cancel-in-progress: true - jobs: - test: - name: Run pipeline with test data - # Only run on push if this is the nf-core dev branch (merged PRs) - if: "${{ github.event_name != 'push' || (github.event_name == 'push' && github.repository == 'plant-food-research-open/assemblyqc') }}" + nf-test-changes: + name: Check for changes runs-on: ubuntu-latest - strategy: - matrix: - NXF_VER: - - "23.04.0" - TEST_PARAMS: - - minimal - - invalid - - stub - - noltr - - hicparam - include: - - OPTION_STUB: "" - - OPTION_STUB: "-stub" - TEST_PARAMS: stub - - OPTION_STUB: "-stub" - TEST_PARAMS: hicparam + outputs: + nf_test_files: ${{ steps.list.outputs.components }} steps: - - name: Check out pipeline code - uses: actions/checkout@0ad4b8fadaa221de15dcec353f45205ec38ea70b # v4 - - - name: Install Nextflow - uses: nf-core/setup-nextflow@v2 + - uses: actions/checkout@v4.2.1 with: - version: ${{ matrix.NXF_VER }} + fetch-depth: 0 - - name: Disk space cleanup - uses: jlumbroso/free-disk-space@54081f138730dfa15788a46383842cd2f914a1be # v1.3.1 + - name: List nf-test files + id: list + uses: adamrtalbot/detect-nf-test-changes@v0.0.4 + with: + head: ${{ github.sha }} + base: origin/${{ github.base_ref }} + include: .github/include.yaml - - name: Run pipeline with test data + 
- name: print list of nf-test files run: | - nextflow run \ - ${GITHUB_WORKSPACE} \ - --outdir ./results \ - -profile docker \ - -params-file \ - ./tests/${{ matrix.TEST_PARAMS }}/params.json \ - ${{ matrix.OPTION_STUB }} + echo ${{ steps.list.outputs.components }} - nf-test: - name: Run nf-tests - # Only run on push if this is the nf-core dev branch (merged PRs) - if: "${{ github.event_name != 'push' || (github.event_name == 'push' && github.repository == 'plant-food-research-open/assemblyqc') }}" + test: + name: ${{ matrix.nf_test_files }} ${{ matrix.profile }} NF-${{ matrix.NXF_VER }} + needs: [nf-test-changes] + if: needs.nf-test-changes.outputs.nf_test_files != '[]' runs-on: ubuntu-latest strategy: + fail-fast: false matrix: NXF_VER: - - "23.04.0" + - "24.04.2" + + nf_test_files: ["${{ fromJson(needs.nf-test-changes.outputs.nf_test_files) }}"] + profile: + - "docker" + steps: - name: Check out pipeline code - uses: actions/checkout@0ad4b8fadaa221de15dcec353f45205ec38ea70b # v4 + uses: actions/checkout@v4.2.1 - name: Install Nextflow uses: nf-core/setup-nextflow@v2 with: - version: ${{ matrix.NXF_VER }} + version: "${{ matrix.NXF_VER }}" + + - uses: actions/setup-python@v5.2.0 + with: + python-version: "3.11" + architecture: "x64" + + - name: Install pdiff to see diff between nf-test snapshots + run: | + python -m pip install --upgrade pip + pip install pdiff + + - uses: nf-core/setup-nf-test@v1.1.2 + with: + version: 0.9.0 - - name: Install nf-test - uses: nf-core/setup-nf-test@v1.1.2 + - name: Disk space cleanup + if: matrix.nf_test_files == 'tests/stub/main.nf.test' + uses: jlumbroso/free-disk-space@v1.3.1 + + - name: Run nf-test + run: | + nf-test test --verbose ${{ matrix.nf_test_files }} --profile "+${{ matrix.profile }}" + + confirm-pass: + runs-on: ubuntu-latest + needs: [test] + if: always() + steps: + - name: All tests ok + if: ${{ !contains(needs.*.result, 'failure') }} + run: exit 0 + - name: One or more tests failed + if: ${{ contains(needs.*.result, 'failure') }} + run: exit 1 - - name: Run nf-tests + - name: debug-print + if: always() run: | - nf-test \ - test \ - --verbose \ - tests + echo "toJSON(needs) = ${{ toJSON(needs) }}" + echo "toJSON(needs.*.result) = ${{ toJSON(needs.*.result) }}" diff --git a/.github/workflows/clean-up.yml b/.github/workflows/clean-up.yml index 0b6b1f27..53e721c7 100644 --- a/.github/workflows/clean-up.yml +++ b/.github/workflows/clean-up.yml @@ -12,9 +12,9 @@ jobs: steps: - uses: actions/stale@28ca1036281a5e5922ead5184a1bbf96e5fc984e # v9 with: - stale-issue-message: "This issue has been tagged as awaiting-changes or awaiting-feedback by an nf-core contributor. Remove stale label or add a comment otherwise this issue will be closed in 20 days." - stale-pr-message: "This PR has been tagged as awaiting-changes or awaiting-feedback by an nf-core contributor. Remove stale label or add a comment if it is still useful." - close-issue-message: "This issue was closed because it has been tagged as awaiting-changes or awaiting-feedback by an nf-core contributor and then staled for 20 days with no activity." + stale-issue-message: "This issue has been tagged as awaiting-changes or awaiting-feedback. Remove stale label or add a comment otherwise this issue will be closed in 20 days." + stale-pr-message: "This PR has been tagged as awaiting-changes or awaiting-feedback. Remove stale label or add a comment if it is still useful." 
+ close-issue-message: "This issue was closed because it has been tagged as awaiting-changes or awaiting-feedback and then staled for 20 days with no activity." days-before-stale: 30 days-before-close: 20 days-before-pr-close: -1 diff --git a/.github/workflows/download_pipeline.yml b/.github/workflows/download_pipeline.yml index 4a2e3eb4..7cc2c387 100644 --- a/.github/workflows/download_pipeline.yml +++ b/.github/workflows/download_pipeline.yml @@ -1,4 +1,4 @@ -name: Test successful pipeline download with 'nf-core download' +name: Test successful pipeline download with 'nf-core pipelines download' # Run the workflow when: # - dispatched manually @@ -8,7 +8,7 @@ on: workflow_dispatch: inputs: testbranch: - description: "The specific branch you wish to utilize for the test execution of nf-core download." + description: "The specific branch you wish to utilize for the test execution of nf-core pipelines download." required: true default: "dev" pull_request: @@ -39,9 +39,11 @@ jobs: with: python-version: "3.12" architecture: "x64" - - uses: eWaterCycle/setup-singularity@931d4e31109e875b13309ae1d07c70ca8fbc8537 # v7 + + - name: Setup Apptainer + uses: eWaterCycle/setup-apptainer@4bb22c52d4f63406c49e94c804632975787312b3 # v2.0.0 with: - singularity-version: 3.8.3 + apptainer-version: 1.3.4 - name: Install dependencies run: | @@ -54,33 +56,64 @@ jobs: echo "REPOTITLE_LOWERCASE=$(basename ${GITHUB_REPOSITORY,,})" >> ${GITHUB_ENV} echo "REPO_BRANCH=${{ github.event.inputs.testbranch || 'dev' }}" >> ${GITHUB_ENV} + - name: Make a cache directory for the container images + run: | + mkdir -p ./singularity_container_images + - name: Download the pipeline env: - NXF_SINGULARITY_CACHEDIR: ./ + NXF_SINGULARITY_CACHEDIR: ./singularity_container_images run: | - nf-core download ${{ env.REPO_LOWERCASE }} \ + nf-core pipelines download ${{ env.REPO_LOWERCASE }} \ --revision ${{ env.REPO_BRANCH }} \ --outdir ./${{ env.REPOTITLE_LOWERCASE }} \ --compress "none" \ --container-system 'singularity' \ - --container-library "quay.io" -l "docker.io" -l "ghcr.io" \ + --container-library "quay.io" -l "docker.io" -l "community.wave.seqera.io" \ --container-cache-utilisation 'amend' \ - --download-configuration + --download-configuration 'yes' - name: Inspect download run: tree ./${{ env.REPOTITLE_LOWERCASE }} + - name: Count the downloaded number of container images + id: count_initial + run: | + image_count=$(ls -1 ./singularity_container_images | wc -l | xargs) + echo "Initial container image count: $image_count" + echo "IMAGE_COUNT_INITIAL=$image_count" >> ${GITHUB_ENV} + - name: Run the downloaded pipeline (stub) id: stub_run_pipeline continue-on-error: true env: - NXF_SINGULARITY_CACHEDIR: ./ + NXF_SINGULARITY_CACHEDIR: ./singularity_container_images NXF_SINGULARITY_HOME_MOUNT: true run: nextflow run ./${{ env.REPOTITLE_LOWERCASE }}/$( sed 's/\W/_/g' <<< ${{ env.REPO_BRANCH }}) -stub -profile test,singularity --outdir ./results - name: Run the downloaded pipeline (stub run not supported) id: run_pipeline if: ${{ job.steps.stub_run_pipeline.status == failure() }} env: - NXF_SINGULARITY_CACHEDIR: ./ + NXF_SINGULARITY_CACHEDIR: ./singularity_container_images NXF_SINGULARITY_HOME_MOUNT: true run: nextflow run ./${{ env.REPOTITLE_LOWERCASE }}/$( sed 's/\W/_/g' <<< ${{ env.REPO_BRANCH }}) -profile test,singularity --outdir ./results + + - name: Count the downloaded number of container images + id: count_afterwards + run: | + image_count=$(ls -1 ./singularity_container_images | wc -l | xargs) + echo "Post-pipeline run 
container image count: $image_count" + echo "IMAGE_COUNT_AFTER=$image_count" >> ${GITHUB_ENV} + + - name: Compare container image counts + run: | + if [ "${{ env.IMAGE_COUNT_INITIAL }}" -ne "${{ env.IMAGE_COUNT_AFTER }}" ]; then + initial_count=${{ env.IMAGE_COUNT_INITIAL }} + final_count=${{ env.IMAGE_COUNT_AFTER }} + difference=$((final_count - initial_count)) + echo "$difference additional container images were \n downloaded at runtime . The pipeline has no support for offline runs!" + tree ./singularity_container_images + exit 1 + else + echo "The pipeline can be downloaded successfully!" + fi diff --git a/.github/workflows/fix-linting.yml b/.github/workflows/fix-linting.yml deleted file mode 100644 index 31ca9786..00000000 --- a/.github/workflows/fix-linting.yml +++ /dev/null @@ -1,89 +0,0 @@ -name: Fix linting from a comment -on: - issue_comment: - types: [created] - -jobs: - fix-linting: - # Only run if comment is on a PR with the main repo, and if it contains the magic keywords - if: > - contains(github.event.comment.html_url, '/pull/') && - contains(github.event.comment.body, '@nf-core-bot fix linting') && - github.repository == 'plant-food-research-open/assemblyqc' - runs-on: ubuntu-latest - steps: - # Use the @nf-core-bot token to check out so we can push later - - uses: actions/checkout@0ad4b8fadaa221de15dcec353f45205ec38ea70b # v4 - with: - token: ${{ secrets.nf_core_bot_auth_token }} - - # indication that the linting is being fixed - - name: React on comment - uses: peter-evans/create-or-update-comment@71345be0265236311c031f5c7866368bd1eff043 # v4 - with: - comment-id: ${{ github.event.comment.id }} - reactions: eyes - - # Action runs on the issue comment, so we don't get the PR by default - # Use the gh cli to check out the PR - - name: Checkout Pull Request - run: gh pr checkout ${{ github.event.issue.number }} - env: - GITHUB_TOKEN: ${{ secrets.nf_core_bot_auth_token }} - - # Install and run pre-commit - - uses: actions/setup-python@82c7e631bb3cdc910f68e0081d67478d79c6982d # v5 - with: - python-version: "3.12" - - - name: Install pre-commit - run: pip install pre-commit - - - name: Run pre-commit - id: pre-commit - run: pre-commit run --all-files - continue-on-error: true - - # indication that the linting has finished - - name: react if linting finished succesfully - if: steps.pre-commit.outcome == 'success' - uses: peter-evans/create-or-update-comment@71345be0265236311c031f5c7866368bd1eff043 # v4 - with: - comment-id: ${{ github.event.comment.id }} - reactions: "+1" - - - name: Commit & push changes - id: commit-and-push - if: steps.pre-commit.outcome == 'failure' - run: | - git config user.email "core@nf-co.re" - git config user.name "nf-core-bot" - git config push.default upstream - git add . 
- git status - git commit -m "[automated] Fix code linting" - git push - - - name: react if linting errors were fixed - id: react-if-fixed - if: steps.commit-and-push.outcome == 'success' - uses: peter-evans/create-or-update-comment@71345be0265236311c031f5c7866368bd1eff043 # v4 - with: - comment-id: ${{ github.event.comment.id }} - reactions: hooray - - - name: react if linting errors were not fixed - if: steps.commit-and-push.outcome == 'failure' - uses: peter-evans/create-or-update-comment@71345be0265236311c031f5c7866368bd1eff043 # v4 - with: - comment-id: ${{ github.event.comment.id }} - reactions: confused - - - name: react if linting errors were not fixed - if: steps.commit-and-push.outcome == 'failure' - uses: peter-evans/create-or-update-comment@71345be0265236311c031f5c7866368bd1eff043 # v4 - with: - issue-number: ${{ github.event.issue.number }} - body: | - @${{ github.actor }} I tried to fix the linting errors, but it didn't work. Please fix them manually. - See [CI log](https://github.com/plant-food-research-open/assemblyqc/actions/runs/${{ github.run_id }}) for more details. diff --git a/.github/workflows/linting.yml b/.github/workflows/linting.yml index 1fcafe88..6bfe9373 100644 --- a/.github/workflows/linting.yml +++ b/.github/workflows/linting.yml @@ -1,6 +1,6 @@ name: nf-core linting # This workflow is triggered on pushes and PRs to the repository. -# It runs the `nf-core lint` and markdown lint tests to ensure +# It runs the `nf-core pipelines lint` and markdown lint tests to ensure # that the code meets the nf-core guidelines. on: push: @@ -41,17 +41,32 @@ jobs: python-version: "3.12" architecture: "x64" + - name: read .nf-core.yml + uses: pietrobolcato/action-read-yaml@1.1.0 + id: read_yml + with: + config: ${{ github.workspace }}/.nf-core.yml + - name: Install dependencies run: | python -m pip install --upgrade pip - pip install nf-core + pip install nf-core==${{ steps.read_yml.outputs['nf_core_version'] }} + + - name: Run nf-core pipelines lint + if: ${{ github.base_ref != 'main' }} + env: + GITHUB_COMMENTS_URL: ${{ github.event.pull_request.comments_url }} + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + GITHUB_PR_COMMIT: ${{ github.event.pull_request.head.sha }} + run: nf-core -l lint_log.txt pipelines lint --dir ${GITHUB_WORKSPACE} --markdown lint_results.md - - name: Run nf-core lint + - name: Run nf-core pipelines lint --release + if: ${{ github.base_ref == 'main' }} env: GITHUB_COMMENTS_URL: ${{ github.event.pull_request.comments_url }} GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} GITHUB_PR_COMMIT: ${{ github.event.pull_request.head.sha }} - run: nf-core -l lint_log.txt lint --dir ${GITHUB_WORKSPACE} --markdown lint_results.md + run: nf-core -l lint_log.txt pipelines lint --release --dir ${GITHUB_WORKSPACE} --markdown lint_results.md - name: Save PR number if: ${{ always() }} diff --git a/.github/workflows/linting_comment.yml b/.github/workflows/linting_comment.yml index 40acc23f..42e519bf 100644 --- a/.github/workflows/linting_comment.yml +++ b/.github/workflows/linting_comment.yml @@ -11,7 +11,7 @@ jobs: runs-on: ubuntu-latest steps: - name: Download lint results - uses: dawidd6/action-download-artifact@09f2f74827fd3a8607589e5ad7f9398816f540fe # v3 + uses: dawidd6/action-download-artifact@bf251b5aa9c2f7eeb574a96ee720e24f801b7c11 # v6 with: workflow: linting.yml workflow_conclusion: completed diff --git a/.gitignore b/.gitignore index 837a2cd1..3950461f 100644 --- a/.gitignore +++ b/.gitignore @@ -14,3 +14,4 @@ testing* # nf-test files .nf-test/ .nf-test.log +null/ 
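The linting workflow above pins nf-core tools to the `nf_core_version` declared in `.nf-core.yml` (`3.0.2` in this changeset). A rough local equivalent is sketched below; the `sed` extraction is illustrative only (CI resolves the same key via the action-read-yaml step):

```bash
# Read the pinned nf-core tools version from .nf-core.yml
nf_core_version=$(sed -n 's/^nf_core_version: *//p' .nf-core.yml | tr -d '[:space:]')

# Install that exact version so local lint results match CI
python -m pip install --upgrade pip
pip install "nf-core==${nf_core_version}"

# Same lint invocation as the workflow, run from the repository root
nf-core -l lint_log.txt pipelines lint --dir . --markdown lint_results.md
```

Pinning the tools version locally avoids lint results drifting between a contributor's machine and CI.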
diff --git a/.gitpod.yml b/.gitpod.yml index 105a1821..46118637 100644 --- a/.gitpod.yml +++ b/.gitpod.yml @@ -4,17 +4,14 @@ tasks: command: | pre-commit install --install-hooks nextflow self-update - - name: unset JAVA_TOOL_OPTIONS - command: | - unset JAVA_TOOL_OPTIONS vscode: extensions: # based on nf-core.nf-core-extensionpack - - esbenp.prettier-vscode # Markdown/CommonMark linting and style checking for Visual Studio Code + #- esbenp.prettier-vscode # Markdown/CommonMark linting and style checking for Visual Studio Code - EditorConfig.EditorConfig # override user/workspace settings with settings found in .editorconfig files - Gruntfuggly.todo-tree # Display TODO and FIXME in a tree view in the activity bar - mechatroner.rainbow-csv # Highlight columns in csv files in different colors - # - nextflow.nextflow # Nextflow syntax highlighting + - nextflow.nextflow # Nextflow syntax highlighting - oderwat.indent-rainbow # Highlight indentation level - streetsidesoftware.code-spell-checker # Spelling checker for source code - charliermarsh.ruff # Code linter Ruff diff --git a/.nf-core.yml b/.nf-core.yml index 0ade11e9..b62355dd 100644 --- a/.nf-core.yml +++ b/.nf-core.yml @@ -1,5 +1,13 @@ +bump_version: null lint: + actions_ci: false + multiqc_config: false + template_strings: false + included_configs: false files_exist: + - conf/igenomes.config + - conf/igenomes_ignored.config + - assets/multiqc_config.yml - CODE_OF_CONDUCT.md - assets/nf-core-assemblyqc_logo_light.png - docs/images/nf-core-assemblyqc_logo_light.png @@ -7,21 +15,36 @@ lint: - .github/ISSUE_TEMPLATE/config.yml - .github/workflows/awstest.yml - .github/workflows/awsfulltest.yml - - assets/multiqc_config.yml - - conf/igenomes.config files_unchanged: - - docs/README.md - - .github/PULL_REQUEST_TEMPLATE.md + - LICENSE + - .gitignore - .github/CONTRIBUTING.md + - .github/PULL_REQUEST_TEMPLATE.md - .github/workflows/branch.yml - - LICENSE + - .github/workflows/linting.yml + - docs/README.md nextflow_config: - manifest.name - manifest.homePage - multiqc_config: False - template_strings: False -nf_core_version: 2.14.1 + - validation.help.beforeText + - validation.help.afterText + - validation.summary.beforeText + - validation.summary.afterText +nf_core_version: 3.0.2 +org_path: null repository_type: pipeline template: - prefix: plant-food-research-open - skip: [] + author: Usman Rashid, Ken Smith, Ross Crowhurst, Chen Wu, Marcus Davy + description: A NextFlow pipeline which evaluates assembly quality with multiple + QC tools and presents the results in a unified html report. + force: false + is_nfcore: false + name: assemblyqc + org: plant-food-research-open + outdir: . + skip_features: + - igenomes + - multiqc + - fastqc + version: 2.2.0dev +update: null diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml index 1168e07f..5875f2b0 100644 --- a/.pre-commit-config.yaml +++ b/.pre-commit-config.yaml @@ -7,7 +7,7 @@ repos: - prettier@3.2.5 - repo: https://github.com/editorconfig-checker/editorconfig-checker.python - rev: "2.7.3" + rev: "3.0.3" hooks: - id: editorconfig-checker alias: ec diff --git a/CHANGELOG.md b/CHANGELOG.md index a0a472db..1c5def16 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -3,6 +3,45 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/) and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). +## v2.2.0 - [05-Nov-2024] + +### `Added` + +1. Added Gfastats [#126](https://github.com/Plant-Food-Research-Open/assemblyqc/issues/126) +2. 
Updated nf-core/template to 3.0.2 [#149](https://github.com/Plant-Food-Research-Open/assemblyqc/issues/149) +3. Updated `samtools faidx` to 1.21 +4. Now using nf-test for pipeline level testing [#153](https://github.com/Plant-Food-Research-Open/assemblyqc/issues/153) +5. Added `text/html` as content mime type for the report file [#146](https://github.com/Plant-Food-Research-Open/assemblyqc/issues/146) +6. Added a sequence labels table below the HiC contact map [#147](https://github.com/Plant-Food-Research-Open/assemblyqc/issues/147) +7. Added parameter `hic_samtools_ext_args` and set its default value to `-F 3852` [#159](https://github.com/Plant-Food-Research-Open/assemblyqc/issues/159) +8. Added the HiC QC report to the final report so that users don't have to navigate to the results folder [#162](https://github.com/Plant-Food-Research-Open/assemblyqc/issues/162) +9. Added the fastp log to the final report [#163](https://github.com/Plant-Food-Research-Open/assemblyqc/issues/163) +10. Updated the tube map along with the tool list [#166](https://github.com/Plant-Food-Research-Open/assemblyqc/issues/166) +11. Added Orthofinder [#167](https://github.com/Plant-Food-Research-Open/assemblyqc/issues/167) +12. Changed order of tool options in the `nextflow.config` file +13. Updated PFR's Kraken 2 database to `k2_pluspfp_20240904` [#170](https://github.com/Plant-Food-Research-Open/assemblyqc/issues/170) +14. Increased memory requirement for Kraken 2 to `256.GB` + +### `Fixed` + +1. Fixed a bug where Gene score distribution graph did not appear correctly [#125](https://github.com/Plant-Food-Research-Open/assemblyqc/issues/125) +2. Increased memory requirement for `DNADIFF` to avoid SLURM OOM kills with exit code 2 [#141](https://github.com/Plant-Food-Research-Open/assemblyqc/issues/141) +3. Documented the use explicit use of `-revision` parameter [#160](https://github.com/Plant-Food-Research-Open/assemblyqc/issues/160) +4. Now using `_JAVA_OPTIONS` in module `RUNASSEMBLYVISUALIZER` to avoid user preferences related errors + +### `Dependencies` + +1. Nextflow!>=24.04.2 +2. nf-schema@2.1.1 + +### `Deprecated` + +1. Reduced the GenomeTools stats figures to 300 DPI [#142](https://github.com/Plant-Food-Research-Open/assemblyqc/issues/142) +2. Now `synteny_mummer_min_bundle_size` is set to `1000000` by default [#142](https://github.com/Plant-Food-Research-Open/assemblyqc/issues/142) +3. `results` is not the default output directory anymore +4. Removed a number of unnecessary parameters: `monochromeLogs`, `config_profile_contact`, `config_profile_url`, `validationFailUnrecognisedParams`, `validationLenientMode`, `validationSchemaIgnoreParams`, `validationShowHiddenParams` `validate_params` +5. Resource parameters have been removed: `max_memory`, `max_cpus`, `max_time` + ## v2.1.1 - [20-Sep-2024] ### `Added` diff --git a/CITATION.cff b/CITATION.cff index eff317d2..18bb4b7d 100644 --- a/CITATION.cff +++ b/CITATION.cff @@ -25,7 +25,7 @@ authors: - family-names: "Deng" given-names: "Cecilia" title: "AssemblyQC: A Nextflow pipeline for reproducible reporting of assembly quality" -version: 2.1.1 +version: 2.2.0 date-released: 2024-07-30 url: "https://github.com/Plant-Food-Research-Open/assemblyqc" doi: 10.1093/bioinformatics/btae477 diff --git a/CITATIONS.md b/CITATIONS.md index fbd8036d..bf095d43 100644 --- a/CITATIONS.md +++ b/CITATIONS.md @@ -18,15 +18,15 @@ > Gremme G, Steinbiss S, Kurtz S. 2013. 
"GenomeTools: A Comprehensive Software Library for Efficient Processing of Structured Genome Annotations," in IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 3, pp. 645-656, May 2013, doi: -- SAMTOOLS, [MIT/Expat](https://github.com/samtools/samtools/blob/develop/LICENSE) +- samtools, [MIT/Expat](https://github.com/samtools/samtools/blob/develop/LICENSE) > Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, Whitwham A, Keane T, McCarthy SA, Davies RM, Li H. 2021. Twelve years of SAMtools and BCFtools, GigaScience, Volume 10, Issue 2, February 2021, giab008, -- NCBI/FCS, [License](https://github.com/ncbi/fcs/blob/main/LICENSE.txt) +- NCBI FCS, [License](https://github.com/ncbi/fcs/blob/main/LICENSE.txt) > Astashyn A, Tvedte ES, Sweeney D, Sapojnikov V, Bouk N, Joukov V, Mozes E, Strope PK, Sylla PM, Wagner L, Bidwell SL, Clark K, Davis EW, Smith-White B, Hlavina W, Pruitt KD, Schneider VA, Murphy TD. 2023. Rapid and sensitive detection of genome contamination at scale with FCS-GX. bioRxiv 2023.06.02.543519; doi: -- KRONA, [License](https://github.com/marbl/Krona/blob/master/KronaTools/LICENSE.txt) +- Krona, [License](https://github.com/marbl/Krona/blob/master/KronaTools/LICENSE.txt) > Ondov BD, Bergman NH, Phillippy AM. 2011. Interactive metagenomic visualization in a Web browser. BMC Bioinformatics. 2011 Sep 30;12:385. doi: @@ -36,19 +36,23 @@ > > Forked from: +- gfastats, [MIT](https://github.com/vgl-hub/gfastats/blob/main/LICENSE) + + > Giulio Formenti, Linelle Abueg, Angelo Brajuka, Nadolina Brajuka, Cristóbal Gallardo-Alba, Alice Giani, Olivier Fedrigo, Erich D Jarvis, Gfastats: conversion, evaluation and manipulation of genome sequences using assembly graphs, Bioinformatics, Volume 38, Issue 17, September 2022, Pages 4214–4216, + - BUSCO, [MIT](https://gitlab.com/ezlab/busco/-/blob/master/LICENSE) > Manni M, Berkeley MR, Seppey M, Simão FA, Zdobnov EM. 2021. BUSCO Update: Novel and Streamlined Workflows along with Broader and Deeper Phylogenetic Coverage for Scoring of Eukaryotic, Prokaryotic, and Viral Genomes, Molecular Biology and Evolution, Volume 38, Issue 10, October 2021, Pages 4647–4654, -- GFFREAD, [MIT](https://github.com/gpertea/gffread/blob/master/LICENSE) +- GffRead, [MIT](https://github.com/gpertea/gffread/blob/master/LICENSE) > Pertea G, Pertea M. GFF Utilities: GffRead and GffCompare. F1000Res. 2020 Apr 28;9:ISCB Comm J-304. doi: . PMID: 32489650; PMCID: PMC7222033. -- TIDK, [MIT](https://github.com/tolkit/telomeric-identifier/blob/main/LICENSE) +- tidk, [MIT](https://github.com/tolkit/telomeric-identifier/blob/main/LICENSE) > -- SEQKIT, [MIT](https://github.com/shenwei356/seqkit/blob/master/LICENSE) +- SeqKit, [MIT](https://github.com/shenwei356/seqkit/blob/master/LICENSE) > Shen W, Le S, Li Y, Hu F. 2016. SeqKit: A Cross-Platform and Ultrafast Toolkit for FASTA/Q File Manipulation. PLoS ONE 11(10): e0163962. @@ -68,19 +72,19 @@ > Shujun O, Ning J 2018. LTR_retriever: A Highly Accurate and Sensitive Program for Identification of Long Terminal Repeat Retrotransposons, Plant Physiology, 176, 2 (2018). -- KRAKEN2, [MIT](https://github.com/DerrickWood/kraken2/blob/master/LICENSE) +- Kraken 2, [MIT](https://github.com/DerrickWood/kraken2/blob/master/LICENSE) > Wood DE, Salzberg SL, Wood DE, Lu J, Langmead B. 2019. Improved metagenomic analysis with Kraken 2. Genome Biol 20, 257 (2019). 
-- JUICEBOX.JS, [MIT](https://github.com/igvteam/juicebox.js/blob/master/LICENSE) +- juicebox.js, [MIT](https://github.com/igvteam/juicebox.js/blob/master/LICENSE) > Robinson JT, Turner D, Durand NC, Thorvaldsdóttir H, Mesirov JP, Aiden EL. 2018. Juicebox.js Provides a Cloud-Based Visualization System for Hi-C Data. Cell Syst. 2018 Feb 28;6(2):256-258.e1. doi: . Epub 2018 Feb 7. PMID: 29428417; PMCID: PMC6047755. -- FASTP, [MIT](https://github.com/OpenGene/fastp/blob/master/LICENSE) +- fastp, [MIT](https://github.com/OpenGene/fastp/blob/master/LICENSE) > Chen S, Zhou Y, Chen Y, Gu J. 2018. fastp: an ultra-fast all-in-one FASTQ preprocessor, Bioinformatics, Volume 34, Issue 17, 01 September 2018, Pages i884–i890, -- FASTQC, [GPL v3](https://github.com/s-andrews/FastQC/blob/master/LICENSE.txt) +- FastQC, [GPL v3](https://github.com/s-andrews/FastQC/blob/master/LICENSE.txt) > @@ -88,7 +92,7 @@ > Dudchenko O, Batra SS, Omer AD, Nyquist SK, Hoeger M, Durand NC, Shamim MS, Machol I, Lander, Aiden AP, Aiden EL 2017. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds.Science356, 92-95(2017). doi: . Available at: -- HIC_QC, [AGPL v3](https://github.com/phasegenomics/hic_qc/blob/master/LICENSE) +- hic_qc, [AGPL v3](https://github.com/phasegenomics/hic_qc/blob/master/LICENSE) > @@ -96,42 +100,46 @@ > -- BWA, [GPL v3](https://github.com/lh3/bwa/blob/master/COPYING) +- bwa-mem, [GPL v3](https://github.com/lh3/bwa/blob/master/COPYING) > Li H. 2013. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. -- MATLOCK, [AGPL v3](https://github.com/phasegenomics/matlock/blob/master/LICENSE) +- Matlock, [AGPL v3](https://github.com/phasegenomics/matlock/blob/master/LICENSE) > ; -- SAMBLASTER, [MIT](https://github.com/GregoryFaust/samblaster/blob/master/LICENSE.txt) +- samblaster, [MIT](https://github.com/GregoryFaust/samblaster/blob/master/LICENSE.txt) > Faust GG, Hall IM. 2014. SAMBLASTER: fast duplicate marking and structural variant read extraction, Bioinformatics, Volume 30, Issue 17, September 2014, Pages 2503–2505, -- CIRCOS, [GPL v3](https://www.gnu.org/licenses/gpl-3.0.txt) +- Circos, [GPL v3](https://www.gnu.org/licenses/gpl-3.0.txt) > Krzywinski M, Schein J, Birol I, Connors J, Gascoyne R. Horsman D, ... Marra MA. 2009. Circos: an information aesthetic for comparative genomics. Genome research, 19(9), 1639-1645. -- MUMMER, [Artistic 2.0](https://github.com/mummer4/mummer/blob/master/LICENSE.md) +- MUMmer, [Artistic 2.0](https://github.com/mummer4/mummer/blob/master/LICENSE.md) > Marçais G, Delcher AL, Phillippy AM, Coston R, Salzberg SL, Zimin A. 2018. MUMmer4: A fast and versatile genome alignment system. PLoS Comput Biol. 2018 Jan 26;14(1):e1005944. doi: . PMID: 29373581; PMCID: PMC5802927. -- PLOTSR, [MIT](https://github.com/schneebergerlab/plotsr/blob/master/LICENSE) +- Plotsr, [MIT](https://github.com/schneebergerlab/plotsr/blob/master/LICENSE) > Goel M, Schneeberger K. 2022. plotsr: visualizing structural similarities and rearrangements between multiple genomes. Bioinformatics. 2022 May 13;38(10):2922-2926. doi: . PMID: 35561173; PMCID: PMC9113368. -- SYRI, [MIT](https://github.com/schneebergerlab/syri/blob/master/LICENSE) +- Syri, [MIT](https://github.com/schneebergerlab/syri/blob/master/LICENSE) > Goel M, Sun H, Jiao WB, Schneeberger K. 2019. SyRI: finding genomic rearrangements and local sequence differences from whole-genome assemblies. Genome Biol. 2019 Dec 16;20(1):277. doi: . PMID: 31842948; PMCID: PMC6913012. 
-- MINIMAP2, [MIT](https://github.com/lh3/minimap2/blob/master/LICENSE.txt) +- Minimap2, [MIT](https://github.com/lh3/minimap2/blob/master/LICENSE.txt) > Li H. 2021. New strategies to improve minimap2 alignment accuracy, Bioinformatics, Volume 37, Issue 23, December 2021, Pages 4572–4574, doi: -- MERQURY, [United States Government Work](https://github.com/marbl/merqury?tab=License-1-ov-file#readme) +- Merqury, [United States Government Work](https://github.com/marbl/merqury?tab=License-1-ov-file#readme) > Rhie, A., Walenz, B.P., Koren, S. et al. 2020. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol 21, 245. doi: +- OrthoFinder, [GPL v3](https://github.com/davidemms/OrthoFinder/blob/master/License.md) + + > Emms, D.M., Kelly, S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol 20, 238 (2019). doi: 10.1186/s13059-019-1832-y + ## Software packaging/containerisation tools - [Anaconda](https://anaconda.com) diff --git a/README.md b/README.md index de13e09d..aa63b68d 100644 --- a/README.md +++ b/README.md @@ -1,8 +1,10 @@ +# plant-food-research-open/assemblyqc + [![GitHub Actions CI Status](https://github.com/plant-food-research-open/assemblyqc/actions/workflows/ci.yml/badge.svg)](https://github.com/plant-food-research-open/assemblyqc/actions/workflows/ci.yml) [![GitHub Actions Linting Status](https://github.com/plant-food-research-open/assemblyqc/actions/workflows/linting.yml/badge.svg)](https://github.com/plant-food-research-open/assemblyqc/actions/workflows/linting.yml)[![Cite Article](http://img.shields.io/badge/DOI-10.1093/bioinformatics/btae477-1073c8?labelColor=000000)](https://doi.org/10.1093/bioinformatics/btae477) [![nf-test](https://img.shields.io/badge/unit_tests-nf--test-337ab7.svg)](https://www.nf-test.com) -[![Nextflow](https://img.shields.io/badge/nextflow%20DSL2-%E2%89%A523.04.0-23aa62.svg)](https://www.nextflow.io/) +[![Nextflow](https://img.shields.io/badge/nextflow%20DSL2-%E2%89%A524.04.2-23aa62.svg)](https://www.nextflow.io/) [![run with conda ❌](http://img.shields.io/badge/run%20with-conda%20❌-3EB049?labelColor=000000&logo=anaconda)](https://docs.conda.io/en/latest/) [![run with docker](https://img.shields.io/badge/run%20with-docker-0db7ed?labelColor=000000&logo=docker)](https://www.docker.com/) [![run with singularity](https://img.shields.io/badge/run%20with-singularity-1d355c.svg?labelColor=000000)](https://sylabs.io/docs/) @@ -10,79 +12,43 @@ ## Introduction -**plant-food-research-open/assemblyqc** is a [Nextflow](https://www.nextflow.io/docs/latest/index.html) pipeline which evaluates assembly quality with multiple QC tools and presents the results in a unified html report. The tools are shown in the [Pipeline Flowchart](#pipeline-flowchart) and their references are listed in [CITATIONS.md](./CITATIONS.md). +**plant-food-research-open/assemblyqc** is a [Nextflow](https://www.nextflow.io/docs/latest/index.html) pipeline which evaluates assembly quality with multiple QC tools and presents the results in a unified html report. The tools are shown in the [Pipeline Flowchart](#pipeline-flowchart) and their references are listed in [CITATIONS.md](./CITATIONS.md). The pipeline includes skip flags to disable execution of various tools. 
## Pipeline Flowchart -```mermaid -%%{init: { - 'theme': 'base', - 'themeVariables': { - 'fontSize': '52px", - 'primaryColor': '#9A6421', - 'primaryTextColor': '#ffffff', - 'primaryBorderColor': '#9A6421', - 'lineColor': '#B180A8', - 'secondaryColor': '#455C58', - 'tertiaryColor': '#ffffff' - } -}}%% -flowchart LR - forEachTag(Assembly) ==> VALIDATE_FORMAT[VALIDATE FORMAT] - - VALIDATE_FORMAT ==> ncbiFCS[NCBI FCS ADAPTOR] - ncbiFCS ==> Check{Check} - - VALIDATE_FORMAT ==> ncbiGX[NCBI FCS GX] - ncbiGX ==> Check - Check ==> |Clean|Run(Run) - - Check ==> |Contamination|Skip(Skip All) - Skip ==> REPORT - - VALIDATE_FORMAT ==> GFF_STATS[GENOMETOOLS GT STAT] - - Run ==> ASS_STATS[ASSEMBLATHON STATS] - Run ==> BUSCO - Run ==> TIDK - Run ==> LAI - Run ==> KRAKEN2 - Run ==> HIC_CONTACT_MAP[HIC CONTACT MAP] - Run ==> MUMMER - Run ==> MINIMAP2 - Run ==> MERQURY - - MUMMER ==> CIRCOS - MUMMER ==> DOTPLOT - - MINIMAP2 ==> PLOTSR - - ASS_STATS ==> REPORT - GFF_STATS ==> REPORT - BUSCO ==> REPORT - TIDK ==> REPORT - LAI ==> REPORT - KRAKEN2 ==> REPORT - HIC_CONTACT_MAP ==> REPORT - CIRCOS ==> REPORT - DOTPLOT ==> REPORT - PLOTSR ==> REPORT - MERQURY ==> REPORT -``` - -- [FASTA VALIDATOR](https://github.com/linsalrob/fasta_validator) + [SEQKIT RMDUP](https://github.com/shenwei356/seqkit): FASTA validation -- [GENOMETOOLS GT GFF3VALIDATOR](https://genometools.org/tools/gt_gff3validator.html): GFF3 validation -- [ASSEMBLATHON STATS](https://github.com/PlantandFoodResearch/assemblathon2-analysis/blob/a93cba25d847434f7eadc04e63b58c567c46a56d/assemblathon_stats.pl): Assembly statistics -- [GENOMETOOLS GT STAT](https://genometools.org/tools/gt_stat.html): Annotation statistics -- [NCBI FCS ADAPTOR](https://github.com/ncbi/fcs): Adaptor contamination pass/fail -- [NCBI FCS GX](https://github.com/ncbi/fcs): Foreign organism contamination pass/fail -- [BUSCO](https://gitlab.com/ezlab/busco): Gene-space completeness estimation -- [TIDK](https://github.com/tolkit/telomeric-identifier): Telomere repeat identification -- [LAI](https://github.com/oushujun/LTR_retriever/blob/master/LAI): Continuity of repetitive sequences -- [KRAKEN2](https://github.com/DerrickWood/kraken2): Taxonomy classification -- [HIC CONTACT MAP](https://github.com/igvteam/juicebox.js): Alignment and visualisation of HiC data -- [MUMMER](https://github.com/mummer4/mummer) → [CIRCOS](http://circos.ca/documentation/) + [DOTPLOT](https://plotly.com) & [MINIMAP2](https://github.com/lh3/minimap2) → [PLOTSR](https://github.com/schneebergerlab/plotsr): Synteny analysis -- [MERQURY](https://github.com/marbl/merqury): K-mer completeness, consensus quality and phasing assessment +

+ +- `Assembly` + - [fasta_validator](https://github.com/linsalrob/fasta_validator) + [SeqKit rmdup](https://github.com/shenwei356/seqkit): FASTA validation + - [assemblathon_stats](https://github.com/PlantandFoodResearch/assemblathon2-analysis/blob/a93cba25d847434f7eadc04e63b58c567c46a56d/assemblathon_stats.pl), [gfastats](https://github.com/vgl-hub/gfastats): Assembly statistics + - [NCBI FCS-adaptor](https://github.com/ncbi/fcs): Adaptor contamination pass/fail + - [NCBI FCS-GX](https://github.com/ncbi/fcs): Foreign organism contamination pass/fail + - [tidk](https://github.com/tolkit/telomeric-identifier): Telomere repeat identification + - [BUSCO](https://gitlab.com/ezlab/busco): Gene-space completeness estimation + - [LAI](https://github.com/oushujun/LTR_retriever/blob/master/LAI): Continuity of repetitive sequences + - [Kraken 2](https://github.com/DerrickWood/kraken2), [Krona](https://github.com/marbl/Krona): Taxonomy classification + - `Alignment and visualisation of HiC data` + - [sra-tools](https://github.com/ncbi/sra-tools): HiC data download from SRA or use of local FASTQ files + - [fastp](https://github.com/OpenGene/fastp), [FastQC](https://github.com/s-andrews/FastQC): Read QC and trimming + - [SeqKit sort](https://github.com/shenwei356/seqkit): Alphanumeric sorting of FASTA by sequence ID + - [bwa-mem](https://github.com/lh3/bwa): HiC read alignment + - [samblaster](https://github.com/GregoryFaust/samblaster): Duplicate marking + - [hic_qc](https://github.com/phasegenomics/hic_qc): HiC read and alignment statistics + - [Matlock](https://github.com/phasegenomics/matlock): BAM to juicer conversion + - [3d-dna/visualize](https://github.com/aidenlab/3d-dna/tree/master/visualize): `.hic` file creation + - [juicebox.js](https://github.com/igvteam/juicebox.js): HiC contact map visualisation + - `K-mer completeness, consensus quality and phasing assessment` + - [sra-tools](https://github.com/ncbi/sra-tools): Assembly, maternal and paternal data download from SRA or use of local FASTQ files + - [Merqury hapmers](https://github.com/marbl/merqury/blob/master/trio/hapmers.sh): Hapmer generation if parental data is available + - [Merqury](https://github.com/marbl/merqury): Completeness, consensus quality and phasing assessment + - `Synteny analysis` + - [MUMmer](https://github.com/mummer4/mummer) → [Circos](http://circos.ca/documentation/) + [dotplot](https://plotly.com): One-to-all and all-to-all synteny analysis at the contig level + - [Minimap2](https://github.com/lh3/minimap2) → [Syri](https://github.com/schneebergerlab/syri)/[Plotsr](https://github.com/schneebergerlab/plotsr): One-to-one synteny analysis at the chromosome level +- `Annotation` + - [GenomeTools gt gff3validator](https://genometools.org/tools/gt_gff3validator.html) + [FASTA/GFF correspondence](subworkflows/gallvp/gff3_gt_gff3_gff3validator_stat/main.nf): GFF3 validation + - [GenomeTools gt stat](https://genometools.org/tools/gt_stat.html): Annotation statistics + - [GffRead](https://github.com/gpertea/gffread), [BUSCO](https://gitlab.com/ezlab/busco): Gene-space completeness estimation in annotation proteins + - [OrthoFinder](https://github.com/davidemms/OrthoFinder): Phylogenetic orthology inference for comparative genomics ## Usage @@ -100,14 +66,14 @@ Now, you can run the pipeline using: ```bash nextflow run plant-food-research-open/assemblyqc \ - -profile \ - --input assemblysheet.csv \ - --outdir + -revision \ + -profile \ + --input assemblysheet.csv \ + --outdir ``` > [!WARNING] -> Please provide pipeline 
parameters via the CLI or Nextflow `-params-file` option. Custom config files including those provided by the `-c` Nextflow option can be used to provide any configuration _**except for parameters**_; -> see [docs](https://nf-co.re/usage/configuration#custom-configuration-files). +> Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files including those provided by the `-c` Nextflow option can be used to provide any configuration _**except for parameters**_; see [docs](https://nf-co.re/docs/usage/getting_started/configuration#custom-configuration-files). ### Plant&Food Users @@ -138,31 +104,32 @@ The pipeline uses nf-core modules contributed by following authors: - - + + + + - + + - - + + + - - - @@ -182,7 +149,7 @@ If you use plant-food-research-open/assemblyqc for your analysis, please cite it An extensive list of references for the tools used by the pipeline can be found in the [`CITATIONS.md`](CITATIONS.md) file. -This pipeline uses code and infrastructure developed and maintained by the [nf-core](https://nf-co.re) community, reused here under the [MIT license](https://github.com/nf-core/tools/blob/master/LICENSE). +This pipeline uses code and infrastructure developed and maintained by the [nf-core](https://nf-co.re) community, reused here under the [MIT license](https://github.com/nf-core/tools/blob/main/LICENSE). > **The nf-core framework for community-curated bioinformatics pipelines.** > diff --git a/assets/schema_input.json b/assets/schema_input.json index afec8fe7..83f4915d 100644 --- a/assets/schema_input.json +++ b/assets/schema_input.json @@ -1,11 +1,12 @@ { - "$schema": "http://json-schema.org/draft-07/schema", + "$schema": "https://json-schema.org/draft/2020-12/schema", "$id": "https://raw.githubusercontent.com/plant-food-research-open/assemblyqc/main/assets/schema_input.json", "title": "plant-food-research-open/assemblyqc pipeline - params.input schema", "description": "Schema for the file provided with params.input", "type": "array", "items": { "type": "object", + "uniqueEntries": ["tag"], "properties": { "tag": { "type": "string", @@ -72,8 +73,7 @@ "type": "string", "maxLength": 0 } - ], - "dependentRequired": ["reads_1"] + ] }, "maternal_reads_1": { "errorMessage": "maternal_reads_1 should be a SRA ID for paired FASTQ files or FASTX file path without spaces and must have extension '.f(a|asta|as|sa|na|astq|q)' or '.f(a|asta|as|sa|na|astq|q).gz'", @@ -86,8 +86,7 @@ "type": "string", "maxLength": 0 } - ], - "dependentRequired": ["reads_1", "paternal_reads_1"] + ] }, "maternal_reads_2": { "errorMessage": "FASTX file path cannot contain spaces and must have extension '.f(a|asta|as|sa|na|astq|q)' or '.f(a|asta|as|sa|na|astq|q).gz'", @@ -100,8 +99,7 @@ "type": "string", "maxLength": 0 } - ], - "dependentRequired": ["maternal_reads_1"] + ] }, "paternal_reads_1": { "errorMessage": "paternal_reads_1 should be a SRA ID for paired FASTQ files or FASTX file path without spaces and must have extension '.f(a|asta|as|sa|na|astq|q)' or '.f(a|asta|as|sa|na|astq|q).gz'", @@ -114,8 +112,7 @@ "type": "string", "maxLength": 0 } - ], - "dependentRequired": ["reads_1", "maternal_reads_1"] + ] }, "paternal_reads_2": { "errorMessage": "FASTX file path cannot contain spaces and must have extension '.f(a|asta|as|sa|na|astq|q)' or '.f(a|asta|as|sa|na|astq|q).gz'", @@ -128,10 +125,16 @@ "type": "string", "maxLength": 0 } - ], - "dependentRequired": ["paternal_reads_1"] + ] } }, - "required": ["tag", "fasta"] + "required": ["tag", "fasta"], + 
"dependentRequired": { + "reads_2": ["reads_1"], + "maternal_reads_1": ["reads_1", "paternal_reads_1"], + "maternal_reads_2": ["maternal_reads_1"], + "paternal_reads_1": ["reads_1", "maternal_reads_1"], + "paternal_reads_2": ["paternal_reads_1"] + } } } diff --git a/assets/schema_xref_assemblies.json b/assets/schema_xref_assemblies.json index 0da6ca1c..9a82cfbf 100644 --- a/assets/schema_xref_assemblies.json +++ b/assets/schema_xref_assemblies.json @@ -1,11 +1,12 @@ { - "$schema": "http://json-schema.org/draft-07/schema", + "$schema": "https://json-schema.org/draft/2020-12/schema", "$id": "https://raw.githubusercontent.com/plant-food-research-open/assemblyqc/main/assets/schema_xref_assemblies.json", "title": "plant-food-research-open/assemblyqc pipeline - params.synteny_xref_assemblies schema", "description": "Schema for the file provided with params.synteny_xref_assemblies", "type": "array", "items": { "type": "object", + "uniqueEntries": ["tag"], "properties": { "tag": { "type": "string", diff --git a/bin/assemblyqc.py b/bin/assemblyqc.py index b5c73f56..c808e4de 100755 --- a/bin/assemblyqc.py +++ b/bin/assemblyqc.py @@ -16,6 +16,9 @@ from report_modules.parsers.assemblathon_stats_parser import ( parse_assemblathon_stats_folder, ) +from report_modules.parsers.gfastats_parser import ( + parse_gfastats_folder, +) from report_modules.parsers.genometools_gt_stat_parser import ( parse_genometools_gt_stat_folder, ) @@ -26,6 +29,7 @@ from report_modules.parsers.hic_parser import parse_hic_folder from report_modules.parsers.synteny_parser import parse_synteny_folder from report_modules.parsers.merqury_parser import parse_merqury_folder +from report_modules.parsers.orthofinder_parser import parse_orthofinder_folder if __name__ == "__main__": params_dict, params_table = parse_params_json("params_json.json") @@ -41,6 +45,7 @@ data_from_tools = {**data_from_tools, **parse_ncbi_fcs_adaptor_folder()} data_from_tools = {**data_from_tools, **parse_ncbi_fcs_gx_folder()} data_from_tools = {**data_from_tools, **parse_assemblathon_stats_folder()} + data_from_tools = {**data_from_tools, **parse_gfastats_folder()} data_from_tools = {**data_from_tools, **parse_genometools_gt_stat_folder()} data_from_tools = {**data_from_tools, **parse_busco_folder()} data_from_tools = { @@ -53,6 +58,7 @@ data_from_tools = {**data_from_tools, **parse_hic_folder()} data_from_tools = {**data_from_tools, **parse_synteny_folder()} data_from_tools = {**data_from_tools, **parse_merqury_folder()} + data_from_tools = {**data_from_tools, **parse_orthofinder_folder()} with open("software_versions.yml", "r") as f: versions_from_ch_versions = yaml.safe_load(f) diff --git a/bin/report_modules/parsers/genometools_gt_stat_parser.py b/bin/report_modules/parsers/genometools_gt_stat_parser.py index ddc3bb95..9881f576 100644 --- a/bin/report_modules/parsers/genometools_gt_stat_parser.py +++ b/bin/report_modules/parsers/genometools_gt_stat_parser.py @@ -22,7 +22,6 @@ def parse_genometools_gt_stat_folder(folder_name="genometools_gt_stat"): data = {"GENOMETOOLS_GT_STAT": []} for report_path in list_of_report_files: - NUM_GROUPS = -1 ( report_table_dict, @@ -35,7 +34,7 @@ def parse_genometools_gt_stat_folder(folder_name="genometools_gt_stat"): ) = extract_report_data(report_path, NUM_GROUPS) gene_length_distribution_graph = "" - if gene_length_distribution != []: + if len(gene_length_distribution) > 1: gene_length_distribution_graph = create_dist_graph( gene_length_distribution, "Length", @@ -44,7 +43,7 @@ def 
parse_genometools_gt_stat_folder(folder_name="genometools_gt_stat"): ) gene_score_distribution_graph = "" - if gene_score_distribution != []: + if len(gene_score_distribution) > 1: gene_score_distribution_graph = create_dist_graph( gene_score_distribution, "Score", @@ -53,7 +52,7 @@ def parse_genometools_gt_stat_folder(folder_name="genometools_gt_stat"): ) exon_length_distribution_graph = "" - if exon_length_distribution != []: + if len(exon_length_distribution) > 1: exon_length_distribution_graph = create_dist_graph( exon_length_distribution, "Length", @@ -62,7 +61,7 @@ def parse_genometools_gt_stat_folder(folder_name="genometools_gt_stat"): ) exon_number_distribution_graph = "" - if exon_number_distribution != []: + if len(exon_number_distribution) > 1: exon_number_distribution_graph = create_dist_graph( exon_number_distribution, "Number", @@ -71,7 +70,7 @@ def parse_genometools_gt_stat_folder(folder_name="genometools_gt_stat"): ) intron_length_distribution_graph = "" - if intron_length_distribution != []: + if len(intron_length_distribution) > 1: intron_length_distribution_graph = create_dist_graph( intron_length_distribution, "Length", @@ -80,7 +79,7 @@ def parse_genometools_gt_stat_folder(folder_name="genometools_gt_stat"): ) cds_length_distribution_graph = "" - if cds_length_distribution != []: + if len(cds_length_distribution) > 1: cds_length_distribution_graph = create_dist_graph( cds_length_distribution, "Length", @@ -196,7 +195,6 @@ def extract_report_data(report_path, num_groups): def create_frequency_groups(data, num_groups): - if num_groups == -1: sorted_data = sorted(data, key=lambda x: x[0]) return [ @@ -282,7 +280,6 @@ def test_create_frequency_groups_repeat(): def create_dist_graph(groups_dict, x_label, title, file_name): - x_list = [i["stop"] for i in groups_dict] y_list = [i["freq"] for i in groups_dict] sum_y = float(sum(y_list)) @@ -299,43 +296,48 @@ def create_dist_graph(groups_dict, x_label, title, file_name): plt.gca().spines["top"].set_visible(False) plt.gca().spines["right"].set_visible(False) - min_x, min_y = (min(x_list), min(y_list)) - x_anno_step = int(float(max(x_list)) * 0.1) - ax.annotate( - f"(<={min_x}, {round(min_y, 2)}%)", - xy=(min_x, min_y), - xytext=(min_x + x_anno_step, min_y + 10), - arrowprops=dict(color="red", arrowstyle="->, head_width=.15"), - ) + if len(y_list) >= 10: + max_x = max(x_list) + min_x, min_y = (min(x_list), min(y_list)) + x_anno_step = int(float(max(x_list)) * 0.1) + ax.annotate( + f"(<={min_x}, {round(min_y, 2)}%)", + xy=(min_x, min_y), + xytext=(min_x + x_anno_step, min_y + 10), + arrowprops=dict(color="red", arrowstyle="->, head_width=.15"), + ) - near_50 = min([y for y in y_list if y >= 50.0]) - min_x, min_y = (x_list[y_list.index(near_50)], near_50) - ax.annotate( - f"(<={min_x}, {round(min_y, 2)}%)", - xy=(min_x, min_y), - xytext=(min_x + x_anno_step, min_y), - arrowprops=dict(color="red", arrowstyle="->, head_width=.15"), - ) + near_50 = min([y for y in y_list if y >= 50.0]) + min_x, min_y = (x_list[y_list.index(near_50)], near_50) + ax.annotate( + f"(<={min_x}, {round(min_y, 2)}%)", + xy=(min_x, min_y), + xytext=(min_x + x_anno_step, min_y), + arrowprops=dict(color="red", arrowstyle="->, head_width=.15"), + ) - near_90 = min([y for y in y_list if y >= 90.0]) - min_x, min_y = (x_list[y_list.index(near_90)], near_90) - ax.annotate( - f"(<={min_x}, {round(min_y, 2)}%)", - xy=(min_x, min_y), - xytext=(min_x + x_anno_step, min_y - 10), - arrowprops=dict(color="red", arrowstyle="->, head_width=.15"), - ) + near_90 = min([y for y 
in y_list if y >= 90.0]) + min_x, min_y = (x_list[y_list.index(near_90)], near_90) + ax.annotate( + f"(<={min_x}, {round(min_y, 2)}%)", + xy=(min_x, min_y), + xytext=(min_x + x_anno_step, min_y - 10), + arrowprops=dict(color="red", arrowstyle="->, head_width=.15"), + ) - near_3_sigma = min([y for y in y_list if y >= 99.7]) - min_x, min_y = (x_list[y_list.index(near_3_sigma)], near_3_sigma) - ax.annotate( - f"(<={min_x}, {round(min_y, 2)}%)", - xy=(min_x, min_y), - xytext=(min_x + x_anno_step, min_y - 10), - arrowprops=dict(color="red", arrowstyle="->, head_width=.15"), - ) + near_3_sigma = min([y for y in y_list if y >= 99.7]) + min_x, min_y = (x_list[y_list.index(near_3_sigma)], near_3_sigma) + x_anno_step_updated = ( + x_anno_step if ((min_x + 2 * x_anno_step) < max_x) else (-2 * x_anno_step) + ) + ax.annotate( + f"(<={min_x}, {round(min_y, 2)}%)", + xy=(min_x, min_y), + xytext=(min_x + x_anno_step_updated, min_y - 10), + arrowprops=dict(color="red", arrowstyle="->, head_width=.15"), + ) - plt.savefig(file_name, dpi=600) + plt.savefig(file_name, dpi=300) with open(file_name, "rb") as f: binary_fc = f.read() diff --git a/bin/report_modules/parsers/gfastats_parser.py b/bin/report_modules/parsers/gfastats_parser.py new file mode 100644 index 00000000..6cfe2104 --- /dev/null +++ b/bin/report_modules/parsers/gfastats_parser.py @@ -0,0 +1,46 @@ +import os +from pathlib import Path +import pandas as pd +from tabulate import tabulate +import re + +from report_modules.parsers.parsing_commons import sort_list_of_results + + +def parse_gfastats_folder(folder_name="gfastats"): + dir = os.getcwdb().decode() + reports_folder_path = Path(f"{dir}/{folder_name}") + + if not os.path.exists(reports_folder_path): + return {} + + list_of_report_files = reports_folder_path.glob("*.assembly_summary") + + data = {"GFASTATS": []} + + for report_path in list_of_report_files: + report_table = pd.read_csv(report_path, sep="\t") + report_table.columns = ['Stat', 'Value'] + + file_tokens = re.findall( + r"([\w]+).assembly_summary", + os.path.basename(str(report_path)), + )[0] + + data["GFASTATS"].append( + { + "hap": file_tokens, + "report_table": report_table.to_dict("records"), + "report_table_html": tabulate( + report_table, + headers=["Stat", "Value"], + tablefmt="html", + numalign="left", + showindex=False, + ), + } + ) + + return { + "GFASTATS": sort_list_of_results(data["GFASTATS"], "hap") + } diff --git a/bin/report_modules/parsers/hic_parser.py b/bin/report_modules/parsers/hic_parser.py index 825852ec..45d52fbf 100644 --- a/bin/report_modules/parsers/hic_parser.py +++ b/bin/report_modules/parsers/hic_parser.py @@ -1,10 +1,56 @@ import os from pathlib import Path +import pandas as pd +from tabulate import tabulate import re from report_modules.parsers.parsing_commons import sort_list_of_results +def colorize_fastp_log(log: Path): + section_colors = { + "adapter": "color: blue;", + "before_filtering": "color: goldenrod;", + "after_filtering": "color: green;", + "filtering_result": "color: green;", + "duplication": "color: red;", + "fastp": "color: gray;", + "version": "color: blue;", + } + + patterns = { + "adapter": re.compile(r"Detecting adapter sequence for read\d..."), + "before_filtering": re.compile(r"Read\d before filtering:"), + "after_filtering": re.compile(r"Read\d after filtering:"), + "filtering_result": re.compile(r"Filtering result:"), + "duplication": re.compile(r"Duplication rate:"), + "fastp": re.compile(r"fastp --in"), + "version": re.compile(r"fastp v"), + } + + html_log = "
<pre>\n"
+
+    for line in log.read_text().split("\n"):
+        colored_line = line.strip()
+        # Apply HTML color style based on section patterns
+        for section, pattern in patterns.items():
+            if pattern.search(line):
+                colored_line = (
+                    f"<span style='{section_colors[section]}'>{line.strip()}</span>"
+                )
+                break
+        else:
+            # Default styling for uncolored lines
+            colored_line = f"<span>{line.strip()}</span>"
+
+        html_log += f"{colored_line}\n"
+
+    # Close HTML tags
+    html_log += "</pre>
" + + return html_log + + def parse_hic_folder(folder_name="hic_outputs"): dir = os.getcwdb().decode() hic_folder_path = Path(f"{dir}/{folder_name}") @@ -13,21 +59,58 @@ def parse_hic_folder(folder_name="hic_outputs"): return {} list_of_hic_files = hic_folder_path.glob("*.html") + list_of_hic_files = [ + x for x in list_of_hic_files if re.match(r"^\w+\.html$", x.name) + ] data = {"HIC": []} for hic_path in list_of_hic_files: hic_file_name = os.path.basename(str(hic_path)) - file_tokens = re.findall( + tag = re.findall( r"([\w]+).html", hic_file_name, )[0] + # Get the labels table + labels_table = pd.read_csv(f"{folder_name}/{tag}.agp.assembly", sep=" ") + labels_table = labels_table[labels_table.iloc[:, 0].str.startswith(">")].iloc[ + :, [0, 2] + ] + labels_table.columns = ["Sequence", "Length"] + labels_table.Length = labels_table.Length.astype(int) + + # Get the HiC QC report + hicqc_report = [ + x + for x in hic_folder_path.glob("*.pdf") + if re.match(rf"[\S]+\.on\.{tag}_qc_report\.pdf", x.name) + ][0] + + # Get FASTP log if it is there + fastp_log = [x for x in hic_folder_path.glob("*.log")] + + if fastp_log != []: + fastp_log = fastp_log[0] + fastp_log = colorize_fastp_log(fastp_log) + else: + fastp_log = None + data["HIC"].append( { - "hap": file_tokens, + "hap": tag, "hic_html_file_name": hic_file_name, + "labels_table": labels_table.to_dict("records"), + "labels_table_html": tabulate( + labels_table, + headers=["Sequence", "Length"], + tablefmt="html", + numalign="left", + showindex=False, + ), + "hicqc_report_pdf": os.path.basename(str(hicqc_report)), + "fastp_log": fastp_log, } ) diff --git a/bin/report_modules/parsers/orthofinder_parser.py b/bin/report_modules/parsers/orthofinder_parser.py new file mode 100644 index 00000000..d31fdfe8 --- /dev/null +++ b/bin/report_modules/parsers/orthofinder_parser.py @@ -0,0 +1,100 @@ +import pandas as pd +import base64 +import os +import re + +import matplotlib.pyplot as plt +from tabulate import tabulate +from pathlib import Path +from io import StringIO +from Bio import Phylo + + +def parse_orthofinder_folder(folder_name="orthofinder_outputs/assemblyqc"): + dir = os.getcwdb().decode() + results_root_path = Path(f"{dir}/{folder_name}") + + if not results_root_path.exists(): + return {} + + data = {"ORTHOFINDER": {}} + + # Species tree + tree = Phylo.read( + f"{results_root_path}/Species_Tree/SpeciesTree_rooted.txt", "newick" + ) + + fig = plt.figure(figsize=(6, 6)) + ax = fig.add_subplot(1, 1, 1) + Phylo.draw(tree, do_show=False, axes=ax) + + plt.gca().spines["top"].set_visible(False) + plt.gca().spines["right"].set_visible(False) + + plt.savefig("speciestree_rooted.png", format="png", dpi=300) + + with open("speciestree_rooted.png", "rb") as f: + binary_fc = f.read() + + base64_utf8_str = base64.b64encode(binary_fc).decode("utf-8") + data["ORTHOFINDER"]["speciestree_rooted"] = ( + f"data:image/png+xml;base64,{base64_utf8_str}" + ) + + # Overall statistics + overall_statistics = Path( + f"{results_root_path}/Comparative_Genomics_Statistics/Statistics_Overall.tsv" + ).read_text() + + ## General stats + general_stats = re.findall( + r"(Number of species.*)Orthogroups file", overall_statistics, flags=re.DOTALL + )[0] + general_stats_pd = pd.read_csv(StringIO(general_stats), sep="\t") + + data["ORTHOFINDER"]["general_stats"] = general_stats_pd.to_dict("records") + data["ORTHOFINDER"]["general_stats_html"] = tabulate( + general_stats_pd, + headers=["Stat", "Value"], + tablefmt="html", + numalign="left", + showindex=False, + ) + + ## Genes 
per-species + genes_per_species = re.findall( + r"(Average number of genes per-species in orthogroup.*)Number of species in orthogroup", + overall_statistics, + flags=re.DOTALL, + )[0] + genes_per_species_pd = pd.read_csv(StringIO(genes_per_species), sep="\t", header=0) + data["ORTHOFINDER"]["genes_per_species"] = genes_per_species_pd.to_dict("records") + data["ORTHOFINDER"]["genes_per_species_html"] = tabulate( + genes_per_species_pd, + headers=genes_per_species_pd.columns.to_list(), + tablefmt="html", + numalign="left", + showindex=False, + ) + + ## Number of species in orthogroup + num_species_orthogroup = re.findall( + r"(Number of species in orthogroup.*)", + overall_statistics, + flags=re.DOTALL, + )[0] + num_species_orthogroup_pd = pd.read_csv( + StringIO(num_species_orthogroup), sep="\t", header=0 + ) + data["ORTHOFINDER"]["num_species_orthogroup"] = num_species_orthogroup_pd.to_dict( + "records" + ) + data["ORTHOFINDER"]["num_species_orthogroup_html"] = tabulate( + num_species_orthogroup_pd, + headers=num_species_orthogroup_pd.columns.to_list(), + tablefmt="html", + numalign="left", + showindex=False, + ) + + return data diff --git a/bin/report_modules/templates/base.html b/bin/report_modules/templates/base.html index 11fe7d92..4da53369 100644 --- a/bin/report_modules/templates/base.html +++ b/bin/report_modules/templates/base.html @@ -32,6 +32,10 @@ {% endif %} + {% if 'GFASTATS' in all_stats_dicts %} + + {% endif %} + {% if 'GENOMETOOLS_GT_STAT' in all_stats_dicts %} {% endif %} @@ -75,6 +79,11 @@ {% if 'MERQURY' in all_stats_dicts %} {% endif %} + + {% if 'ORTHOFINDER' in all_stats_dicts %} + + {% endif %} + {% include 'params/params.html' %} @@ -100,6 +109,10 @@ {% include 'assemblathon_stats/assemblathon_stats.html' %} {% endif %} + {% if 'GFASTATS' in all_stats_dicts %} + {% include 'gfastats/gfastats.html' %} + {% endif %} + {% if 'GENOMETOOLS_GT_STAT' in all_stats_dicts %} {% include 'genometools_gt_stat/genometools_gt_stat.html' %} {% endif %} @@ -143,6 +156,11 @@ {% if 'MERQURY' in all_stats_dicts %} {% include 'merqury/merqury.html' %} {% endif %} + + {% if 'ORTHOFINDER' in all_stats_dicts %} + {% include 'orthofinder/orthofinder.html' %} + {% endif %} + {% include 'js.html' %} diff --git a/bin/report_modules/templates/gfastats/dropdown.html b/bin/report_modules/templates/gfastats/dropdown.html new file mode 100644 index 00000000..e2e76f80 --- /dev/null +++ b/bin/report_modules/templates/gfastats/dropdown.html @@ -0,0 +1,10 @@ + diff --git a/bin/report_modules/templates/gfastats/gfastats.html b/bin/report_modules/templates/gfastats/gfastats.html new file mode 100644 index 00000000..eb94d1c4 --- /dev/null +++ b/bin/report_modules/templates/gfastats/gfastats.html @@ -0,0 +1,16 @@ + diff --git a/bin/report_modules/templates/gfastats/report_contents.html b/bin/report_modules/templates/gfastats/report_contents.html new file mode 100644 index 00000000..e1ca2e9a --- /dev/null +++ b/bin/report_modules/templates/gfastats/report_contents.html @@ -0,0 +1,17 @@ +{% set vars = {'is_first': True} %} {% for item in range(all_stats_dicts["GFASTATS"]|length) %} {% set +active_text = 'display: block' if vars.is_first else 'display: none' %} +
+
+
+
{{ all_stats_dicts['GFASTATS'][item]['hap'] }}
+
+
+
+
{{ all_stats_dicts['GFASTATS'][item]['report_table_html'] }}
+
+
+{% if vars.update({'is_first': False}) %} {% endif %} {% endfor %} diff --git a/bin/report_modules/templates/header.html b/bin/report_modules/templates/header.html index 989b37f0..795ecd3d 100644 --- a/bin/report_modules/templates/header.html +++ b/bin/report_modules/templates/header.html @@ -213,6 +213,18 @@ .iframe-wrapper { text-align: center; + width: 90%; + margin-left: auto; + margin-right: auto; + margin-bottom: 32px; + } + + .iframe-wrapper-hic { + width: 700px; + height: 850px; + margin-left: auto; + margin-right: auto; + margin-bottom: 32px; } .tab { diff --git a/bin/report_modules/templates/hic/hic.html b/bin/report_modules/templates/hic/hic.html index 868dc089..2af561e7 100644 --- a/bin/report_modules/templates/hic/hic.html +++ b/bin/report_modules/templates/hic/hic.html @@ -1,18 +1,45 @@ diff --git a/bin/report_modules/templates/hic/report_contents.html b/bin/report_modules/templates/hic/report_contents.html index 312a0fdd..c7f19be8 100644 --- a/bin/report_modules/templates/hic/report_contents.html +++ b/bin/report_modules/templates/hic/report_contents.html @@ -1,17 +1,33 @@ {% set vars = {'is_first': True} %} {% for item in range(all_stats_dicts["HIC"]|length) %} {% set active_text = 'display: block' if vars.is_first else 'display: none' %}
-
-
-
{{ all_stats_dicts['HIC'][item]['hap'] }}
-
-
- -
+
+
+
{{ all_stats_dicts['HIC'][item]['hap'] }}
+
+ +
+
+

Sequence labels and lengths

+
+
+
{{ all_stats_dicts['HIC'][item]['labels_table_html'] }}
+
+
+

HiC QC report

+
+
+ +
+ {% if all_stats_dicts['HIC'][item]['fastp_log'] is not none %} +
+

fastp log

+
+
+ {{ all_stats_dicts['HIC'][item]['fastp_log'] }} +
+ {% endif %} +
{% if vars.update({'is_first': False}) %} {% endif %} {% endfor %} diff --git a/bin/report_modules/templates/orthofinder/orthofinder.html b/bin/report_modules/templates/orthofinder/orthofinder.html new file mode 100644 index 00000000..35928c4d --- /dev/null +++ b/bin/report_modules/templates/orthofinder/orthofinder.html @@ -0,0 +1,14 @@ + diff --git a/bin/report_modules/templates/orthofinder/report_contents.html b/bin/report_modules/templates/orthofinder/report_contents.html new file mode 100644 index 00000000..8afa502a --- /dev/null +++ b/bin/report_modules/templates/orthofinder/report_contents.html @@ -0,0 +1,33 @@ +
+
+ +
+

Species tree (rooted)

+
+
+ +
+ +
+

General statistics

+
+
+
{{ all_stats_dicts['ORTHOFINDER']['general_stats_html'] }}
+
+ +
+

Genes per-species in orthogroup

+
+
+
{{ all_stats_dicts['ORTHOFINDER']['genes_per_species_html'] }}
+
+ +
+

Number of species in orthogroup

+
+
+
{{ all_stats_dicts['ORTHOFINDER']['num_species_orthogroup_html'] }}
+
+ +
+
diff --git a/conf/base.config b/conf/base.config index 87a5d686..e39fbece 100644 --- a/conf/base.config +++ b/conf/base.config @@ -10,9 +10,9 @@ process { - cpus = { check_max( 1 * task.attempt, 'cpus' ) } - memory = { check_max( 6.GB * task.attempt, 'memory' ) } - time = { check_max( 4.h * task.attempt, 'time' ) } + cpus = { 1 * task.attempt } + memory = { 6.GB * task.attempt } + time = { 4.h * task.attempt } errorStrategy = { task.exitStatus in ((130..145) + 104) ? 'retry' : 'finish' } maxRetries = 1 @@ -24,30 +24,30 @@ process { // If possible, it would be nice to keep the same label naming convention when // adding in your local modules too. withLabel:process_single { - cpus = { check_max( 1 , 'cpus' ) } - memory = { check_max( 6.GB * task.attempt, 'memory' ) } - time = { check_max( 4.h * task.attempt, 'time' ) } + cpus = { 1 } + memory = { 6.GB * task.attempt } + time = { 4.h * task.attempt } } withLabel:process_low { - cpus = { check_max( 2 * task.attempt, 'cpus' ) } - memory = { check_max( 12.GB * task.attempt, 'memory' ) } - time = { check_max( 4.h * task.attempt, 'time' ) } + cpus = { 2 * task.attempt } + memory = { 12.GB * task.attempt } + time = { 4.h * task.attempt } } withLabel:process_medium { - cpus = { check_max( 6 * task.attempt, 'cpus' ) } - memory = { check_max( 36.GB * task.attempt, 'memory' ) } - time = { check_max( 8.h * task.attempt, 'time' ) } + cpus = { 6 * task.attempt } + memory = { 36.GB * task.attempt } + time = { 8.h * task.attempt } } withLabel:process_high { - cpus = { check_max( 12 * task.attempt, 'cpus' ) } - memory = { check_max( 72.GB * task.attempt, 'memory' ) } - time = { check_max( 16.h * task.attempt, 'time' ) } + cpus = { 12 * task.attempt } + memory = { 72.GB * task.attempt } + time = { 16.h * task.attempt } } withLabel:process_long { - time = { check_max( 20.h * task.attempt, 'time' ) } + time = { 20.h * task.attempt } } withLabel:process_high_memory { - memory = { check_max( 200.GB * task.attempt, 'memory' ) } + memory = { 200.GB * task.attempt } } withLabel:error_ignore { errorStrategy = 'ignore' @@ -57,23 +57,24 @@ process { maxRetries = 2 } withName:NCBI_FCS_GX_SCREEN_SAMPLES { - time = { check_max( 20.h * task.attempt, 'time' ) } - memory = { check_max( 512.GB * task.attempt, 'memory' ) } + time = { 20.h * task.attempt } + memory = { 512.GB * task.attempt } } withName:KRAKEN2 { - memory = { check_max( 200.GB * task.attempt, 'memory' ) } + memory = { 256.GB * task.attempt } } withName:BWA_MEM { - time = { check_max( 2.day * task.attempt, 'time' ) } + time = { 2.day * task.attempt } } withName:SAMBLASTER { - time = { check_max( 20.h * task.attempt, 'time' ) } + time = { 20.h * task.attempt } } withName:DNADIFF { - time = { check_max( 7.day * task.attempt, 'time' ) } + time = { 7.day * task.attempt } + memory = { 12.GB * task.attempt } } withName:MERQURY_HAPMERS { - time = { check_max( 20.h * task.attempt, 'time' ) } + time = { 20.h * task.attempt } } withName:CREATEREPORT { cache = false diff --git a/conf/modules.config b/conf/modules.config index 5f857db6..090ebd3a 100644 --- a/conf/modules.config +++ b/conf/modules.config @@ -12,7 +12,7 @@ process { - withName: SEQKIT_RMDUP { + withName: '.*:ASSEMBLYQC:SEQKIT_RMDUP' { ext.args = '--by-seq --ignore-case' ext.prefix = { "${meta.id}.seqkit.rmdup" } } @@ -42,7 +42,7 @@ process { ] } - withName: ASSEMBLATHON_STATS { + withName: '.*:ASSEMBLYQC:ASSEMBLATHON_STATS' { publishDir = [ path: { "${params.outdir}/assemblathon_stats" }, mode: params.publish_dir_mode, @@ -50,7 +50,17 @@ process { ] } - 
withName: FCS_FCSADAPTOR { + withName: '.*:ASSEMBLYQC:GFASTATS' { + ext.args = '--stats -t --nstar-report' + publishDir = [ + path: { "${params.outdir}/gfastats" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals("versions.yml") ? null : filename }, + pattern: '*.assembly_summary' + ] + } + + withName: '.*:ASSEMBLYQC:FCS_FCSADAPTOR' { ext.args = params.ncbi_fcs_adaptor_empire ? "--${params.ncbi_fcs_adaptor_empire}" : '--prok' publishDir = [ @@ -60,7 +70,7 @@ process { ] } - withName: NCBI_FCS_GX_SCREEN_SAMPLES { + withName: '.*:NCBI_FCS_GX:NCBI_FCS_GX_SCREEN_SAMPLES' { publishDir = [ path: { "${params.outdir}/ncbi_fcs_gx" }, mode: params.publish_dir_mode, @@ -68,7 +78,7 @@ process { ] } - withName: NCBI_FCS_GX_KRONA_PLOT { + withName: '.*:NCBI_FCS_GX:NCBI_FCS_GX_KRONA_PLOT' { publishDir = [ path: { "${params.outdir}/ncbi_fcs_gx" }, mode: params.publish_dir_mode, @@ -118,7 +128,7 @@ process { ] } - withName: KRAKEN2 { + withName: '.*:FASTA_KRAKEN2:KRAKEN2' { publishDir = [ path: { "${params.outdir}/kraken2" }, mode: params.publish_dir_mode, @@ -126,7 +136,7 @@ process { ] } - withName: KRAKEN2_KRONA_PLOT { + withName: '.*:FASTA_KRAKEN2:KRAKEN2_KRONA_PLOT' { publishDir = [ path: { "${params.outdir}/kraken2" }, mode: params.publish_dir_mode, @@ -134,7 +144,7 @@ process { ] } - withName: CIRCOS { + withName: '.*:FASTA_SYNTENY:CIRCOS' { publishDir = [ path: { "${params.outdir}/synteny/${target_on_ref_seq}" }, mode: params.publish_dir_mode, @@ -142,7 +152,7 @@ process { ] } - withName: LINEARSYNTENY { + withName: '.*:FASTA_SYNTENY:LINEARSYNTENY' { publishDir = [ path: { "${params.outdir}/synteny/${target_on_ref_seq}" }, mode: params.publish_dir_mode, @@ -154,7 +164,7 @@ process { ext.prefix = { "${meta.id}.plotsr" } } - withName: MINIMAP2_ALIGN { + withName: '.*:FASTA_SYNTENY:MINIMAP2_ALIGN' { ext.args = '-x asm5 --eqx' } @@ -177,17 +187,17 @@ process { ] } - withName: FILTER_BY_LENGTH { + withName: '.*:FASTA_EXPLORE_SEARCH_PLOT_TIDK:FILTER_BY_LENGTH' { ext.args = params.tidk_filter_by_size ? 
"-m ${params.tidk_filter_size_bp}" : '' ext.prefix = { "${meta.id}.filtered" } } - withName: SORT_BY_LENGTH { + withName: '.*:FASTA_EXPLORE_SEARCH_PLOT_TIDK:SORT_BY_LENGTH' { ext.args = '--quiet --reverse --by-length' ext.prefix = { "${meta.id}.sorted" } } - withName: TIDK_EXPLORE { + withName: '.*:FASTA_EXPLORE_SEARCH_PLOT_TIDK:TIDK_EXPLORE' { ext.args = '--minimum 5 --maximum 30' publishDir = [ path: { "${params.outdir}/tidk" }, @@ -196,7 +206,7 @@ process { ] } - withName: TIDK_SEARCH_APRIORI { + withName: '.*:FASTA_EXPLORE_SEARCH_PLOT_TIDK:TIDK_SEARCH_APRIORI' { ext.prefix = { "${meta.id}.apriori" } ext.args = '--extension tsv' publishDir = [ @@ -206,7 +216,7 @@ process { ] } - withName: TIDK_SEARCH_APOSTERIORI { + withName: '.*:FASTA_EXPLORE_SEARCH_PLOT_TIDK:TIDK_SEARCH_APOSTERIORI' { ext.prefix = { "${meta.id}.aposteriori" } ext.args = '--extension tsv' publishDir = [ @@ -216,7 +226,7 @@ process { ] } - withName: TIDK_PLOT_APRIORI { + withName: '.*:FASTA_EXPLORE_SEARCH_PLOT_TIDK:TIDK_PLOT_APRIORI' { ext.prefix = { "${meta.id}.apriori" } publishDir = [ path: { "${params.outdir}/tidk" }, @@ -225,7 +235,7 @@ process { ] } - withName: TIDK_PLOT_APOSTERIORI { + withName: '.*:FASTA_EXPLORE_SEARCH_PLOT_TIDK:TIDK_PLOT_APOSTERIORI' { ext.prefix = { "${meta.id}.aposteriori" } publishDir = [ path: { "${params.outdir}/tidk" }, @@ -238,7 +248,7 @@ process { ext.args = '-u' } - withName: CUSTOM_SHORTENFASTAIDS { + withName: '.*:FASTA_LTRRETRIEVER_LAI:CUSTOM_SHORTENFASTAIDS' { publishDir = [ path: { "${params.outdir}/lai" }, mode: params.publish_dir_mode, @@ -246,11 +256,11 @@ process { ] } - withName: LTRHARVEST { + withName: '.*:FASTA_LTRRETRIEVER_LAI:LTRHARVEST' { ext.prefix = { "${meta.id}_ltrharvest" } } - withName: LTRFINDER { + withName: '.*:FASTA_LTRRETRIEVER_LAI:LTRFINDER' { ext.args = '-harvest_out -size 1000000 -time 300' } @@ -258,7 +268,7 @@ process { ext.prefix = { "${meta.id}_ltrharvest_ltrfinder.tabout" } } - withName: LTRRETRIEVER_LTRRETRIEVER { + withName: '.*:FASTA_LTRRETRIEVER_LAI:LTRRETRIEVER_LTRRETRIEVER' { publishDir = [ path: { "${params.outdir}/lai" }, mode: params.publish_dir_mode, @@ -266,7 +276,7 @@ process { ] } - withName: CUSTOM_RESTOREGFFIDS { + withName: '.*:FASTA_LTRRETRIEVER_LAI:CUSTOM_RESTOREGFFIDS' { publishDir = [ path: { "${params.outdir}/lai" }, mode: params.publish_dir_mode, @@ -274,7 +284,7 @@ process { ] } - withName: LTRRETRIEVER_LAI { + withName: '.*:FASTA_LTRRETRIEVER_LAI:LTRRETRIEVER_LAI' { publishDir = [ path: { "${params.outdir}/lai" }, mode: params.publish_dir_mode, @@ -282,7 +292,7 @@ process { ] } - withName: '.*:FASTQ_FASTQC_UMITOOLS_FASTP:FASTQC_RAW' { + withName: '.*:FQ2HIC:FASTQ_FASTQC_UMITOOLS_FASTP:FASTQC_RAW' { publishDir = [ path: { "${params.outdir}/hic/fastqc_raw" }, mode: params.publish_dir_mode, @@ -290,7 +300,7 @@ process { ] } - withName: '.*:FASTQ_FASTQC_UMITOOLS_FASTP:FASTQC_TRIM' { + withName: '.*:FQ2HIC:FASTQ_FASTQC_UMITOOLS_FASTP:FASTQC_TRIM' { publishDir = [ path: { "${params.outdir}/hic/fastqc_trim" }, mode: params.publish_dir_mode, @@ -298,7 +308,7 @@ process { ] } - withName: '.*:FASTQ_FASTQC_UMITOOLS_FASTP:FASTP' { + withName: '.*:FQ2HIC:FASTQ_FASTQC_UMITOOLS_FASTP:FASTP' { ext.args = params.hic_fastp_ext_args publishDir = [ path: { "${params.outdir}/hic/fastp" }, @@ -311,17 +321,20 @@ process { ext.args = '--ignore-case --natural-order' } - withName: BWA_MEM { + withName: '.*:FQ2HIC:FASTQ_BWA_MEM_SAMBLASTER:BWA_MEM' { ext.prefix = { "${meta.id}.on.${meta.ref_id}.bwa.mem" } ext.args = '-5SP' } - withName: SAMBLASTER { + 
withName: '.*:FQ2HIC:FASTQ_BWA_MEM_SAMBLASTER:SAMBLASTER' { ext.prefix = { "${meta.id}.on.${meta.ref_id}.samblaster" } - ext.args3 = '-h -F 2316' + ext.args3 = [ + '-h', + params.hic_samtools_ext_args ? params.hic_samtools_ext_args.split("\\s(?=-+)") : '' + ].flatten().unique(false).join(' ').trim() } - withName: AGP2ASSEMBLY { + withName: '.*:FQ2HIC:AGP2ASSEMBLY' { publishDir = [ path: { "${params.outdir}/hic/assembly" }, mode: params.publish_dir_mode, @@ -329,7 +342,7 @@ process { ] } - withName: ASSEMBLY2BEDPE { + withName: '.*:FQ2HIC:ASSEMBLY2BEDPE' { publishDir = [ path: { "${params.outdir}/hic/bedpe" }, mode: params.publish_dir_mode, @@ -337,7 +350,7 @@ process { ] } - withName: HIC2HTML { + withName: '.*:FQ2HIC:HIC2HTML' { publishDir = [ path: { "${params.outdir}/hic" }, mode: params.publish_dir_mode, @@ -345,7 +358,7 @@ process { ] } - withName: HICQC { + withName: '.*:FQ2HIC:HICQC' { publishDir = [ path: { "${params.outdir}/hic/hicqc" }, mode: params.publish_dir_mode, @@ -353,7 +366,7 @@ process { ] } - withName: RUNASSEMBLYVISUALIZER { + withName: '.*:FQ2HIC:RUNASSEMBLYVISUALIZER' { publishDir = [ path: { "${params.outdir}/hic" }, mode: params.publish_dir_mode, @@ -361,11 +374,11 @@ process { ] } - withName: TAG_ASSEMBLY { + withName: '.*:ASSEMBLYQC:TAG_ASSEMBLY' { ext.prefix = { "${meta.id}.fasta" } } - withName: MERQURY_MERQURY { + withName: '.*:ASSEMBLYQC:MERQURY_MERQURY' { publishDir = [ [ path: { "${params.outdir}/merqury/${meta.id}" }, @@ -375,11 +388,32 @@ process { ] } - withName: CREATEREPORT { + withName: '.*:ASSEMBLYQC:GFFREAD' { + ext.args = '-y -S' + } + + withName: '.*:ASSEMBLYQC:ORTHOFINDER' { publishDir = [ - path: { "$params.outdir" }, + path: { "${params.outdir}/orthofinder" }, mode: params.publish_dir_mode, - saveAs: { filename -> filename.equals("versions.yml") ? null : filename } + saveAs: { filename -> filename.equals('versions.yml') ? null : filename }, + ] + } + + withName: '.*:ASSEMBLYQC:CREATEREPORT' { + publishDir = [ + [ + path: { "$params.outdir" }, + mode: params.publish_dir_mode, + pattern: 'report.json', + contentType: 'application/json' + ], + [ + path: { "$params.outdir" }, + mode: params.publish_dir_mode, + pattern: 'report.html', + contentType: 'text/html' + ] ] } } diff --git a/conf/test.config b/conf/test.config index b48e728f..e86178f3 100644 --- a/conf/test.config +++ b/conf/test.config @@ -5,19 +5,22 @@ Defines input files and everything required to run a fast and simple pipeline test. Use as follows: - nextflow run plant-food-research-open/assemblyqc -profile test, --outdir + nextflow run plant-food-research-open/assemblyqc -revision -profile test, --outdir ---------------------------------------------------------------------------------------- */ +process { + resourceLimits = [ + cpus: 4, + memory: '15.GB', + time: '1.h' + ] +} + params { config_profile_name = 'Test profile' config_profile_description = 'Minimal test dataset to check pipeline function' input = 'https://raw.githubusercontent.com/plant-food-research-open/assemblyqc/dev/assets/assemblysheetv2.csv' - - // Limit resources so that this can run on GitHub Actions - max_cpus = 2 - max_memory = '6.GB' - max_time = '6.h' } diff --git a/conf/test_full.config b/conf/test_full.config index a8e663ab..f2ad3d1a 100644 --- a/conf/test_full.config +++ b/conf/test_full.config @@ -5,7 +5,7 @@ Defines input files and everything required to run a full size pipeline test. 
Use as follows: - nextflow run plant-food-research-open/assemblyqc -profile test_full, --outdir + nextflow run plant-food-research-open/assemblyqc -revision -profile test_full, --outdir ---------------------------------------------------------------------------------------- */ @@ -16,6 +16,8 @@ params { input = 'https://raw.githubusercontent.com/plant-food-research-open/assemblyqc/dev/assets/assemblysheetv2.csv' + gfastats_skip = false + ncbi_fcs_adaptor_skip = false ncbi_fcs_adaptor_empire = 'euk' @@ -23,24 +25,24 @@ params { ncbi_fcs_gx_tax_id = 35717 // ncbi_fcs_gx_db_path = 'https://ftp.ncbi.nlm.nih.gov/genomes/TOOLS/FCS/database/r2023-01-24' + tidk_skip = false + tidk_repeat_seq = 'TTTGGG' + busco_skip = false busco_mode = 'genome' busco_lineage_datasets = 'fungi_odb10 hypocreales_odb10' - tidk_skip = false - tidk_repeat_seq = 'TTTGGG' - lai_skip = false kraken2_skip = true // Skipping this step as the dataset is humengous (126 GB). Please download the dataset manually - // kraken2_db_path = 'https://genome-idx.s3.amazonaws.com/kraken/k2_pluspfp_20240112.tar.gz' + // kraken2_db_path = 'https://genome-idx.s3.amazonaws.com/kraken/k2_pluspfp_20240904.tar.gz' hic = 'SRR8238190' + merqury_skip = false + synteny_skip = false synteny_mummer_skip = false synteny_plotsr_skip = false synteny_xref_assemblies = 'https://raw.githubusercontent.com/plant-food-research-open/assemblyqc/dev/assets/xrefsheet.csv' - - merqury_skip = false } diff --git a/docs/images/assemblyqc.png b/docs/images/assemblyqc.png new file mode 100644 index 00000000..3fd97dd1 Binary files /dev/null and b/docs/images/assemblyqc.png differ diff --git a/docs/images/fastp.png b/docs/images/fastp.png new file mode 100644 index 00000000..b968889a Binary files /dev/null and b/docs/images/fastp.png differ diff --git a/docs/images/hicqc.png b/docs/images/hicqc.png new file mode 100644 index 00000000..edff46d9 Binary files /dev/null and b/docs/images/hicqc.png differ diff --git a/docs/images/orthofinder.png b/docs/images/orthofinder.png new file mode 100644 index 00000000..4a53e3c0 Binary files /dev/null and b/docs/images/orthofinder.png differ diff --git a/docs/output.md b/docs/output.md index ae9e7f75..68adc287 100644 --- a/docs/output.md +++ b/docs/output.md @@ -8,25 +8,27 @@ The directories listed below will be created in the results directory after the ## Pipeline overview -The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes data using the following steps: +The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes data to produce following outputs: -- [FASTA and GFF3 validation](#fasta-and-gff3-validation) +- [Format validation](#format-validation) - [Assemblathon stats](#assemblathon-stats) -- [Genometools gt stat](#genometools-gt-stat) -- [NCBI FCS adaptor](#ncbi-fcs-adaptor) -- [NCBI FCS GX](#ncbi-fcs-gx) +- [gfastats](#gfastats) +- [NCBI FCS-adaptor](#ncbi-fcs-adaptor) +- [NCBI FCS-GX](#ncbi-fcs-gx) +- [tidk](#tidk) - [BUSCO](#busco) -- [TIDK](#tidk) - [LAI](#lai) -- [Kraken2](#kraken2) +- [Kraken 2](#kraken-2) - [HiC contact map](#hic-contact-map) -- [Synteny](#synteny) - [Merqury](#merqury) +- [Synteny](#synteny) +- [GenomeTools gt stat](#genometools-gt-stat) +- [OrthoFinder](#orthofinder) - [Pipeline information](#pipeline-information) -### FASTA and GFF3 validation +### Format validation The pipeline prints a warning in the pipeline log if FASTA or GFF3 validation fails. The error log from the validator is reported in the `report.html`. 
The remaining QC tools are skipped for the assembly with invalid fasta file. @@ -45,33 +47,31 @@ The pipeline prints a warning in the pipeline log if FASTA or GFF3 validation fa > [!WARNING] > Contig-related stats are based on the assumption that `assemblathon_stats_n_limit` is specified correctly. If you are not certain of the value of `assemblathon_stats_n_limit`, please ignore the contig-related stats. -### Genometools gt stat +### gfastats
Output files -- `genometools_gt_stat/` - - `*.gt.stat.yml`: Assembly annotation stats in yaml format. +- `gfastats/` + - `*.assembly_summary`: Assembly stats in TSV format.
-GenomeTools `gt stat` tool calculates a basic set of statistics about features contained in GFF3 files. - -
AssemblyQC - GenomeTools gt stat gene length distribution
+gfastats is a fast and exhaustive tool for summary statistics. -### NCBI FCS adaptor +### NCBI FCS-adaptor
Output files - `ncbi_fcs_adaptor/` - - `*_fcs_adaptor_report.tsv`: NCBI FCS adaptor report in CSV format. + - `*_fcs_adaptor_report.tsv`: NCBI FCS-adaptor report in CSV format.
-[FCS-adaptor detects](https://github.com/ncbi/fcs/wiki/FCS-adaptor#rules-for-action-assignment) adaptor and vector contamination in genome sequences. +[FCS-adaptor](https://github.com/ncbi/fcs/wiki/FCS-adaptor#rules-for-action-assignment) detects adaptor and vector contamination in genome sequences. -### NCBI FCS GX +### NCBI FCS-GX
Output files @@ -85,42 +85,47 @@ GenomeTools `gt stat` tool calculates a basic set of statistics about features c
-[FCS-GX detects](https://github.com/ncbi/fcs/wiki/FCS-GX#outputs) contamination from foreign organisms in genome sequences. +[FCS-GX](https://github.com/ncbi/fcs/wiki/FCS-GX#outputs) detects contamination from foreign organisms in genome sequences. -### BUSCO +### tidk
Output files -- `busco/` - - `busco_figure.png`: Summary figure created from all the BUSCO summaries. - - `tag` - - `short_summary.specific.*_odb10.tag_*.txt`: BUSCO summary for the assembly represented by `tag`. +- `tidk/` + - `*.apriori.tsv`: Frequencies for successive windows in forward and reverse directions for the pre-specified telomere-repeat sequence. + - `*.apriori.svg`: Plot of `*.apriori.tsv` + - `*.tidk.explore.tsv`: List of the most frequent repeat sequences. + - `*.top.sequence.txt`: The top sequence from `*.tidk.explore.tsv`. + - `*.aposteriori.tsv`: Frequencies for successive windows in forward and reverse directions for the top sequence from `*.top.sequence.txt`. + - `*.aposteriori.svg`: Plot of `*.aposteriori.tsv`.
-[BUSCO estimates](https://busco.ezlab.org/busco_userguide.html) the completeness and redundancy of processed genomic data based on universal single-copy orthologs. +tidk toolkit is designed to [identify and visualize](https://github.com/tolkit/telomeric-identifier) telomeric repeats for the Darwin Tree of Life genomes. -
AssemblyQC - BUSCO summary plot
+
AssemblyQC - tidk plot
-### TIDK +### BUSCO
Output files -- `tidk/` - - `*.apriori.tsv`: Frequencies for successive windows in forward and reverse directions for the pre-specified telomere-repeat sequence. - - `*.apriori.svg`: Plot of `*.apriori.tsv` - - `*.tidk.explore.tsv`: List of the most frequent repeat sequences. - - `*.top.sequence.txt`: The top sequence from `*.tidk.explore.tsv`. - - `*.aposteriori.tsv`: Frequencies for successive windows in forward and reverse directions for the top sequence from `*.top.sequence.txt`. - - `*.aposteriori.svg`: Plot of `*.aposteriori.tsv`. +- `busco/` + - `fasta` + - `busco_figure.png`: Summary figure created from all the BUSCO summaries. + - `tag` + - `short_summary.specific.*_odb10.tag_*.txt`: BUSCO summary for the assembly represented by `tag`. + - `gff` + - `busco_figure.png`: Summary figure created from all the BUSCO summaries. + - `tag` + - `short_summary.specific.*_odb10.tag_*.txt`: BUSCO summary for the annotation of the assembly represented by `tag`.
-TIDK toolkit is designed to [identify and visualize](https://github.com/tolkit/telomeric-identifier) telomeric repeats for the Darwin Tree of Life genomes. +[BUSCO](https://busco.ezlab.org/busco_userguide.html) estimates the completeness and redundancy of processed genomic data based on universal single-copy orthologs. -
AssemblyQC - TIDK plot
+
AssemblyQC - BUSCO summary plot
### LAI @@ -141,22 +146,22 @@ LTR Assembly Index (LAI) is a reference-free genome metric that [evaluates assem > [!WARNING] > Soft masked regions are unmasked when calculating LAI. However, hard masked regions are left as is. The pipeline will fail to calculate LAI if all the LTRs are already hard masked. -### Kraken2 +### Kraken 2
Output files - `kraken2/` - - `*.kraken2.report`: [Kraken2 report](https://github.com/DerrickWood/kraken2/wiki/Manual#output-formats). - - `*.kraken2.cut`: [Kraken2 output](https://github.com/DerrickWood/kraken2/wiki/Manual#output-formats). + - `*.kraken2.report`: [Kraken 2 report](https://github.com/DerrickWood/kraken2/wiki/Manual#output-formats). + - `*.kraken2.cut`: [Kraken 2 output](https://github.com/DerrickWood/kraken2/wiki/Manual#output-formats). - `*.kraken2.krona.cut`: [Select columns](../modules/local/kraken2_krona_plot.nf) from `*.kraken2.cut` used for generation of a Krona taxonomy plot. - `*.kraken2.krona.html`: Interactive Krona taxonomy plot.
-Kraken2 [assigns taxonomic labels](https://ccb.jhu.edu/software/kraken2/) to sequencing reads for metagenomics projects. Further reading regarding performance of Kraken2: +Kraken 2 [assigns taxonomic labels](https://ccb.jhu.edu/software/kraken2/) to sequencing reads for metagenomics projects. Further reading regarding performance of Kraken 2: -
AssemblyQC - Interactive Krona plot from Kraken2 taxonomy
+
AssemblyQC - Interactive Krona plot from Kraken 2 taxonomy
### HiC contact map @@ -165,17 +170,17 @@ Kraken2 [assigns taxonomic labels](https://ccb.jhu.edu/software/kraken2/) to seq - `hic/` - `fastqc_raw/` - - `*_1_fastqc.html/*_2_fastqc.html`: FASTQC html report for the raw reads - - `*_1_fastqc.zip/*_2_fastqc.zip`: FASTQC stats for the raw reads + - `*_1_fastqc.html/*_2_fastqc.html`: FastQC html report for the raw reads + - `*_1_fastqc.zip/*_2_fastqc.zip`: FastQC stats for the raw reads - `fastp/` - - `*.fastp.html`: FASTP HTML report - - `*.fastp.json`: FASTP statistics in JSON format - - `*.fastp.log`: FASTP log - - `*_1.fastp.fastq.gz/*_2.fastp.fastq.gz`: Reads passed by FASTP - - `*_1.fail.fastq.gz/*_2.fail.fastq.gz`: Reads failed by FASTP + - `*.fastp.html`: fastp HTML report + - `*.fastp.json`: fastp statistics in JSON format + - `*.fastp.log`: fastp log + - `*_1.fastp.fastq.gz/*_2.fastp.fastq.gz`: Reads passed by fastp + - `*_1.fail.fastq.gz/*_2.fail.fastq.gz`: Reads failed by fastp - `fastqc_trim/` - - `*_1_fastqc.html/*_2_fastqc.html`: FASTQC html report for the reads passed by FASTP. - - `*_1_fastqc.zip/*_2_fastqc.zip`: FASTQC stats for the reads passed by FASTP. + - `*_1_fastqc.html/*_2_fastqc.html`: FastQC html report for the reads passed by FASTP. + - `*_1_fastqc.zip/*_2_fastqc.zip`: FastQC stats for the reads passed by FASTP. - `hicqc` - `*.on.*_qc_report.pdf`: HiC QC report for reads mapped to an assembly. - `assembly/` @@ -185,7 +190,35 @@ Kraken2 [assigns taxonomic labels](https://ccb.jhu.edu/software/kraken2/) to seq Hi-C contact mapping experiments measure the frequency of physical contact between loci in the genome. The resulting dataset, called a “contact map,” is represented using a [two-dimensional heatmap](https://github.com/igvteam/juicebox.js) where the intensity of each pixel indicates the frequency of contact between a pair of loci. -
AssemblyQC - HiC interactive contact map
+
+AssemblyQC - fastp log for HiC reads +AssemblyQC - HiC QC report +AssemblyQC - HiC interactive contact map +
+AssemblyQC - HiC results +
+ +### Merqury + +
+Output files + +- `merqury/` + - `tag1-and-tag2`: Results folder for haplotype `tag1` and `tag2`. + - `*.completeness.stats`: Assembly completeness statistics + - `*.qv`: Assembly consensus quality QV statistics + - `*.fl.png`: Spectra plots + - `*.hapmers.blob.png`: Hap-mer blob plot +
+ +[Merqury](https://github.com/marbl/merqury) is used for the k-mer analysis. + +
+AssemblyQC - Spectra-cn plot +AssemblyQC - Plotsr synteny plot +
+AssemblyQC - Merqury plots +
### Synteny @@ -206,37 +239,42 @@ Hi-C contact mapping experiments measure the frequency of physical contact betwe - `plotsr.png`: Plotsr synteny plot -[Circos](https://circos.ca) and linear synteny plots are created from genome-wide alignments performed with [MUMMER](https://github.com/mummer4/mummer?tab=readme-ov-file) and [`dnadiff.pl`](https://github.com/mummer4/mummer/blob/master/scripts/dnadiff.pl). +[Circos](https://circos.ca) and dotplots are created from genome-wide alignments performed with [MUMmer](https://github.com/mummer4/mummer?tab=readme-ov-file). Whereas, [Plotsr](https://github.com/schneebergerlab/plotsr) plots are created from genome-wide alignments performed with [Minimap2](https://github.com/lh3/minimap2).
AssemblyQC - Circos synteny plot AssemblyQC - Plotsr synteny plot -AssemblyQC - Dotplot synteny plot +AssemblyQC - dotplot synteny plot
AssemblyQC - Synteny plots
-### Merqury +### GenomeTools gt stat
Output files -- `merqury/` - - `tag1-and-tag2`: Results folder for haplotype `tag1` and `tag2`. - - `*.completeness.stats`: Assembly completeness statistics - - `*.qv`: Assembly consensus quality QV statistics - - `*.fl.png`: Spectra plots - - `*.hapmers.blob.png`: Hap-mer blob plot -
+- `genometools_gt_stat/` + - `*.gt.stat.yml`: Assembly annotation stats in yaml format. -[MERQURY](https://github.com/marbl/merqury) is used for the k-mer analysis. + -
-AssemblyQC - Spectra-cn plot -AssemblyQC - Plotsr synteny plot -
-AssemblyQC - Merqury plots -
+GenomeTools `gt stat` tool calculates a basic set of statistics about features contained in GFF3 files. + +
AssemblyQC - GenomeTools gt stat gene length distribution
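The distribution figure above is drawn by the report's `create_dist_graph` helper, whose annotation logic is revised earlier in this diff: annotations are only added when there are at least 10 points, and the 99.7% label is shifted left when it would overrun the x-axis (the sketch below applies the same guard to every annotation). A minimal sketch of that approach — cumulative percentages annotated near the 50%, 90% and 99.7% marks — follows; the function name, bin data and output file name are invented for illustration and are not the pipeline's exact code.

```python
import base64

import matplotlib

matplotlib.use("Agg")  # render off-screen, as the report generator must
import matplotlib.pyplot as plt


def sketch_dist_graph(groups, x_label, title, file_name):
    """Plot a cumulative distribution and annotate ~50%, ~90% and ~99.7% points."""
    x_list = [g["stop"] for g in groups]
    y_list = [g["freq"] for g in groups]

    # Convert raw frequencies to cumulative percentages
    total = float(sum(y_list))
    cum_pct, running = [], 0.0
    for y in y_list:
        running += y
        cum_pct.append(running * 100.0 / total)

    fig, ax = plt.subplots()
    ax.plot(x_list, cum_pct)
    ax.set_xlabel(x_label)
    ax.set_ylabel("Cumulative %")
    ax.set_title(title)

    if len(cum_pct) >= 10:  # skip annotations for very short distributions
        max_x = max(x_list)
        step = int(max_x * 0.1)
        for threshold in (50.0, 90.0, 99.7):
            candidates = [p for p in cum_pct if p >= threshold]
            if not candidates:
                continue
            pct = min(candidates)
            x = x_list[cum_pct.index(pct)]
            # Flip the label to the left when it would overrun the axis
            offset = step if (x + 2 * step) < max_x else -2 * step
            ax.annotate(
                f"(<={x}, {round(pct, 2)}%)",
                xy=(x, pct),
                xytext=(x + offset, pct - 10),
                arrowprops=dict(color="red", arrowstyle="->"),
            )

    plt.savefig(file_name, dpi=300)
    with open(file_name, "rb") as handle:
        return "data:image/png;base64," + base64.b64encode(handle.read()).decode()


if __name__ == "__main__":
    bins = [
        {"stop": s, "freq": f}
        for s, f in zip(range(100, 1100, 100), [5, 20, 30, 15, 10, 8, 5, 4, 2, 1])
    ]
    print(sketch_dist_graph(bins, "Length", "Example distribution", "example.png")[:48])
```

Plotting cumulative percentages rather than raw frequencies lets the 50%, 90% and 99.7% markers be read straight off the curve, which is what the report's annotations highlight.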
+ +### OrthoFinder + +
+Output files + +- `orthofinder/assemblyqc`: OrthoFinder output folder. + +
+ +If more than one assemblies are included along with their annotations, OrthoFinder is executed on the annotation proteins to perform a phylogenetic orthology inference for comparative genomics. + +
AssemblyQC - OrthoFinder species tree
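The species tree above is rendered by the new `orthofinder_parser.py` using Biopython's `Bio.Phylo` together with matplotlib. A minimal, self-contained sketch of that idea is shown below; the Newick string is invented for the example, whereas the pipeline reads `Species_Tree/SpeciesTree_rooted.txt` from the OrthoFinder output folder.

```python
from io import StringIO

import matplotlib

matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
from Bio import Phylo

# Invented three-assembly tree standing in for SpeciesTree_rooted.txt
newick = "((asm1:0.10,asm2:0.10):0.05,asm3:0.15);"
tree = Phylo.read(StringIO(newick), "newick")

fig = plt.figure(figsize=(6, 6))
ax = fig.add_subplot(1, 1, 1)
Phylo.draw(tree, do_show=False, axes=ax)

# Match the report's styling: hide the top and right spines
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)

plt.savefig("speciestree_rooted.png", format="png", dpi=300)
```

The parser then base64-encodes the PNG and embeds it in `report.html` as a data URI, as it does for the other figures.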
### Pipeline information diff --git a/docs/parameters.md b/docs/parameters.md index de300d51..1c8fd3f6 100644 --- a/docs/parameters.md +++ b/docs/parameters.md @@ -4,11 +4,11 @@ A Nextflow pipeline which evaluates assembly quality with multiple QC tools and ## Input/output options -| Parameter | Description | Type | Default | Required | Hidden | -| --------- | ------------------------------------------------------------------------------------------------------------------------ | -------- | --------- | -------- | ------ | -| `input` | Input assembly sheet in CSV format | `string` | | True | | -| `outdir` | The output directory where the results will be saved. You have to use absolute paths to storage on Cloud infrastructure. | `string` | ./results | True | | -| `email` | Email address for completion summary. | `string` | | | | +| Parameter | Description | Type | Default | Required | Hidden | +| --------- | ------------------------------------------------------------------------------------------------------------------------ | -------- | ------- | -------- | ------ | +| `input` | Input assembly sheet in CSV format | `string` | | True | | +| `outdir` | The output directory where the results will be saved. You have to use absolute paths to storage on Cloud infrastructure. | `string` | | True | | +| `email` | Email address for completion summary. | `string` | | | | ## Validation options @@ -21,6 +21,7 @@ A Nextflow pipeline which evaluates assembly quality with multiple QC tools and | Parameter | Description | Type | Default | Required | Hidden | | ---------------------------- | ----------------------------------------------------------------------- | --------- | ------- | -------- | ------ | | `assemblathon_stats_n_limit` | The number of 'N's for the unknown gap size. NCBI recommendation is 100 | `integer` | 100 | | | +| `gfastats_skip` | Skip Gfastats | `boolean` | True | | | ## NCBI FCS options @@ -33,6 +34,15 @@ A Nextflow pipeline which evaluates assembly quality with multiple QC tools and | `ncbi_fcs_gx_db_path` | Path to NCBI FCS GX database. See: https://github.com/ncbi/fcs/wiki/FCS-GX | `string` | | | | | `contamination_stops_pipeline` | Skip remaining QC steps for an assembly which has adaptor or GX contamination | `boolean` | True | | | +## tidk options + +| Parameter | Description | Type | Default | Required | Hidden | +| --------------------- | ---------------------------------------------------------------------------------------------------------- | --------- | ------- | -------- | ------ | +| `tidk_skip` | Skip telomere identification | `boolean` | True | | | +| `tidk_repeat_seq` | Telomere repeat sequence. Typical values for plant: TTTAGGG, fungus, vertebrates: TTAGGG and Insect: TTAGG | `string` | | | | +| `tidk_filter_by_size` | Filter assembly sequences smaller than the specified length | `boolean` | | | | +| `tidk_filter_size_bp` | Filter size in base-pairs | `integer` | 1000000 | | | + ## BUSCO options | Parameter | Description | Type | Default | Required | Hidden | @@ -42,22 +52,13 @@ A Nextflow pipeline which evaluates assembly quality with multiple QC tools and | `busco_lineage_datasets` | BUSCO lineages. 
It should be provided as a space-separated list of lineages: 'fungi_odb10 microsporidia_odb10' | `string` | | | | | `busco_download_path` | Download path for BUSCO | `string` | | | | -## TIDK options - -| Parameter | Description | Type | Default | Required | Hidden | -| --------------------- | ---------------------------------------------------------------------------------------------------------- | --------- | ------- | -------- | ------ | -| `tidk_skip` | Skip telomere identification | `boolean` | True | | | -| `tidk_repeat_seq` | Telomere repeat sequence. Typical values for plant: TTTAGGG, fungus, vertebrates: TTAGGG and Insect: TTAGG | `string` | | | | -| `tidk_filter_by_size` | Filter assembly sequences smaller than the specified length | `boolean` | | | | -| `tidk_filter_size_bp` | Filter size in base-pairs | `integer` | 1000000 | | | - ## LAI options | Parameter | Description | Type | Default | Required | Hidden | | ---------- | ------------------- | --------- | ------- | -------- | ------ | | `lai_skip` | Skip LAI estimation | `boolean` | True | | | -## Kraken2 options +## Kraken 2 options | Parameter | Description | Type | Default | Required | Hidden | | ----------------- | --------------------- | --------- | ------- | -------- | ------ | @@ -66,12 +67,20 @@ A Nextflow pipeline which evaluates assembly quality with multiple QC tools and ## HiC options -| Parameter | Description | Type | Default | Required | Hidden | -| -------------------- | ---------------------------------------------------------------------------------------- | --------- | ------------------------------------------------- | -------- | ------ | -| `hic` | HiC reads path provided as a SRA ID or as paired reads such as 'hic_reads{1,2}.fastq.gz' | `string` | | | | -| `hic_skip_fastp` | Skip HiC read trimming | `boolean` | | | | -| `hic_skip_fastqc` | Skip HiC read QC | `boolean` | | | | -| `hic_fastp_ext_args` | Additional parameters for fastp trimming | `string` | --qualified_quality_phred 20 --length_required 50 | | | +| Parameter | Description | Type | Default | Required | Hidden | +| ----------------------- | ---------------------------------------------------------------------------------------- | --------- | ------------------------------------------------- | -------- | ------ | +| `hic` | HiC reads path provided as a SRA ID or as paired reads such as 'hic_reads{1,2}.fastq.gz' | `string` | | | | +| `hic_skip_fastp` | Skip HiC read trimming | `boolean` | | | | +| `hic_skip_fastqc` | Skip HiC read QC | `boolean` | | | | +| `hic_fastp_ext_args` | Additional parameters for fastp trimming | `string` | --qualified_quality_phred 20 --length_required 50 | | | +| `hic_samtools_ext_args` | Additional parameters for samtools view command run after samblaster | `string` | -F 3852 | | | + +## Merqury options + +| Parameter | Description | Type | Default | Required | Hidden | +| --------------------- | -------------------------------- | --------- | ------- | -------- | ------ | +| `merqury_skip` | Skip merqury analysis | `boolean` | True | | | +| `merqury_kmer_length` | kmer length for merqury analysis | `integer` | 21 | | | ## Synteny options @@ -85,29 +94,17 @@ A Nextflow pipeline which evaluates assembly quality with multiple QC tools and | `synteny_mummer_plot_type` | Synteny plot type from Mummer alignments: 'dotplot', 'circos', or 'both' | `string` | both | | | | `synteny_mummer_m2m_align` | Include Mummer alignment blocks with many-to-many mappings | `boolean` | | | | | `synteny_mummer_max_gap` | Mummer alignments 
within this distance are bundled together | `integer` | 1000000 | | | -| `synteny_mummer_min_bundle_size` | After bundling, any Mummer alignment bundle smaller than this size is filtered out | `integer` | 1000 | | | +| `synteny_mummer_min_bundle_size` | After bundling, any Mummer alignment bundle smaller than this size is filtered out | `integer` | 1000000 | | | | `synteny_plot_1_vs_all` | Create a separate synteny plot for each contig of the target assembly versus all contigs of the reference assembly. This only applies to Mummer plots | `boolean` | | | | | `synteny_color_by_contig` | Mummer synteny plots are colored by contig. Otherwise, they are colored by bundle size | `boolean` | True | | | | `synteny_plotsr_seq_label` | Sequence label prefix for plotsr synteny | `string` | Chr | | | -| `synteny_plotsr_assembly_order` | The order in which the assemblies should be compared, provided as space separated string of assembly tags. If absent, assemblies are ordered by their tags alphabetically. | `string` | -| | | | - -## Merqury options - -| Parameter | Description | Type | Default | Required | Hidden | -| --------------------- | -------------------------------- | --------- | ------- | -------- | ------ | -| `merqury_skip` | Skip merqury analysis | `boolean` | True | | | -| `merqury_kmer_length` | kmer length for merqury analysis | `integer` | 21 | | | - -## Max job request options +| `synteny_plotsr_assembly_order` | The order in which the assemblies should be compared, provided as space separated string of assembly tags. If absent, assemblies are ordered by their tags alphabetically. | `string` | | | | -Set the top limit for requested resources for any single job. +## OrthoFinder options -| Parameter | Description | Type | Default | Required | Hidden | -| ------------ | ---------------------------------------------------------------------------------- | --------- | ------- | -------- | ------ | -| `max_cpus` | Maximum number of CPUs that can be requested for any single job. | `integer` | 16 | | True | -| `max_memory` | Maximum amount of memory that can be requested for any single job. Example: '8.GB' | `string` | 512.GB | | True | -| `max_time` | Maximum amount of time that can be requested for any single job. Example: '1.day' | `string` | 7.day | | True | +| Parameter | Description | Type | Default | Required | Hidden | +| ------------------ | ---------------- | --------- | ------- | -------- | ------ | +| `orthofinder_skip` | Skip orthofinder | `boolean` | True | | | ## Institutional config options @@ -119,24 +116,16 @@ Parameters used to describe centralised config profiles. These should not be edi | `custom_config_base` | Base directory for Institutional configs. | `string` | https://raw.githubusercontent.com/nf-core/configs/master | | True | | `config_profile_name` | Institutional config name. | `string` | | | True | | `config_profile_description` | Institutional config description. | `string` | | | True | -| `config_profile_contact` | Institutional config contact information. | `string` | | | True | -| `config_profile_url` | Institutional config URL link. | `string` | | | True | ## Generic options Less common options for the pipeline, typically set in a config file. -| Parameter | Description | Type | Default | Required | Hidden | -| ---------------------------------- | ----------------------------------------------------------------------- | --------- | ------- | -------- | ------ | -| `help` | Display help text. | `boolean` | | | True | -| `version` | Display version and exit. 
| `boolean` | | | True | -| `publish_dir_mode` | Method used to save pipeline results to output directory. | `string` | copy | | True | -| `email_on_fail` | Email address for completion summary, only when pipeline fails. | `string` | | | True | -| `plaintext_email` | Send plain-text email instead of HTML. | `boolean` | | | True | -| `monochrome_logs` | Do not use coloured log outputs. | `boolean` | | | True | -| `monochromeLogs` | Do not use coloured log outputs. | `boolean` | | | True | -| `hook_url` | Incoming hook URL for messaging service | `string` | | | True | -| `validate_params` | Boolean whether to validate parameters against the schema at runtime | `boolean` | True | | True | -| `validationShowHiddenParams` | Show all params when using `--help` | `boolean` | | | True | -| `validationFailUnrecognisedParams` | Validation of parameters fails when an unrecognised parameter is found. | `boolean` | | | True | -| `validationLenientMode` | Validation of parameters in lenient more. | `boolean` | | | True | +| Parameter | Description | Type | Default | Required | Hidden | +| ------------------ | --------------------------------------------------------------- | --------- | ------- | -------- | ------ | +| `version` | Display version and exit. | `boolean` | | | True | +| `publish_dir_mode` | Method used to save pipeline results to output directory. | `string` | copy | | True | +| `email_on_fail` | Email address for completion summary, only when pipeline fails. | `string` | | | True | +| `plaintext_email` | Send plain-text email instead of HTML. | `boolean` | | | True | +| `monochrome_logs` | Do not use coloured log outputs. | `boolean` | | | True | +| `hook_url` | Incoming hook URL for messaging service | `string` | | | True | diff --git a/docs/usage.md b/docs/usage.md index 509fdbd2..8590c615 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -2,17 +2,17 @@ - [Assemblysheet input](#assemblysheet-input) - [External databases](#external-databases) - - [NCBI FCS GX database](#ncbi-fcs-gx-database) - - [Kraken2](#kraken2) + - [NCBI FCS-GX database](#ncbi-fcs-gx-database) - [BUSCO](#busco) + - [Kraken 2](#kraken-2) - [Other parameters](#other-parameters) - [Assemblathon stats](#assemblathon-stats) - - [NCBI FCS GX](#ncbi-fcs-gx) + - [NCBI FCS-GX](#ncbi-fcs-gx) + - [tidk](#tidk) - [BUSCO](#busco-1) - - [TIDK](#tidk) - [HiC](#hic) - - [Synteny analysis](#synteny-analysis) - [Merqury K-mer analysis](#merqury-k-mer-analysis) + - [Synteny analysis](#synteny-analysis) - [Minimum System Requirements](#minimum-system-requirements) - [Running the pipeline](#running-the-pipeline) - [Updating the pipeline](#updating-the-pipeline) @@ -26,7 +26,6 @@ - [Custom Containers](#custom-containers) - [Custom Tool Arguments](#custom-tool-arguments) - [nf-core/configs](#nf-coreconfigs) -- [Azure Resource Requests](#azure-resource-requests) - [Running in the background](#running-in-the-background) - [Nextflow memory requirements](#nextflow-memory-requirements) @@ -44,9 +43,9 @@ See the [Merqury](#merqury-k-mer-analysis) section For description of assemblysh ## External databases -### NCBI FCS GX database +### NCBI FCS-GX database -If NCBI FCS GX foreign organism contamination check is executed by setting `ncbi_fcs_gx_skip` to `false`, the path to the GX database must be provided with option `ncbi_fcs_gx_db_path`. The user must ensure that the database is correctly downloaded and placed in a location accessible to the pipeline. Setup instructions are available at . 
The database path must contain following files: +If NCBI FCS-GX foreign organism contamination check is executed by setting `ncbi_fcs_gx_skip` to `false`, the path to the GX database must be provided with option `ncbi_fcs_gx_db_path`. The user must ensure that the database is correctly downloaded and placed in a location accessible to the pipeline. Setup instructions are available at . The database path must contain following files: ```bash all.assemblies.tsv @@ -60,14 +59,14 @@ all.seq_info.tsv.gz all.taxa.tsv ``` -### Kraken2 - -Path to Kraken2 database is provided by the `kraken2_db_path` parameter. This can be a URL to a public `.tar.gz` file such as `https://genome-idx.s3.amazonaws.com/kraken/k2_pluspfp_20240112.tar.gz`. The pipeline can download and extract the database. This is not the recommended practice owing to the size of the database. Rather, the database should be downloaded, extracted and stored in a read-only location. The path to that location can be provided by the `kraken2_db_path` parameter such as `/workspace/ComparativeDataSources/kraken2db/k2_pluspfp_20230314`. - ### BUSCO BUSCO lineage databases are downloaded and updated by the BUSCO tool itself. A persistent location for the database can be provided by specifying `busco_download_path` parameter. +### Kraken 2 + +Path to Kraken 2 database is provided by the `kraken2_db_path` parameter. This can be a URL to a public `.tar.gz` file such as `https://genome-idx.s3.amazonaws.com/kraken/k2_pluspfp_20240112.tar.gz`. The pipeline can download and extract the database. This is not the recommended practice owing to the size of the database. Rather, the database should be downloaded, extracted and stored in a read-only location. The path to that location can be provided by the `kraken2_db_path` parameter such as `/workspace/ComparativeDataSources/kraken2db/k2_pluspfp_20240904`. + ## Other parameters This section provides additional information for parameters. It does not list all the pipeline parameters. For an exhaustive list, see [parameters.md](./parameters.md). @@ -76,35 +75,22 @@ This section provides additional information for parameters. It does not list al `assemblathon_stats_n_limit` is the number of 'N's for the unknown gap size. This number is used to split the scaffolds into contigs to compute contig-related stats. NCBI's recommendation for unknown gap size is 100 . -### NCBI FCS GX +### NCBI FCS-GX - `ncbi_fcs_gx_tax_id` is the taxonomy ID for all the assemblies listed in the assemblysheet. A taxonomy ID can be obtained by searching a _Genus species_ at . -### BUSCO +### tidk -- `busco_lineage_datasets`: A space-separated list of BUSCO lineages. Any number of lineages can be specified such as "fungi_odb10 hypocreales_odb10". Each assembly is assessed against each of the listed lineage. To select a lineage, refer to . +- `tidk_repeat_seq`: The telomere search sequence. To select an appropriate sequence, see . Commonly used sequences are TTTAGGG (Plant), TTAGGG (Fungus, Vertebrates) and TTAGG (Insect). Further reading: -### TIDK +### BUSCO -- `tidk_repeat_seq`: The telomere search sequence. To select an appropriate sequence, see . Commonly used sequences are TTTAGGG (Plant), TTAGGG (Fungus, Vertebrates) and TTAGG (Insect). Further reading: +- `busco_lineage_datasets`: A space-separated list of BUSCO lineages. Any number of lineages can be specified such as "fungi_odb10 hypocreales_odb10". Each assembly is assessed against each of the listed lineage. To select a lineage, refer to . 
### HiC - `hic`: Path to reads provided as a SRA ID or as a path to paired reads such as 'hic_reads{1,2}.fastq.gz'. These reads are applied to each assembly listed by `input`. -### Synteny analysis - -- `synteny_xref_assemblies`: Similar to `--input`, this parameter also provides a CSV sheet listing external reference assemblies which are included in the synteny analysis but are not analysed by other QC tools. See the [example xrefsheet](../assets/xrefsheet.csv) included with the pipeline. Its fields are: - - - `tag:` A unique tag which represents the reference assembly in the final report - - `fasta:` FASTA file - - `synteny_labels:` A two column tsv file listing fasta sequence ids (first column) and their labels for the synteny plots (second column) - -- `synteny_plotsr_assembly_order`: The order in which Minimap2 alignments are performed and, then, plotted by Plotsr. For assembly A, B and C; if the order is specified as 'B C A', then, two alignments are performed. First, C is aligned against B as reference. Second, A is aligned against C as reference. The order of these assemblies on the Plotsr figure is also 'B C A' so that B appears on top, C in the middle and A at the bottom. If this parameter is `null`, the assemblies are ordered alphabetically. All assemblies from `input` and `synteny_xref_assemblies` are included by default. If an assembly is missing from this list, that assembly is excluded from the analysis. - -> [!WARNING] -> PLOTSR performs a sequence-wise (preferably chromosome-wise) synteny analysis. The order of the sequences for each assembly is inferred from its `synteny_labels` file and the order of sequences in the FASTA file is ignored. As all the assemblies are included in a single plot and the number of sequences from each assembly should be same, sequences after the common minimum number are excluded. Afterwards, the sequences are marked sequentially as `Chr1`, `Chr2`, `Chr3`,... If a label other than `Chr` is desirable, it can be configured with the `synteny_plotsr_seq_label` parameter. - ### Merqury K-mer analysis Additional assemblysheet columns: @@ -125,19 +111,32 @@ See following assemblysheet examples for MERQURY analysis. The data for these examples comes from: [umd.edu](https://obj.umiacs.umd.edu/marbl_publications/triobinning/index.html) +### Synteny analysis + +- `synteny_xref_assemblies`: Similar to `--input`, this parameter also provides a CSV sheet listing external reference assemblies which are included in the synteny analysis but are not analysed by other QC tools. See the [example xrefsheet](../assets/xrefsheet.csv) included with the pipeline. Its fields are: + + - `tag:` A unique tag which represents the reference assembly in the final report + - `fasta:` FASTA file + - `synteny_labels:` A two column tsv file listing fasta sequence ids (first column) and their labels for the synteny plots (second column) + +- `synteny_plotsr_assembly_order`: The order in which Minimap2 alignments are performed and, then, plotted by Plotsr. For assembly A, B and C; if the order is specified as 'B C A', then, two alignments are performed. First, C is aligned against B as reference. Second, A is aligned against C as reference. The order of these assemblies on the Plotsr figure is also 'B C A' so that B appears on top, C in the middle and A at the bottom. If this parameter is `null`, the assemblies are ordered alphabetically. All assemblies from `input` and `synteny_xref_assemblies` are included by default. 
If an assembly is missing from this list, that assembly is excluded from the analysis. + +> [!WARNING] +> PLOTSR performs a sequence-wise (preferably chromosome-wise) synteny analysis. The order of the sequences for each assembly is inferred from its `synteny_labels` file and the order of sequences in the FASTA file is ignored. As all the assemblies are included in a single plot and the number of sequences from each assembly should be same, sequences after the common minimum number are excluded. Afterwards, the sequences are marked sequentially as `Chr1`, `Chr2`, `Chr3`,... If a label other than `Chr` is desirable, it can be configured with the `synteny_plotsr_seq_label` parameter. + ## Minimum System Requirements -All the modules have been tested to work on a single machine with 10 CPUs + 30 GBs of memory, except NCBI FCS GX and Kraken2. Their minimum requirements are: +All the modules have been tested to work on a single machine with 10 CPUs + 32 GBs of memory, except NCBI FCS GX and Kraken2. Their minimum requirements are: - NCBI FCS GX: 1 CPU + 512 GBs memory -- Kraken2: 1 CPU + 200 GBs memory +- Kraken2: 1 CPU + 256 GBs memory ## Running the pipeline The typical command for running the pipeline is as follows: ```bash -nextflow run plant-food-research-open/assemblyqc --input ./assemblysheet.csv --outdir ./results -profile docker +nextflow run plant-food-research-open/assemblyqc -revision --input ./assemblysheet.csv --outdir ./results -profile docker ``` This will launch the pipeline with the `docker` configuration profile. See below for more information about profiles. @@ -161,12 +160,12 @@ Pipeline settings can be provided in a `yaml` or `json` file via `-params-file < The above pipeline run specified with a params file in yaml format: ```bash -nextflow run plant-food-research-open/assemblyqc -profile docker -params-file params.yaml +nextflow run plant-food-research-open/assemblyqc -revision main -profile docker -params-file params.yaml ``` -with `params.yaml` containing: +with: -```yaml +```yaml title="params.yaml" input: "./assemblysheet.csv" outdir: "./results/" ``` @@ -187,7 +186,7 @@ It is a good idea to specify a pipeline version when running the pipeline on you First, go to the [plant-food-research-open/assemblyqc releases page](https://github.com/plant-food-research-open/assemblyqc/releases) and find the latest pipeline version - numeric only (eg. `1.3.1`). Then specify this when running the pipeline with `-r` (one hyphen) - eg. `-r 1.3.1`. Of course, you can switch to another version by changing the number after the `-r` flag. -This version number will be logged in reports when you run the pipeline, so that you'll know what you used when you look back in the future. For example, at the bottom of the MultiQC reports. +This version number will be logged in reports when you run the pipeline, so that you'll know what you used when you look back in the future. To further assist in reproducbility, you can use share and re-use [parameter files](#running-the-pipeline) to repeat pipeline runs with the same settings without having to write out a command with every single parameter. @@ -273,14 +272,6 @@ See the main [Nextflow documentation](https://www.nextflow.io/docs/latest/config If you have any questions or issues please send us a message on [Slack](https://nf-co.re/join/slack) on the [`#configs` channel](https://nfcore.slack.com/channels/configs). -## Azure Resource Requests - -To be used with the `azurebatch` profile by specifying the `-profile azurebatch`. 
-We recommend providing a compute `params.vm_type` of `Standard_D16_v3` VMs by default but these options can be changed if required. - -Note that the choice of VM size depends on your quota and the overall workload during the analysis. -For a thorough list, please refer the [Azure Sizes for virtual machines in Azure](https://docs.microsoft.com/en-us/azure/virtual-machines/sizes). - ## Running in the background Nextflow handles job submissions and supervises the running jobs. The Nextflow process must run until the pipeline is finished. diff --git a/local_assemblyqc b/local_assemblyqc index 858acea8..3a211c34 100755 --- a/local_assemblyqc +++ b/local_assemblyqc @@ -17,10 +17,10 @@ nextflow run \ -profile docker,test_full \ -resume \ $stub \ - --max_cpus 8 \ - --max_memory '32.GB' \ + -c ../nxf-config/resources.config \ --ncbi_fcs_gx_skip false \ --ncbi_fcs_gx_db_path ../dbs/gxdb/test \ --busco_download_path ../dbs/busco \ --kraken2_skip false \ - --kraken2_db_path ../dbs/kraken2db/k2_minusb + --kraken2_db_path ../dbs/kraken2db/k2_minusb \ + --outdir results diff --git a/main.nf b/main.nf index 91e82dff..7e46ba85 100755 --- a/main.nf +++ b/main.nf @@ -7,8 +7,6 @@ ---------------------------------------------------------------------------------------- */ -nextflow.enable.dsl = 2 - /* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ IMPORT FUNCTIONS / MODULES / SUBWORKFLOWS / WORKFLOWS @@ -18,7 +16,6 @@ nextflow.enable.dsl = 2 include { ASSEMBLYQC } from './workflows/assemblyqc' include { PIPELINE_INITIALISATION } from './subworkflows/local/utils_nfcore_assemblyqc_pipeline' include { PIPELINE_COMPLETION } from './subworkflows/local/utils_nfcore_assemblyqc_pipeline' - /* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ NAMED WORKFLOWS FOR PIPELINE @@ -54,7 +51,6 @@ workflow PLANTFOODRESEARCHOPEN_ASSEMBLYQC { ch_params_as_json, ch_summary_params_as_json ) - } /* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -65,14 +61,11 @@ workflow PLANTFOODRESEARCHOPEN_ASSEMBLYQC { workflow { main: - // // SUBWORKFLOW: Run initialisation tasks // PIPELINE_INITIALISATION ( params.version, - params.help, - params.validate_params, params.monochrome_logs, args, params.outdir, @@ -92,7 +85,6 @@ workflow { PIPELINE_INITIALISATION.out.params_as_json, PIPELINE_INITIALISATION.out.summary_params_as_json ) - // // SUBWORKFLOW: Run completion tasks // @@ -102,7 +94,8 @@ workflow { params.plaintext_email, params.outdir, params.monochrome_logs, - params.hook_url + params.hook_url, + ) } diff --git a/modules.json b/modules.json index 86cf046a..e15a72cd 100644 --- a/modules.json +++ b/modules.json @@ -7,107 +7,117 @@ "gallvp": { "busco/busco": { "branch": "main", - "git_sha": "cc6ea1f3e96e8264d2cc99afed31bcf3b0bb03ca", + "git_sha": "92d59e5f578a2929b75f7588985b9bf451f4c370", "installed_by": ["fasta_gxf_busco_plot"] }, "busco/generateplot": { "branch": "main", - "git_sha": "cc6ea1f3e96e8264d2cc99afed31bcf3b0bb03ca", + "git_sha": "ae9714c21ede9199a3118e3c20b65484aa73e232", "installed_by": ["fasta_gxf_busco_plot"] }, "bwa/index": { "branch": "main", - "git_sha": "cc6ea1f3e96e8264d2cc99afed31bcf3b0bb03ca", + "git_sha": "ae9714c21ede9199a3118e3c20b65484aa73e232", "installed_by": ["fastq_bwa_mem_samblaster"] }, "bwa/mem": { "branch": "main", - "git_sha": "a203a6035aed3f9b345ee380f5d497ca98504d98", + "git_sha": "ae9714c21ede9199a3118e3c20b65484aa73e232", "installed_by": ["fastq_bwa_mem_samblaster"] }, 
"cat/cat": { "branch": "main", - "git_sha": "cc6ea1f3e96e8264d2cc99afed31bcf3b0bb03ca", + "git_sha": "ae9714c21ede9199a3118e3c20b65484aa73e232", "installed_by": ["fasta_ltrretriever_lai"] }, "custom/relabelfasta": { "branch": "main", - "git_sha": "a203a6035aed3f9b345ee380f5d497ca98504d98", + "git_sha": "a8939d36280e7d9037c7cf164eeede19e46546a4", "installed_by": ["modules"] }, "custom/restoregffids": { "branch": "main", - "git_sha": "a203a6035aed3f9b345ee380f5d497ca98504d98", + "git_sha": "a8939d36280e7d9037c7cf164eeede19e46546a4", "installed_by": ["fasta_ltrretriever_lai"] }, "custom/shortenfastaids": { "branch": "main", - "git_sha": "a203a6035aed3f9b345ee380f5d497ca98504d98", + "git_sha": "a8939d36280e7d9037c7cf164eeede19e46546a4", "installed_by": ["fasta_ltrretriever_lai"] }, "gffread": { "branch": "main", - "git_sha": "cc6ea1f3e96e8264d2cc99afed31bcf3b0bb03ca", + "git_sha": "ae9714c21ede9199a3118e3c20b65484aa73e232", "installed_by": ["fasta_gxf_busco_plot"] }, "gt/gff3": { "branch": "main", - "git_sha": "cc6ea1f3e96e8264d2cc99afed31bcf3b0bb03ca", + "git_sha": "ae9714c21ede9199a3118e3c20b65484aa73e232", "installed_by": ["gff3_gt_gff3_gff3validator_stat"] }, "gt/gff3validator": { "branch": "main", - "git_sha": "cc6ea1f3e96e8264d2cc99afed31bcf3b0bb03ca", + "git_sha": "ae9714c21ede9199a3118e3c20b65484aa73e232", "installed_by": ["gff3_gt_gff3_gff3validator_stat"] }, "gt/stat": { "branch": "main", - "git_sha": "cc6ea1f3e96e8264d2cc99afed31bcf3b0bb03ca", + "git_sha": "ae9714c21ede9199a3118e3c20b65484aa73e232", "installed_by": ["gff3_gt_gff3_gff3validator_stat"] }, + "gunzip": { + "branch": "main", + "git_sha": "ae9714c21ede9199a3118e3c20b65484aa73e232", + "installed_by": ["modules"] + }, "ltrfinder": { "branch": "main", - "git_sha": "cc6ea1f3e96e8264d2cc99afed31bcf3b0bb03ca", + "git_sha": "ae9714c21ede9199a3118e3c20b65484aa73e232", "installed_by": ["fasta_ltrretriever_lai"] }, "ltrharvest": { "branch": "main", - "git_sha": "cc6ea1f3e96e8264d2cc99afed31bcf3b0bb03ca", + "git_sha": "ae9714c21ede9199a3118e3c20b65484aa73e232", "installed_by": ["fasta_ltrretriever_lai"] }, "ltrretriever/lai": { "branch": "main", - "git_sha": "cc6ea1f3e96e8264d2cc99afed31bcf3b0bb03ca", + "git_sha": "ae9714c21ede9199a3118e3c20b65484aa73e232", "installed_by": ["fasta_ltrretriever_lai"] }, "ltrretriever/ltrretriever": { "branch": "main", - "git_sha": "cc6ea1f3e96e8264d2cc99afed31bcf3b0bb03ca", + "git_sha": "ae9714c21ede9199a3118e3c20b65484aa73e232", "installed_by": ["fasta_ltrretriever_lai"] }, + "minimap2/align": { + "branch": "main", + "git_sha": "ae9714c21ede9199a3118e3c20b65484aa73e232", + "installed_by": ["modules"] + }, "plotsr": { "branch": "main", - "git_sha": "a203a6035aed3f9b345ee380f5d497ca98504d98", + "git_sha": "a8939d36280e7d9037c7cf164eeede19e46546a4", "installed_by": ["modules"] }, "samblaster": { "branch": "main", - "git_sha": "cc6ea1f3e96e8264d2cc99afed31bcf3b0bb03ca", + "git_sha": "ae9714c21ede9199a3118e3c20b65484aa73e232", "installed_by": ["fastq_bwa_mem_samblaster"] }, "samtools/faidx": { "branch": "main", - "git_sha": "cc6ea1f3e96e8264d2cc99afed31bcf3b0bb03ca", + "git_sha": "ae9714c21ede9199a3118e3c20b65484aa73e232", "installed_by": ["gff3_gt_gff3_gff3validator_stat"] }, "seqkit/seq": { "branch": "main", - "git_sha": "cc6ea1f3e96e8264d2cc99afed31bcf3b0bb03ca", + "git_sha": "ae9714c21ede9199a3118e3c20b65484aa73e232", "installed_by": ["fasta_ltrretriever_lai"] }, "syri": { "branch": "main", - "git_sha": "a203a6035aed3f9b345ee380f5d497ca98504d98", + "git_sha": 
"a8939d36280e7d9037c7cf164eeede19e46546a4", "installed_by": ["modules"] } } @@ -131,7 +141,7 @@ }, "gff3_gt_gff3_gff3validator_stat": { "branch": "main", - "git_sha": "58c5f9e695b9e03d43e4c59d9339af7c93f0acbe", + "git_sha": "92d59e5f578a2929b75f7588985b9bf451f4c370", "installed_by": ["subworkflows"] } } @@ -142,107 +152,122 @@ "nf-core": { "custom/sratoolsncbisettings": { "branch": "master", - "git_sha": "06c8865e36741e05ad32ef70ab3fac127486af48", + "git_sha": "666652151335353eef2fcd58880bcef5bc2928e1", "installed_by": ["fastq_download_prefetch_fasterqdump_sratools"] }, "fastavalidator": { "branch": "master", - "git_sha": "06c8865e36741e05ad32ef70ab3fac127486af48", + "git_sha": "666652151335353eef2fcd58880bcef5bc2928e1", "installed_by": ["modules"] }, "fastp": { "branch": "master", - "git_sha": "06c8865e36741e05ad32ef70ab3fac127486af48", + "git_sha": "666652151335353eef2fcd58880bcef5bc2928e1", "installed_by": ["fastq_fastqc_umitools_fastp"] }, "fastqc": { "branch": "master", - "git_sha": "06c8865e36741e05ad32ef70ab3fac127486af48", + "git_sha": "666652151335353eef2fcd58880bcef5bc2928e1", "installed_by": ["fastq_fastqc_umitools_fastp"] }, "fcs/fcsadaptor": { "branch": "master", - "git_sha": "06c8865e36741e05ad32ef70ab3fac127486af48", + "git_sha": "666652151335353eef2fcd58880bcef5bc2928e1", + "installed_by": ["modules"] + }, + "gfastats": { + "branch": "master", + "git_sha": "666652151335353eef2fcd58880bcef5bc2928e1", + "installed_by": ["modules"] + }, + "gffread": { + "branch": "master", + "git_sha": "666652151335353eef2fcd58880bcef5bc2928e1", "installed_by": ["modules"] }, "gunzip": { "branch": "master", - "git_sha": "06c8865e36741e05ad32ef70ab3fac127486af48", + "git_sha": "666652151335353eef2fcd58880bcef5bc2928e1", "installed_by": ["modules"] }, "merqury/hapmers": { "branch": "master", - "git_sha": "06c8865e36741e05ad32ef70ab3fac127486af48", + "git_sha": "666652151335353eef2fcd58880bcef5bc2928e1", "installed_by": ["modules"] }, "merqury/merqury": { "branch": "master", - "git_sha": "06c8865e36741e05ad32ef70ab3fac127486af48", + "git_sha": "666652151335353eef2fcd58880bcef5bc2928e1", "installed_by": ["modules"] }, "meryl/count": { "branch": "master", - "git_sha": "06c8865e36741e05ad32ef70ab3fac127486af48", + "git_sha": "666652151335353eef2fcd58880bcef5bc2928e1", "installed_by": ["modules"] }, "meryl/unionsum": { "branch": "master", - "git_sha": "06c8865e36741e05ad32ef70ab3fac127486af48", + "git_sha": "666652151335353eef2fcd58880bcef5bc2928e1", "installed_by": ["modules"] }, "minimap2/align": { "branch": "master", - "git_sha": "06c8865e36741e05ad32ef70ab3fac127486af48", + "git_sha": "666652151335353eef2fcd58880bcef5bc2928e1", + "installed_by": ["modules"] + }, + "orthofinder": { + "branch": "master", + "git_sha": "666652151335353eef2fcd58880bcef5bc2928e1", "installed_by": ["modules"] }, "seqkit/rmdup": { "branch": "master", - "git_sha": "06c8865e36741e05ad32ef70ab3fac127486af48", + "git_sha": "666652151335353eef2fcd58880bcef5bc2928e1", "installed_by": ["modules"] }, "seqkit/seq": { "branch": "master", - "git_sha": "06c8865e36741e05ad32ef70ab3fac127486af48", + "git_sha": "666652151335353eef2fcd58880bcef5bc2928e1", "installed_by": ["fasta_explore_search_plot_tidk", "fasta_ltrretriever_lai"] }, "seqkit/sort": { "branch": "master", - "git_sha": "06c8865e36741e05ad32ef70ab3fac127486af48", + "git_sha": "666652151335353eef2fcd58880bcef5bc2928e1", "installed_by": ["fasta_explore_search_plot_tidk"] }, "sratools/fasterqdump": { "branch": "master", - "git_sha": "06c8865e36741e05ad32ef70ab3fac127486af48", 
+ "git_sha": "666652151335353eef2fcd58880bcef5bc2928e1", "installed_by": ["fastq_download_prefetch_fasterqdump_sratools", "modules"] }, "sratools/prefetch": { "branch": "master", - "git_sha": "368e6c90b91adbd171e7c0a1c85a700b86a915af", + "git_sha": "666652151335353eef2fcd58880bcef5bc2928e1", "installed_by": ["fastq_download_prefetch_fasterqdump_sratools", "modules"] }, "tidk/explore": { "branch": "master", - "git_sha": "06c8865e36741e05ad32ef70ab3fac127486af48", + "git_sha": "666652151335353eef2fcd58880bcef5bc2928e1", "installed_by": ["fasta_explore_search_plot_tidk"] }, "tidk/plot": { "branch": "master", - "git_sha": "06c8865e36741e05ad32ef70ab3fac127486af48", + "git_sha": "666652151335353eef2fcd58880bcef5bc2928e1", "installed_by": ["fasta_explore_search_plot_tidk"] }, "tidk/search": { "branch": "master", - "git_sha": "06c8865e36741e05ad32ef70ab3fac127486af48", + "git_sha": "666652151335353eef2fcd58880bcef5bc2928e1", "installed_by": ["fasta_explore_search_plot_tidk"] }, "umitools/extract": { "branch": "master", - "git_sha": "06c8865e36741e05ad32ef70ab3fac127486af48", + "git_sha": "666652151335353eef2fcd58880bcef5bc2928e1", "installed_by": ["fastq_fastqc_umitools_fastp"] }, "untar": { "branch": "master", - "git_sha": "06c8865e36741e05ad32ef70ab3fac127486af48", + "git_sha": "666652151335353eef2fcd58880bcef5bc2928e1", "installed_by": ["modules"] } } @@ -266,17 +291,17 @@ }, "utils_nextflow_pipeline": { "branch": "master", - "git_sha": "d20fb2a9cc3e2835e9d067d1046a63252eb17352", + "git_sha": "3aa0aec1d52d492fe241919f0c6100ebf0074082", "installed_by": ["subworkflows"] }, "utils_nfcore_pipeline": { "branch": "master", - "git_sha": "2fdce49d30c0254f76bc0f13c55c17455c1251ab", + "git_sha": "1b6b9a3338d011367137808b49b923515080e3ba", "installed_by": ["subworkflows"] }, - "utils_nfvalidation_plugin": { + "utils_nfschema_plugin": { "branch": "master", - "git_sha": "5caf7640a9ef1d18d765d55339be751bb0969dfa", + "git_sha": "bbd5a41f4535a8defafe6080e00ea74c45f4f96c", "installed_by": ["subworkflows"] } } diff --git a/modules/gallvp/busco/busco/main.nf b/modules/gallvp/busco/busco/main.nf index f7c1a662..98cf5b04 100644 --- a/modules/gallvp/busco/busco/main.nf +++ b/modules/gallvp/busco/busco/main.nf @@ -11,7 +11,7 @@ process BUSCO_BUSCO { tuple val(meta), path(fasta, stageAs:'tmp_input/*') val mode // Required: One of genome, proteins, or transcriptome val lineage // Required: lineage to check against, "auto" enables --auto-lineage instead - path busco_lineages_path // Recommended: path to busco lineages - downloads if not set + path busco_lineages_path // Recommended: p_ath to busco lineages - downloads if not set path config_file // Optional: busco configuration file output: diff --git a/modules/gallvp/busco/busco/meta.yml b/modules/gallvp/busco/busco/meta.yml index 29745d2c..7cb6d69c 100644 --- a/modules/gallvp/busco/busco/meta.yml +++ b/modules/gallvp/busco/busco/meta.yml @@ -7,81 +7,135 @@ keywords: - proteome tools: - busco: - description: BUSCO provides measures for quantitative assessment of genome assembly, gene set, and transcriptome completeness based on evolutionarily informed expectations of gene content from near-universal single-copy orthologs selected from OrthoDB. + description: BUSCO provides measures for quantitative assessment of genome assembly, + gene set, and transcriptome completeness based on evolutionarily informed expectations + of gene content from near-universal single-copy orthologs selected from OrthoDB. 
homepage: https://busco.ezlab.org/ documentation: https://busco.ezlab.org/busco_userguide.html tool_dev_url: https://gitlab.com/ezlab/busco doi: "10.1007/978-1-4939-9173-0_14" licence: ["MIT"] + identifier: biotools:busco input: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. [ id:'test', single_end:false ] - - fasta: - type: file - description: Nucleic or amino acid sequence file in FASTA format. - pattern: "*.{fasta,fna,fa,fasta.gz,fna.gz,fa.gz}" - - mode: - type: string - description: The mode to run Busco in. One of genome, proteins, or transcriptome - pattern: "{genome,proteins,transcriptome}" - - lineage: - type: string - description: The BUSCO lineage to use, or "auto" to automatically select lineage - - busco_lineages_path: - type: directory - description: Path to local BUSCO lineages directory. - - config_file: - type: file - description: Path to BUSCO config file. + - - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - fasta: + type: file + description: Nucleic or amino acid sequence file in FASTA format. + pattern: "*.{fasta,fna,fa,fasta.gz,fna.gz,fa.gz}" + - - mode: + type: string + description: The mode to run Busco in. One of genome, proteins, or transcriptome + pattern: "{genome,proteins,transcriptome}" + - - lineage: + type: string + description: The BUSCO lineage to use, or "auto" to automatically select lineage + - - busco_lineages_path: + type: directory + description: Path to local BUSCO lineages directory. + - - config_file: + type: file + description: Path to BUSCO config file. output: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. [ id:'test', single_end:false ] - batch_summary: - type: file - description: Summary of all sequence files analyzed - pattern: "*-busco.batch_summary.txt" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*-busco.batch_summary.txt": + type: file + description: Summary of all sequence files analyzed + pattern: "*-busco.batch_summary.txt" - short_summaries_txt: - type: file - description: Short Busco summary in plain text format - pattern: "short_summary.*.txt" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - short_summary.*.txt: + type: file + description: Short Busco summary in plain text format + pattern: "short_summary.*.txt" - short_summaries_json: - type: file - description: Short Busco summary in JSON format - pattern: "short_summary.*.json" - - busco_dir: - type: directory - description: BUSCO lineage specific output - pattern: "*-busco" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - short_summary.*.json: + type: file + description: Short Busco summary in JSON format + pattern: "short_summary.*.json" - full_table: - type: file - description: Full BUSCO results table - pattern: "full_table.tsv" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*-busco/*/run_*/full_table.tsv": + type: file + description: Full BUSCO results table + pattern: "full_table.tsv" - missing_busco_list: - type: file - description: List of missing BUSCOs - pattern: "missing_busco_list.tsv" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. 
[ id:'test', single_end:false ] + - "*-busco/*/run_*/missing_busco_list.tsv": + type: file + description: List of missing BUSCOs + pattern: "missing_busco_list.tsv" - single_copy_proteins: - type: file - description: Fasta file of single copy proteins (transcriptome mode) - pattern: "single_copy_proteins.faa" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*-busco/*/run_*/single_copy_proteins.faa": + type: file + description: Fasta file of single copy proteins (transcriptome mode) + pattern: "single_copy_proteins.faa" - seq_dir: - type: directory - description: BUSCO sequence directory - pattern: "busco_sequences" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*-busco/*/run_*/busco_sequences": + type: directory + description: BUSCO sequence directory + pattern: "busco_sequences" - translated_dir: - type: directory - description: Six frame translations of each transcript made by the transcriptome mode - pattern: "translated_dir" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*-busco/*/translated_proteins": + type: directory + description: Six frame translations of each transcript made by the transcriptome + mode + pattern: "translated_dir" + - busco_dir: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*-busco": + type: directory + description: BUSCO lineage specific output + pattern: "*-busco" - versions: - type: file - description: File containing software versions - pattern: "versions.yml" + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" authors: - "@priyanka-surana" - "@charles-plessy" diff --git a/modules/gallvp/busco/busco/tests/main.nf.test.snap b/modules/gallvp/busco/busco/tests/main.nf.test.snap index 1140d5de..825ddb98 100644 --- a/modules/gallvp/busco/busco/tests/main.nf.test.snap +++ b/modules/gallvp/busco/busco/tests/main.nf.test.snap @@ -11,19 +11,19 @@ ] ], "1": [ - + ], "2": [ - + ], "3": [ - + ], "4": [ - + ], "5": [ - + ], "6": [ [ @@ -31,12 +31,12 @@ "id": "test" }, [ - + ] ] ], "7": [ - + ], "8": [ [ @@ -47,7 +47,7 @@ [ [ [ - + ] ] ] @@ -74,7 +74,7 @@ [ [ [ - + ] ] ] @@ -82,10 +82,10 @@ ] ], "full_table": [ - + ], "missing_busco_list": [ - + ], "seq_dir": [ [ @@ -93,21 +93,21 @@ "id": "test" }, [ - + ] ] ], "short_summaries_json": [ - + ], "short_summaries_txt": [ - + ], "single_copy_proteins": [ - + ], "translated_dir": [ - + ], "versions": [ "versions.yml:md5,3fc94714b95c2dc15399a4229d9dd1d9" @@ -159,9 +159,9 @@ ], "meta": { "nf-test": "0.8.4", - "nextflow": "24.04.4" + "nextflow": "23.10.1" }, - "timestamp": "2024-08-22T11:24:24.828742" + "timestamp": "2024-05-03T13:23:50.255602" }, "test_busco_eukaryote_metaeuk": { "content": [ @@ -227,4 +227,4 @@ }, "timestamp": "2024-05-03T13:27:12.724862" } -} \ No newline at end of file +} diff --git a/modules/gallvp/busco/generateplot/meta.yml b/modules/gallvp/busco/generateplot/meta.yml index 796f32b4..72ad2c92 100644 --- a/modules/gallvp/busco/generateplot/meta.yml +++ b/modules/gallvp/busco/generateplot/meta.yml @@ -9,26 +9,31 @@ keywords: - quality control tools: - busco: - description: BUSCO provides measures for quantitative assessment of genome assembly, gene set, and transcriptome completeness based on evolutionarily informed expectations of gene content 
from near-universal single-copy orthologs selected from OrthoDB. + description: BUSCO provides measures for quantitative assessment of genome assembly, + gene set, and transcriptome completeness based on evolutionarily informed expectations + of gene content from near-universal single-copy orthologs selected from OrthoDB. homepage: https://busco.ezlab.org/ documentation: https://busco.ezlab.org/busco_userguide.html tool_dev_url: https://gitlab.com/ezlab/busco doi: "10.1007/978-1-4939-9173-0_14" licence: ["MIT"] + identifier: biotools:busco input: - - short_summary_txt: - type: file - description: One or more short summary txt files from BUSCO - pattern: "short_summary.*.txt" + - - short_summary_txt: + type: file + description: One or more short summary txt files from BUSCO + pattern: "short_summary.*.txt" output: - png: - type: file - description: A summary plot in png format - pattern: "*.png" + - "*.png": + type: file + description: A summary plot in png format + pattern: "*.png" - versions: - type: file - description: File containing software versions - pattern: "versions.yml" + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" authors: - "@GallVp" maintainers: diff --git a/modules/gallvp/bwa/index/meta.yml b/modules/gallvp/bwa/index/meta.yml index 6bbc87a6..4884bca2 100644 --- a/modules/gallvp/bwa/index/meta.yml +++ b/modules/gallvp/bwa/index/meta.yml @@ -14,29 +14,32 @@ tools: documentation: https://bio-bwa.sourceforge.net/bwa.shtml arxiv: arXiv:1303.3997 licence: ["GPL-3.0-or-later"] + identifier: "" input: - - meta: - type: map - description: | - Groovy Map containing reference information. - e.g. [ id:'test', single_end:false ] - - fasta: - type: file - description: Input genome fasta file + - - meta: + type: map + description: | + Groovy Map containing reference information. + e.g. [ id:'test', single_end:false ] + - fasta: + type: file + description: Input genome fasta file output: - - meta: - type: map - description: | - Groovy Map containing reference information. - e.g. [ id:'test', single_end:false ] - index: - type: file - description: BWA genome index files - pattern: "*.{amb,ann,bwt,pac,sa}" + - meta: + type: map + description: | + Groovy Map containing reference information. + e.g. [ id:'test', single_end:false ] + - bwa: + type: file + description: BWA genome index files + pattern: "*.{amb,ann,bwt,pac,sa}" - versions: - type: file - description: File containing software versions - pattern: "versions.yml" + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" authors: - "@drpatelh" - "@maxulysse" diff --git a/modules/gallvp/bwa/mem/meta.yml b/modules/gallvp/bwa/mem/meta.yml index b126dd86..37467d29 100644 --- a/modules/gallvp/bwa/mem/meta.yml +++ b/modules/gallvp/bwa/mem/meta.yml @@ -17,55 +17,82 @@ tools: documentation: https://bio-bwa.sourceforge.net/bwa.shtml arxiv: arXiv:1303.3997 licence: ["GPL-3.0-or-later"] + identifier: "" input: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. [ id:'test', single_end:false ] - - reads: - type: file - description: | - List of input FastQ files of size 1 and 2 for single-end and paired-end data, - respectively. - - meta2: - type: map - description: | - Groovy Map containing reference information. - e.g. 
[ id:'test', single_end:false ] - - index: - type: file - description: BWA genome index files - pattern: "Directory containing BWA index *.{amb,ann,bwt,pac,sa}" - - fasta: - type: file - description: Reference genome in FASTA format - pattern: "*.{fasta,fa}" - - sort_bam: - type: boolean - description: use samtools sort (true) or samtools view (false) - pattern: "true or false" + - - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - reads: + type: file + description: | + List of input FastQ files of size 1 and 2 for single-end and paired-end data, + respectively. + - - meta2: + type: map + description: | + Groovy Map containing reference information. + e.g. [ id:'test', single_end:false ] + - index: + type: file + description: BWA genome index files + pattern: "Directory containing BWA index *.{amb,ann,bwt,pac,sa}" + - - meta3: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - fasta: + type: file + description: Reference genome in FASTA format + pattern: "*.{fasta,fa}" + - - sort_bam: + type: boolean + description: use samtools sort (true) or samtools view (false) + pattern: "true or false" output: - bam: - type: file - description: Output BAM file containing read alignments - pattern: "*.{bam}" + - meta: + type: file + description: Output BAM file containing read alignments + pattern: "*.{bam}" + - "*.bam": + type: file + description: Output BAM file containing read alignments + pattern: "*.{bam}" - cram: - type: file - description: Output CRAM file containing read alignments - pattern: "*.{cram}" + - meta: + type: file + description: Output CRAM file containing read alignments + pattern: "*.{cram}" + - "*.cram": + type: file + description: Output CRAM file containing read alignments + pattern: "*.{cram}" - csi: - type: file - description: Optional index file for BAM file - pattern: "*.{csi}" + - meta: + type: file + description: Optional index file for BAM file + pattern: "*.{csi}" + - "*.csi": + type: file + description: Optional index file for BAM file + pattern: "*.{csi}" - crai: - type: file - description: Optional index file for CRAM file - pattern: "*.{crai}" + - meta: + type: file + description: Optional index file for CRAM file + pattern: "*.{crai}" + - "*.crai": + type: file + description: Optional index file for CRAM file + pattern: "*.{crai}" - versions: - type: file - description: File containing software versions - pattern: "versions.yml" + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" authors: - "@drpatelh" - "@jeremy1805" diff --git a/modules/gallvp/bwa/mem/tests/main.nf.test.snap b/modules/gallvp/bwa/mem/tests/main.nf.test.snap index 2079ea22..5b456333 100644 --- a/modules/gallvp/bwa/mem/tests/main.nf.test.snap +++ b/modules/gallvp/bwa/mem/tests/main.nf.test.snap @@ -13,14 +13,14 @@ [ "versions.yml:md5,478b816fbd37871f5e8c617833d51d80" ], - "b6d9cb250261a4c125413c5d867d87a7", + "895cf107b1c13b0f0cabc1421a4f5db9", "798439cbd7fd81cbcc5078022dc5479d" ], "meta": { "nf-test": "0.9.0", "nextflow": "24.04.4" }, - "timestamp": "2024-08-02T12:22:28.051598" + "timestamp": "2024-10-10T20:57:38.428887" }, "Single-End Sort": { "content": [ @@ -36,14 +36,14 @@ [ "versions.yml:md5,478b816fbd37871f5e8c617833d51d80" ], - "848434ae4b79cfdcb2281c60b33663ce", + "98350949b5a0931930c69cff0e380274", "94fcf617f5b994584c4e8d4044e16b4f" ], "meta": { "nf-test": "0.9.0", "nextflow": "24.04.4" }, - 
"timestamp": "2024-08-02T12:22:39.671154" + "timestamp": "2024-10-10T20:57:52.438702" }, "Paired-End": { "content": [ @@ -59,14 +59,14 @@ [ "versions.yml:md5,478b816fbd37871f5e8c617833d51d80" ], - "5b34d31be84478761f789e3e2e805e31", + "a42d8e219e0df24f3d72a646b7c26dbb", "57aeef88ed701a8ebc8e2f0a381b2a6" ], "meta": { "nf-test": "0.9.0", "nextflow": "24.04.4" }, - "timestamp": "2024-08-02T12:22:51.919479" + "timestamp": "2024-10-10T20:58:05.532827" }, "Paired-End Sort": { "content": [ @@ -82,14 +82,14 @@ [ "versions.yml:md5,478b816fbd37871f5e8c617833d51d80" ], - "69003376d9a8952622d8587b39c3eaae", + "ee5c324f68d45c4d01c937b4abbc5de", "af8628d9df18b2d3d4f6fd47ef2bb872" ], "meta": { "nf-test": "0.9.0", "nextflow": "24.04.4" }, - "timestamp": "2024-08-02T12:23:00.833562" + "timestamp": "2024-10-10T20:58:20.859431" }, "Single-end - stub": { "content": [ @@ -182,14 +182,14 @@ [ "versions.yml:md5,478b816fbd37871f5e8c617833d51d80" ], - "5b34d31be84478761f789e3e2e805e31", + "a42d8e219e0df24f3d72a646b7c26dbb", "57aeef88ed701a8ebc8e2f0a381b2a6" ], "meta": { "nf-test": "0.9.0", "nextflow": "24.04.4" }, - "timestamp": "2024-08-02T12:23:09.942545" + "timestamp": "2024-10-10T20:58:35.460705" }, "Paired-end - stub": { "content": [ diff --git a/modules/gallvp/cat/cat/meta.yml b/modules/gallvp/cat/cat/meta.yml index 00a8db0b..81778a06 100644 --- a/modules/gallvp/cat/cat/meta.yml +++ b/modules/gallvp/cat/cat/meta.yml @@ -9,25 +9,32 @@ tools: description: Just concatenation documentation: https://man7.org/linux/man-pages/man1/cat.1.html licence: ["GPL-3.0-or-later"] + identifier: "" input: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. [ id:'test', single_end:false ] - - files_in: - type: file - description: List of compressed / uncompressed files - pattern: "*" + - - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - files_in: + type: file + description: List of compressed / uncompressed files + pattern: "*" output: - - versions: - type: file - description: File containing software versions - pattern: "versions.yml" - file_out: - type: file - description: Concatenated file. Will be gzipped if file_out ends with ".gz" - pattern: "${file_out}" + - meta: + type: file + description: Concatenated file. Will be gzipped if file_out ends with ".gz" + pattern: "${file_out}" + - ${prefix}: + type: file + description: Concatenated file. Will be gzipped if file_out ends with ".gz" + pattern: "${file_out}" + - versions: + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" authors: - "@erikrikarddaniel" - "@FriederikeHanssen" diff --git a/modules/gallvp/custom/relabelfasta/meta.yml b/modules/gallvp/custom/relabelfasta/meta.yml index 7935872a..bf563aad 100644 --- a/modules/gallvp/custom/relabelfasta/meta.yml +++ b/modules/gallvp/custom/relabelfasta/meta.yml @@ -1,4 +1,3 @@ ---- # yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/modules/meta-schema.json name: "custom_relabelfasta" description: | @@ -20,36 +19,38 @@ tools: documentation: "https://docs.python.org/3/" tool_dev_url: "https://github.com/python/cpython" licence: ["MIT"] + identifier: "" input: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. 
`[ id:'sample1' ]` - - fasta: - type: file - description: Input fasta file - pattern: "*.fasta" - - labels: - type: file - description: | - A TSV file with original (first column) and new ids (second column) - pattern: "*.tsv" - + - - meta: + type: map + description: | + Groovy Map containing sample information + e.g. `[ id:'sample1' ]` + - fasta: + type: file + description: Input fasta file + pattern: "*.fasta" + - - labels: + type: file + description: | + A TSV file with original (first column) and new ids (second column) + pattern: "*.tsv" output: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. `[ id:'sample1' ]` - - versions: - type: file - description: File containing software versions - pattern: "versions.yml" - fasta: - type: file - description: Output fasta file - pattern: "*.fasta" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. `[ id:'sample1' ]` + - "*.fasta": + type: file + description: Output fasta file + pattern: "*.fasta" + - versions: + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" authors: - "@GallVp" maintainers: diff --git a/modules/gallvp/custom/restoregffids/meta.yml b/modules/gallvp/custom/restoregffids/meta.yml index 4e42b829..dea75778 100644 --- a/modules/gallvp/custom/restoregffids/meta.yml +++ b/modules/gallvp/custom/restoregffids/meta.yml @@ -1,4 +1,3 @@ ---- # yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/modules/meta-schema.json name: "custom_restoregffids" description: | @@ -22,36 +21,39 @@ tools: documentation: "https://docs.python.org/3/" tool_dev_url: "https://github.com/python/cpython" licence: ["MIT"] + identifier: "" input: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. `[ id:'test' ]` - - gff3: - type: file - description: Input gff3 file - pattern: "*.{gff,gff3}" - - ids_tsv: - type: file - description: | - A TSV file with original (first column) and new ids (second column) - if id change was required - pattern: "*.tsv" + - - meta: + type: map + description: | + Groovy Map containing sample information + e.g. `[ id:'test' ]` + - gff3: + type: file + description: Input gff3 file + pattern: "*.{gff,gff3}" + - - ids_tsv: + type: file + description: | + A TSV file with original (first column) and new ids (second column) + if id change was required + pattern: "*.tsv" output: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. `[ id:'test' ]` - restored_ids_gff3: - type: file - description: GFF3 file with restored ids - pattern: "*.restored.ids.gff3" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. 
`[ id:'test' ]` + - "*.restored.ids.gff3": + type: file + description: GFF3 file with restored ids + pattern: "*.restored.ids.gff3" - versions: - type: file - description: File containing software versions - pattern: "versions.yml" + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" authors: - "@GallVp" maintainers: diff --git a/modules/gallvp/custom/shortenfastaids/meta.yml b/modules/gallvp/custom/shortenfastaids/meta.yml index 2425810d..bb14b0ae 100644 --- a/modules/gallvp/custom/shortenfastaids/meta.yml +++ b/modules/gallvp/custom/shortenfastaids/meta.yml @@ -1,4 +1,3 @@ ---- # yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/modules/meta-schema.json name: "custom_shortenfastaids" description: | @@ -22,36 +21,45 @@ tools: tool_dev_url: "https://github.com/biopython/biopython" doi: "10.1093/bioinformatics/btp163" licence: ["MIT"] + identifier: "" input: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. `[ id:'test' ]` - - fasta: - type: file - description: Input fasta file - pattern: "*.{fsa,fa,fasta}" + - - meta: + type: map + description: | + Groovy Map containing sample information + e.g. `[ id:'test' ]` + - fasta: + type: file + description: Input fasta file + pattern: "*.{fsa,fa,fasta}" output: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. `[ id:'test' ]` - short_ids_fasta: - type: file - description: Fasta file with shortened ids if id change is required - pattern: "*.{fsa,fa,fasta}" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. `[ id:'test' ]` + - "*.short.ids.fasta": + type: file + description: Fasta file with shortened ids if id change is required + pattern: "*.{fsa,fa,fasta}" - short_ids_tsv: - type: file - description: | - A TSV file with original (first column) and new ids (second column) - if id change is required - pattern: "*.tsv" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. `[ id:'test' ]` + - "*.short.ids.tsv": + type: file + description: | + A TSV file with original (first column) and new ids (second column) + if id change is required + pattern: "*.tsv" - versions: - type: file - description: File containing software versions - pattern: "versions.yml" + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" authors: - "@GallVp" maintainers: diff --git a/modules/gallvp/gffread/meta.yml b/modules/gallvp/gffread/meta.yml index c0602820..bebe7f57 100644 --- a/modules/gallvp/gffread/meta.yml +++ b/modules/gallvp/gffread/meta.yml @@ -1,53 +1,73 @@ name: gffread -description: Validate, filter, convert and perform various other operations on GFF files +description: Validate, filter, convert and perform various other operations on GFF + files keywords: - gff - conversion - validation tools: - gffread: - description: GFF/GTF utility providing format conversions, region filtering, FASTA sequence extraction and more. + description: GFF/GTF utility providing format conversions, region filtering, FASTA + sequence extraction and more. 
homepage: http://ccb.jhu.edu/software/stringtie/gff.shtml#gffread documentation: http://ccb.jhu.edu/software/stringtie/gff.shtml#gffread tool_dev_url: https://github.com/gpertea/gffread doi: 10.12688/f1000research.23297.1 licence: ["MIT"] + identifier: biotools:gffread input: - - meta: - type: map - description: | - Groovy Map containing meta data - e.g. [ id:'test' ] - - gff: - type: file - description: A reference file in either the GFF3, GFF2 or GTF format. - pattern: "*.{gff, gtf}" - - fasta: - type: file - description: A multi-fasta file with the genomic sequences - pattern: "*.{fasta,fa,faa,fas,fsa}" + - - meta: + type: map + description: | + Groovy Map containing meta data + e.g. [ id:'test' ] + - gff: + type: file + description: A reference file in either the GFF3, GFF2 or GTF format. + pattern: "*.{gff, gtf}" + - - fasta: + type: file + description: A multi-fasta file with the genomic sequences + pattern: "*.{fasta,fa,faa,fas,fsa}" output: - - meta: - type: map - description: | - Groovy Map containing meta data - e.g. [ id:'test' ] - gtf: - type: file - description: GTF file resulting from the conversion of the GFF input file if '-T' argument is present - pattern: "*.{gtf}" + - meta: + type: map + description: | + Groovy Map containing meta data + e.g. [ id:'test' ] + - "*.gtf": + type: file + description: GTF file resulting from the conversion of the GFF input file if + '-T' argument is present + pattern: "*.{gtf}" - gffread_gff: - type: file - description: GFF3 file resulting from the conversion of the GFF input file if '-T' argument is absent - pattern: "*.gff3" + - meta: + type: map + description: | + Groovy Map containing meta data + e.g. [ id:'test' ] + - "*.gff3": + type: file + description: GFF3 file resulting from the conversion of the GFF input file if + '-T' argument is absent + pattern: "*.gff3" - gffread_fasta: - type: file - description: Fasta file produced when either of '-w', '-x', '-y' parameters is present - pattern: "*.fasta" + - meta: + type: map + description: | + Groovy Map containing meta data + e.g. [ id:'test' ] + - "*.fasta": + type: file + description: Fasta file produced when either of '-w', '-x', '-y' parameters + is present + pattern: "*.fasta" - versions: - type: file - description: File containing software versions - pattern: "versions.yml" + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" authors: - "@edmundmiller" maintainers: diff --git a/modules/gallvp/gt/gff3/meta.yml b/modules/gallvp/gt/gff3/meta.yml index 5cecd8d0..62c4cbc6 100644 --- a/modules/gallvp/gt/gff3/meta.yml +++ b/modules/gallvp/gt/gff3/meta.yml @@ -1,7 +1,7 @@ ---- # yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/modules/meta-schema.json name: "gt_gff3" -description: "GenomeTools gt-gff3 utility to parse, possibly transform, and output GFF3 files" +description: "GenomeTools gt-gff3 utility to parse, possibly transform, and output + GFF3 files" keywords: - genome - gff3 @@ -14,34 +14,43 @@ tools: tool_dev_url: "https://github.com/genometools/genometools" doi: "10.1109/TCBB.2013.68" licence: ["ISC"] + identifier: "" input: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. `[ id:'test' ]` - - gff3: - type: file - description: Input gff3 file - pattern: "*.{gff,gff3}" + - - meta: + type: map + description: | + Groovy Map containing sample information + e.g. 
`[ id:'test' ]` + - gff3: + type: file + description: Input gff3 file + pattern: "*.{gff,gff3}" output: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. `[ id:'test' ]` - gt_gff3: - type: file - description: Parsed gff3 file produced only if there is no parsing error - pattern: "*.gt.gff3" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. `[ id:'test' ]` + - "*.gt.gff3": + type: file + description: Parsed gff3 file produced only if there is no parsing error + pattern: "*.gt.gff3" - error_log: - type: file - description: Error log if gt-gff3 failed to parse the input gff3 file - pattern: "*.error.log" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. `[ id:'test' ]` + - "*.error.log": + type: file + description: Error log if gt-gff3 failed to parse the input gff3 file + pattern: "*.error.log" - versions: - type: file - description: File containing software versions - pattern: "versions.yml" + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" authors: - "@gallvp" maintainers: diff --git a/modules/gallvp/gt/gff3validator/meta.yml b/modules/gallvp/gt/gff3validator/meta.yml index 3322faf9..98ff256d 100644 --- a/modules/gallvp/gt/gff3validator/meta.yml +++ b/modules/gallvp/gt/gff3validator/meta.yml @@ -1,4 +1,3 @@ ---- # yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/modules/meta-schema.json name: "gt_gff3validator" description: "GenomeTools gt-gff3validator utility to strictly validate a GFF3 file" @@ -15,34 +14,43 @@ tools: tool_dev_url: "https://github.com/genometools/genometools" doi: "10.1109/TCBB.2013.68" licence: ["ISC"] + identifier: "" input: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. `[ id:'test' ]` - - gff3: - type: file - description: Input gff3 file - pattern: "*.{gff,gff3}" + - - meta: + type: map + description: | + Groovy Map containing sample information + e.g. `[ id:'test' ]` + - gff3: + type: file + description: Input gff3 file + pattern: "*.{gff,gff3}" output: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. `[ id:'test' ]` - success_log: - type: file - description: Log file for successful validation - pattern: "*.success.log" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. `[ id:'test' ]` + - "*.success.log": + type: file + description: Log file for successful validation + pattern: "*.success.log" - error_log: - type: file - description: Log file for failed validation - pattern: "*.error.log" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. 
`[ id:'test' ]` + - "*.error.log": + type: file + description: Log file for failed validation + pattern: "*.error.log" - versions: - type: file - description: File containing software versions - pattern: "versions.yml" + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" authors: - "@GallVp" maintainers: diff --git a/modules/gallvp/gt/stat/meta.yml b/modules/gallvp/gt/stat/meta.yml index fa477f34..3f62f8eb 100644 --- a/modules/gallvp/gt/stat/meta.yml +++ b/modules/gallvp/gt/stat/meta.yml @@ -1,7 +1,7 @@ ---- # yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/modules/meta-schema.json name: "gt_stat" -description: "GenomeTools gt-stat utility to show statistics about features contained in GFF3 files" +description: "GenomeTools gt-stat utility to show statistics about features contained + in GFF3 files" keywords: - genome - gff3 @@ -16,30 +16,33 @@ tools: tool_dev_url: "https://github.com/genometools/genometools" doi: "10.1109/TCBB.2013.68" licence: ["ISC"] + identifier: "" input: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. `[ id:'test' ]` - - gff3: - type: file - description: Input gff3 file - pattern: "*.{gff,gff3}" + - - meta: + type: map + description: | + Groovy Map containing sample information + e.g. `[ id:'test' ]` + - gff3: + type: file + description: Input gff3 file + pattern: "*.{gff,gff3}" output: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. `[ id:'test' ]` - stats: - type: file - description: Stats file in yaml format - pattern: "*.yml" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. `[ id:'test' ]` + - ${prefix}.yml: + type: file + description: Stats file in yaml format + pattern: "*.yml" - versions: - type: file - description: File containing software versions - pattern: "versions.yml" + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" authors: - "@GallVp" maintainers: diff --git a/modules/gallvp/gunzip/environment.yml b/modules/gallvp/gunzip/environment.yml new file mode 100644 index 00000000..c7794856 --- /dev/null +++ b/modules/gallvp/gunzip/environment.yml @@ -0,0 +1,7 @@ +channels: + - conda-forge + - bioconda +dependencies: + - conda-forge::grep=3.11 + - conda-forge::sed=4.8 + - conda-forge::tar=1.34 diff --git a/modules/gallvp/gunzip/main.nf b/modules/gallvp/gunzip/main.nf new file mode 100644 index 00000000..5e67e3b9 --- /dev/null +++ b/modules/gallvp/gunzip/main.nf @@ -0,0 +1,55 @@ +process GUNZIP { + tag "$archive" + label 'process_single' + + conda "${moduleDir}/environment.yml" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
+ 'https://depot.galaxyproject.org/singularity/ubuntu:22.04' : + 'nf-core/ubuntu:22.04' }" + + input: + tuple val(meta), path(archive) + + output: + tuple val(meta), path("$gunzip"), emit: gunzip + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def extension = ( archive.toString() - '.gz' ).tokenize('.')[-1] + def name = archive.toString() - '.gz' - ".$extension" + def prefix = task.ext.prefix ?: name + gunzip = prefix + ".$extension" + """ + # Not calling gunzip itself because it creates files + # with the original group ownership rather than the + # default one for that user / the work directory + gzip \\ + -cd \\ + $args \\ + $archive \\ + > $gunzip + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + gunzip: \$(echo \$(gunzip --version 2>&1) | sed 's/^.*(gzip) //; s/ Copyright.*\$//') + END_VERSIONS + """ + + stub: + def args = task.ext.args ?: '' + def extension = ( archive.toString() - '.gz' ).tokenize('.')[-1] + def name = archive.toString() - '.gz' - ".$extension" + def prefix = task.ext.prefix ?: name + gunzip = prefix + ".$extension" + """ + touch $gunzip + cat <<-END_VERSIONS > versions.yml + "${task.process}": + gunzip: \$(echo \$(gunzip --version 2>&1) | sed 's/^.*(gzip) //; s/ Copyright.*\$//') + END_VERSIONS + """ +} diff --git a/modules/gallvp/gunzip/meta.yml b/modules/gallvp/gunzip/meta.yml new file mode 100644 index 00000000..9066c035 --- /dev/null +++ b/modules/gallvp/gunzip/meta.yml @@ -0,0 +1,47 @@ +name: gunzip +description: Compresses and decompresses files. +keywords: + - gunzip + - compression + - decompression +tools: + - gunzip: + description: | + gzip is a file format and a software application used for file compression and decompression. + documentation: https://www.gnu.org/software/gzip/manual/gzip.html + licence: ["GPL-3.0-or-later"] + identifier: "" +input: + - - meta: + type: map + description: | + Optional groovy Map containing meta information + e.g. 
[ id:'test', single_end:false ] + - archive: + type: file + description: File to be compressed/uncompressed + pattern: "*.*" +output: + - gunzip: + - meta: + type: file + description: Compressed/uncompressed file + pattern: "*.*" + - $gunzip: + type: file + description: Compressed/uncompressed file + pattern: "*.*" + - versions: + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" +authors: + - "@joseespinosa" + - "@drpatelh" + - "@jfy133" +maintainers: + - "@joseespinosa" + - "@drpatelh" + - "@jfy133" + - "@gallvp" diff --git a/modules/gallvp/gunzip/tests/main.nf.test b/modules/gallvp/gunzip/tests/main.nf.test new file mode 100644 index 00000000..f6610575 --- /dev/null +++ b/modules/gallvp/gunzip/tests/main.nf.test @@ -0,0 +1,121 @@ +nextflow_process { + + name "Test Process GUNZIP" + script "../main.nf" + process "GUNZIP" + tag "gunzip" + tag "modules_gallvp" + tag "modules" + + test("Should run without failures") { + + when { + params { + outdir = "$outputDir" + } + process { + """ + input[0] = Channel.of([ + [], + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true) + ] + ) + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot(process.out).match() } + ) + } + + } + + test("Should run without failures - prefix") { + + config './nextflow.config' + + when { + params { + outdir = "$outputDir" + } + process { + """ + input[0] = Channel.of([ + [ id: 'test' ], + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true) + ] + ) + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot(process.out).match() } + ) + } + + } + + test("Should run without failures - stub") { + + options '-stub' + + when { + params { + outdir = "$outputDir" + } + process { + """ + input[0] = Channel.of([ + [], + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true) + ] + ) + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot(process.out).match() } + ) + } + + } + + test("Should run without failures - prefix - stub") { + + options '-stub' + config './nextflow.config' + + when { + params { + outdir = "$outputDir" + } + process { + """ + input[0] = Channel.of([ + [ id: 'test' ], + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true) + ] + ) + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot(process.out).match() } + ) + } + + } + +} diff --git a/modules/gallvp/gunzip/tests/main.nf.test.snap b/modules/gallvp/gunzip/tests/main.nf.test.snap new file mode 100644 index 00000000..069967e7 --- /dev/null +++ b/modules/gallvp/gunzip/tests/main.nf.test.snap @@ -0,0 +1,134 @@ +{ + "Should run without failures - prefix - stub": { + "content": [ + { + "0": [ + [ + { + "id": "test" + }, + "test.xyz.fastq:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "1": [ + "versions.yml:md5,54376d32aca20e937a4ec26dac228e84" + ], + "gunzip": [ + [ + { + "id": "test" + }, + "test.xyz.fastq:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "versions": [ + "versions.yml:md5,54376d32aca20e937a4ec26dac228e84" + ] + } + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "24.04.2" + }, + "timestamp": "2024-06-25T11:35:10.861293" + }, + "Should run without failures - stub": { + "content": [ + { + "0": [ + [ + [ + + ], + 
"test_1.fastq:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "1": [ + "versions.yml:md5,54376d32aca20e937a4ec26dac228e84" + ], + "gunzip": [ + [ + [ + + ], + "test_1.fastq:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "versions": [ + "versions.yml:md5,54376d32aca20e937a4ec26dac228e84" + ] + } + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "24.04.2" + }, + "timestamp": "2024-06-25T11:35:05.857145" + }, + "Should run without failures": { + "content": [ + { + "0": [ + [ + [ + + ], + "test_1.fastq:md5,4161df271f9bfcd25d5845a1e220dbec" + ] + ], + "1": [ + "versions.yml:md5,54376d32aca20e937a4ec26dac228e84" + ], + "gunzip": [ + [ + [ + + ], + "test_1.fastq:md5,4161df271f9bfcd25d5845a1e220dbec" + ] + ], + "versions": [ + "versions.yml:md5,54376d32aca20e937a4ec26dac228e84" + ] + } + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "24.04.2" + }, + "timestamp": "2023-10-17T15:35:37.690477896" + }, + "Should run without failures - prefix": { + "content": [ + { + "0": [ + [ + { + "id": "test" + }, + "test.xyz.fastq:md5,4161df271f9bfcd25d5845a1e220dbec" + ] + ], + "1": [ + "versions.yml:md5,54376d32aca20e937a4ec26dac228e84" + ], + "gunzip": [ + [ + { + "id": "test" + }, + "test.xyz.fastq:md5,4161df271f9bfcd25d5845a1e220dbec" + ] + ], + "versions": [ + "versions.yml:md5,54376d32aca20e937a4ec26dac228e84" + ] + } + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "24.04.2" + }, + "timestamp": "2024-06-25T11:33:32.921739" + } +} \ No newline at end of file diff --git a/modules/gallvp/gunzip/tests/nextflow.config b/modules/gallvp/gunzip/tests/nextflow.config new file mode 100644 index 00000000..dec77642 --- /dev/null +++ b/modules/gallvp/gunzip/tests/nextflow.config @@ -0,0 +1,5 @@ +process { + withName: GUNZIP { + ext.prefix = { "${meta.id}.xyz" } + } +} diff --git a/modules/gallvp/gunzip/tests/tags.yml b/modules/gallvp/gunzip/tests/tags.yml new file mode 100644 index 00000000..fd3f6915 --- /dev/null +++ b/modules/gallvp/gunzip/tests/tags.yml @@ -0,0 +1,2 @@ +gunzip: + - modules/nf-core/gunzip/** diff --git a/modules/gallvp/ltrfinder/meta.yml b/modules/gallvp/ltrfinder/meta.yml index e3c672b9..547fb67d 100644 --- a/modules/gallvp/ltrfinder/meta.yml +++ b/modules/gallvp/ltrfinder/meta.yml @@ -1,4 +1,3 @@ ---- # yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/modules/meta-schema.json name: "ltrfinder" description: | @@ -19,41 +18,52 @@ tools: tool_dev_url: "https://github.com/oushujun/LTR_FINDER_parallel" doi: "10.1186/s13100-019-0193-0" licence: ["MIT"] + identifier: "" - "LTR_Finder": - description: An efficient program for finding full-length LTR retrotranspsons in genome sequences + description: An efficient program for finding full-length LTR retrotranspsons + in genome sequences homepage: "https://github.com/xzhub/LTR_Finder" documentation: "https://github.com/xzhub/LTR_Finder" tool_dev_url: "https://github.com/xzhub/LTR_Finder" doi: "10.1093/nar/gkm286" licence: ["MIT"] + identifier: "" input: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. `[ id:'sample1' ]` - - fasta: - type: file - description: Genome sequences in fasta format - pattern: "*.{fsa,fa,fasta}" + - - meta: + type: map + description: | + Groovy Map containing sample information + e.g. `[ id:'sample1' ]` + - fasta: + type: file + description: Genome sequences in fasta format + pattern: "*.{fsa,fa,fasta}" output: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. 
`[ id:'sample1' ]` - scn: - type: file - description: Annotation in LTRharvest or LTR_FINDER format - pattern: "*.scn" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. `[ id:'sample1' ]` + - "*.scn": + type: file + description: Annotation in LTRharvest or LTR_FINDER format + pattern: "*.scn" - gff: - type: file - description: Annotation in gff3 format - pattern: "*.gff3" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. `[ id:'sample1' ]` + - "*.gff3": + type: file + description: Annotation in gff3 format + pattern: "*.gff3" - versions: - type: file - description: File containing software versions - pattern: "versions.yml" + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" authors: - "@GallVp" maintainers: diff --git a/modules/gallvp/ltrfinder/tests/main.nf.test b/modules/gallvp/ltrfinder/tests/main.nf.test index c08af00a..4af6bba5 100644 --- a/modules/gallvp/ltrfinder/tests/main.nf.test +++ b/modules/gallvp/ltrfinder/tests/main.nf.test @@ -7,13 +7,13 @@ nextflow_process { tag "modules" tag "modules_gallvp" tag "ltrfinder" - tag "gunzip/main" + tag "gunzip" test("actinidia_chinensis-genome_21_fasta_gz-success") { setup { run('GUNZIP') { - script "../../gunzip/main" + script "../../gunzip/main.nf" process { """ diff --git a/modules/gallvp/ltrharvest/meta.yml b/modules/gallvp/ltrharvest/meta.yml index 256b3ce5..18064183 100644 --- a/modules/gallvp/ltrharvest/meta.yml +++ b/modules/gallvp/ltrharvest/meta.yml @@ -1,4 +1,3 @@ ---- # yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/modules/meta-schema.json name: "ltrharvest" description: | @@ -18,6 +17,7 @@ tools: documentation: "https://github.com/oushujun/EDTA/tree/v2.2.0/bin/LTR_HARVEST_parallel" tool_dev_url: "https://github.com/oushujun/EDTA/tree/v2.2.0/bin/LTR_HARVEST_parallel" licence: ["MIT"] + identifier: "" - "gt": description: "The GenomeTools genome analysis system" homepage: "https://genometools.org/index.html" @@ -25,34 +25,43 @@ tools: tool_dev_url: "https://github.com/genometools/genometools" doi: "10.1109/TCBB.2013.68" licence: ["ISC"] + identifier: "" input: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. `[ id:'sample1' ]` - - fasta: - type: file - description: Input genome fasta - pattern: "*.{fsa,fa,fasta}" + - - meta: + type: map + description: | + Groovy Map containing sample information + e.g. `[ id:'sample1' ]` + - fasta: + type: file + description: Input genome fasta + pattern: "*.{fsa,fa,fasta}" output: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. `[ id:'sample1' ]` - - versions: - type: file - description: File containing software versions - pattern: "versions.yml" - gff3: - type: file - description: Predicted LTR candidates in gff3 format - pattern: "*.gff3" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. `[ id:'sample1' ]` + - "*.gff3": + type: file + description: Predicted LTR candidates in gff3 format + pattern: "*.gff3" - scn: - type: file - description: Predicted LTR candidates in scn format - pattern: "*.scn" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. 
`[ id:'sample1' ]` + - "*.scn": + type: file + description: Predicted LTR candidates in scn format + pattern: "*.scn" + - versions: + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" authors: - "@GallVp" maintainers: diff --git a/modules/gallvp/ltrretriever/lai/meta.yml b/modules/gallvp/ltrretriever/lai/meta.yml index f84cf6ca..56efcccc 100644 --- a/modules/gallvp/ltrretriever/lai/meta.yml +++ b/modules/gallvp/ltrretriever/lai/meta.yml @@ -1,4 +1,3 @@ ---- # yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/modules/meta-schema.json name: "ltrretriever_lai" description: | @@ -20,50 +19,59 @@ tools: tool_dev_url: "https://github.com/oushujun/LTR_retriever" doi: "10.1093/nar/gky730" licence: ["GPL v3"] + identifier: "" input: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. `[ id:'sample1' ]` - - fasta: - type: file - description: The genome file that is used to generate everything - pattern: "*.{fsa,fa,fasta}" - - pass_list: - type: file - description: A list of intact LTR-RTs generated by LTR_retriever - pattern: "*.pass.list" - - annotation_out: - type: file - description: RepeatMasker annotation of all LTR sequences in the genome - pattern: "*.out" - - monoploid_seqs: - type: file - description: | - This parameter is mainly for ployploid genomes. User provides a list of - sequence names that represent a monoploid (1x). LAI will be calculated only - on these sequences if provided. - pattern: "*.txt" + - - meta: + type: map + description: | + Groovy Map containing sample information + e.g. `[ id:'sample1' ]` + - fasta: + type: file + description: The genome file that is used to generate everything + pattern: "*.{fsa,fa,fasta}" + - - pass_list: + type: file + description: A list of intact LTR-RTs generated by LTR_retriever + pattern: "*.pass.list" + - - annotation_out: + type: file + description: RepeatMasker annotation of all LTR sequences in the genome + pattern: "*.out" + - - monoploid_seqs: + type: file + description: | + This parameter is mainly for ployploid genomes. User provides a list of + sequence names that represent a monoploid (1x). LAI will be calculated only + on these sequences if provided. + pattern: "*.txt" output: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. `[ id:'sample1', single_end:false ]` - log: - type: file - description: Log from LAI - pattern: "*.LAI.log" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. `[ id:'sample1', single_end:false ]` + - "*.LAI.log": + type: file + description: Log from LAI + pattern: "*.LAI.log" - lai_out: - type: file - description: | - Output file from LAI if LAI is able to estimate the index from the inputs - pattern: "*.LAI.out" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. 
`[ id:'sample1', single_end:false ]` + - "*.LAI.out": + type: file + description: | + Output file from LAI if LAI is able to estimate the index from the inputs + pattern: "*.LAI.out" - versions: - type: file - description: File containing software versions - pattern: "versions.yml" + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" authors: - "@GallVp" maintainers: diff --git a/modules/gallvp/ltrretriever/ltrretriever/meta.yml b/modules/gallvp/ltrretriever/ltrretriever/meta.yml index a310b04a..9645de2d 100644 --- a/modules/gallvp/ltrretriever/ltrretriever/meta.yml +++ b/modules/gallvp/ltrretriever/ltrretriever/meta.yml @@ -1,4 +1,3 @@ ---- # yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/modules/meta-schema.json name: "ltrretriever_ltrretriever" description: Identifies LTR retrotransposons using LTR_retriever @@ -16,67 +15,104 @@ tools: tool_dev_url: "https://github.com/oushujun/LTR_retriever" doi: "10.1104/pp.17.01310" licence: ["GPL v3"] + identifier: "" input: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. `[ id:'sample1' ]` - - genome: - type: file - description: Genomic sequences in fasta format - pattern: "*.{fsa,fa,fasta}" - - harvest: - type: file - description: LTR-RT candidates from GenomeTools ltrharvest in the old tabular format - pattern: "*.tabout" - - finder: - type: file - description: LTR-RT candidates from LTR_FINDER - pattern: "*.scn" - - mgescan: - type: file - description: LTR-RT candidates from MGEScan_LTR - pattern: "*.out" - - non_tgca: - type: file - description: Non-canonical LTR-RT candidates from GenomeTools ltrharvest in the old tabular format - pattern: "*.tabout" + - - meta: + type: map + description: | + Groovy Map containing sample information + e.g. `[ id:'sample1' ]` + - genome: + type: file + description: Genomic sequences in fasta format + pattern: "*.{fsa,fa,fasta}" + - - harvest: + type: file + description: LTR-RT candidates from GenomeTools ltrharvest in the old tabular + format + pattern: "*.tabout" + - - finder: + type: file + description: LTR-RT candidates from LTR_FINDER + pattern: "*.scn" + - - mgescan: + type: file + description: LTR-RT candidates from MGEScan_LTR + pattern: "*.out" + - - non_tgca: + type: file + description: Non-canonical LTR-RT candidates from GenomeTools ltrharvest in + the old tabular format + pattern: "*.tabout" output: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. `[ id:'sample1' ]` - log: - type: file - description: Output log from LTR_retriever - pattern: "*.log" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. `[ id:'sample1' ]` + - "*.log": + type: file + description: Output log from LTR_retriever + pattern: "*.log" - pass_list: - type: file - description: Intact LTR-RTs with coordinate and structural information in summary table format - pattern: "*.pass.list" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. `[ id:'sample1' ]` + - ${prefix}.pass.list: + type: file + description: Intact LTR-RTs with coordinate and structural information in summary + table format + pattern: "*.pass.list" - pass_list_gff: - type: file - description: Intact LTR-RTs with coordinate and structural information in gff3 format - pattern: "*.pass.list.gff3" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. 
`[ id:'sample1' ]` + - "*.pass.list.gff3": + type: file + description: Intact LTR-RTs with coordinate and structural information in gff3 + format + pattern: "*.pass.list.gff3" - ltrlib: - type: file - description: All non-redundant LTR-RTs - pattern: "*.LTRlib.fa" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. `[ id:'sample1' ]` + - "*.LTRlib.fa": + type: file + description: All non-redundant LTR-RTs + pattern: "*.LTRlib.fa" - annotation_out: - type: file - description: Whole-genome LTR-RT annotation by the non-redundant library - pattern: "*.out" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. `[ id:'sample1' ]` + - ${prefix}.out: + type: file + description: Whole-genome LTR-RT annotation by the non-redundant library + pattern: "*.out" - annotation_gff: - type: file - description: Whole-genome LTR-RT annotation by the non-redundant library in gff3 format - pattern: "*.out.gff3" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. `[ id:'sample1' ]` + - "*.out.gff3": + type: file + description: Whole-genome LTR-RT annotation by the non-redundant library in + gff3 format + pattern: "*.out.gff3" - versions: - type: file - description: File containing software versions - pattern: "versions.yml" - + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" authors: - "@GallVp" maintainers: diff --git a/modules/gallvp/ltrretriever/ltrretriever/tests/main.nf.test b/modules/gallvp/ltrretriever/ltrretriever/tests/main.nf.test index 91b1a104..52dd7ac0 100644 --- a/modules/gallvp/ltrretriever/ltrretriever/tests/main.nf.test +++ b/modules/gallvp/ltrretriever/ltrretriever/tests/main.nf.test @@ -9,7 +9,7 @@ nextflow_process { tag "modules_gallvp" tag "ltrretriever" tag "ltrretriever/ltrretriever" - tag "gunzip/main" + tag "gunzip" tag "ltrharvest" tag "ltrfinder" tag "cat/cat" @@ -19,7 +19,7 @@ nextflow_process { setup { run("LTRHARVEST") { - script "../../../ltrharvest" + script "../../../ltrharvest/main.nf" process { """ @@ -32,7 +32,7 @@ nextflow_process { } run("LTRFINDER") { - script "../../../ltrfinder" + script "../../../ltrfinder/main.nf" process { """ @@ -45,7 +45,7 @@ nextflow_process { } run("CAT_CAT") { - script "../../../cat/cat" + script "../../../cat/cat/main.nf" process { """ @@ -85,7 +85,7 @@ nextflow_process { setup { run('GUNZIP') { - script "../../../gunzip/main" + script "../../../gunzip/main.nf" process { """ @@ -98,7 +98,7 @@ nextflow_process { } run("LTRHARVEST") { - script "../../../ltrharvest" + script "../../../ltrharvest/main.nf" process { """ @@ -108,7 +108,7 @@ nextflow_process { } run("LTRFINDER") { - script "../../../ltrfinder" + script "../../../ltrfinder/main.nf" process { """ @@ -118,7 +118,7 @@ nextflow_process { } run("CAT_CAT") { - script "../../../cat/cat" + script "../../../cat/cat/main.nf" process { """ diff --git a/modules/gallvp/minimap2/align/environment.yml b/modules/gallvp/minimap2/align/environment.yml new file mode 100644 index 00000000..dc6476b7 --- /dev/null +++ b/modules/gallvp/minimap2/align/environment.yml @@ -0,0 +1,8 @@ +channels: + - conda-forge + - bioconda + +dependencies: + - bioconda::htslib=1.20 + - bioconda::minimap2=2.28 + - bioconda::samtools=1.20 diff --git a/modules/gallvp/minimap2/align/main.nf b/modules/gallvp/minimap2/align/main.nf new file mode 100644 index 00000000..d82dc14d --- /dev/null +++ b/modules/gallvp/minimap2/align/main.nf @@ -0,0 +1,78 @@ +process 
MINIMAP2_ALIGN { + tag "$meta.id" + label 'process_high' + + // Note: the versions here need to match the versions used in the mulled container below and minimap2/index + conda "${moduleDir}/environment.yml" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/mulled-v2-66534bcbb7031a148b13e2ad42583020b9cd25c4:3161f532a5ea6f1dec9be5667c9efc2afdac6104-0' : + 'biocontainers/mulled-v2-66534bcbb7031a148b13e2ad42583020b9cd25c4:3161f532a5ea6f1dec9be5667c9efc2afdac6104-0' }" + + input: + tuple val(meta), path(reads) + tuple val(meta2), path(reference) + val bam_format + val bam_index_extension + val cigar_paf_format + val cigar_bam + + output: + tuple val(meta), path("*.paf") , optional: true, emit: paf + tuple val(meta), path("*.bam") , optional: true, emit: bam + tuple val(meta), path("*.bam.${bam_index_extension}"), optional: true, emit: index + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def args2 = task.ext.args2 ?: '' + def args3 = task.ext.args3 ?: '' + def args4 = task.ext.args4 ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + def bam_index = bam_index_extension ? "${prefix}.bam##idx##${prefix}.bam.${bam_index_extension} --write-index" : "${prefix}.bam" + def bam_output = bam_format ? "-a | samtools sort -@ ${task.cpus-1} -o ${bam_index} ${args2}" : "-o ${prefix}.paf" + def cigar_paf = cigar_paf_format && !bam_format ? "-c" : '' + def set_cigar_bam = cigar_bam && bam_format ? "-L" : '' + def bam_input = "${reads.extension}".matches('sam|bam|cram') + def samtools_reset_fastq = bam_input ? "samtools reset --threads ${task.cpus-1} $args3 $reads | samtools fastq --threads ${task.cpus-1} $args4 |" : '' + def query = bam_input ? "-" : reads + def target = reference ?: (bam_input ? error("BAM input requires reference") : reads) + + """ + $samtools_reset_fastq \\ + minimap2 \\ + $args \\ + -t $task.cpus \\ + $target \\ + $query \\ + $cigar_paf \\ + $set_cigar_bam \\ + $bam_output + + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + minimap2: \$(minimap2 --version 2>&1) + samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//') + END_VERSIONS + """ + + stub: + def prefix = task.ext.prefix ?: "${meta.id}" + def output_file = bam_format ? "${prefix}.bam" : "${prefix}.paf" + def bam_index = bam_index_extension ? "touch ${prefix}.bam.${bam_index_extension}" : "" + def bam_input = "${reads.extension}".matches('sam|bam|cram') + def target = reference ?: (bam_input ? error("BAM input requires reference") : reads) + + """ + touch $output_file + ${bam_index} + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + minimap2: \$(minimap2 --version 2>&1) + END_VERSIONS + """ +} diff --git a/modules/gallvp/minimap2/align/meta.yml b/modules/gallvp/minimap2/align/meta.yml new file mode 100644 index 00000000..a4cfc891 --- /dev/null +++ b/modules/gallvp/minimap2/align/meta.yml @@ -0,0 +1,99 @@ +name: minimap2_align +description: A versatile pairwise aligner for genomic and spliced nucleotide sequences +keywords: + - align + - fasta + - fastq + - genome + - paf + - reference +tools: + - minimap2: + description: | + A versatile pairwise aligner for genomic and spliced nucleotide sequences. 
+ homepage: https://github.com/lh3/minimap2 + documentation: https://github.com/lh3/minimap2#uguide + licence: ["MIT"] + identifier: "" +input: + - - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - reads: + type: file + description: | + List of input FASTA or FASTQ files of size 1 and 2 for single-end + and paired-end data, respectively. + - - meta2: + type: map + description: | + Groovy Map containing reference information + e.g. [ id:'test_ref'] + - reference: + type: file + description: | + Reference database in FASTA format. + - - bam_format: + type: boolean + description: Specify that output should be in BAM format + - - bam_index_extension: + type: string + description: BAM alignment index extension (e.g. "bai") + - - cigar_paf_format: + type: boolean + description: Specify that output CIGAR should be in PAF format + - - cigar_bam: + type: boolean + description: | + Write CIGAR with >65535 ops at the CG tag. This is recommended when + doing XYZ (https://github.com/lh3/minimap2#working-with-65535-cigar-operations) +output: + - paf: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*.paf": + type: file + description: Alignment in PAF format + pattern: "*.paf" + - bam: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*.bam": + type: file + description: Alignment in BAM format + pattern: "*.bam" + - index: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*.bam.${bam_index_extension}": + type: file + description: BAM alignment index + pattern: "*.bam.*" + - versions: + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" +authors: + - "@heuermh" + - "@sofstam" + - "@sateeshperi" + - "@jfy133" + - "@fellen31" +maintainers: + - "@heuermh" + - "@sofstam" + - "@sateeshperi" + - "@jfy133" + - "@fellen31" diff --git a/modules/gallvp/minimap2/align/tests/main.nf.test b/modules/gallvp/minimap2/align/tests/main.nf.test new file mode 100644 index 00000000..17510cb7 --- /dev/null +++ b/modules/gallvp/minimap2/align/tests/main.nf.test @@ -0,0 +1,441 @@ +nextflow_process { + + name "Test Process MINIMAP2_ALIGN" + script "../main.nf" + process "MINIMAP2_ALIGN" + + tag "modules" + tag "modules_gallvp" + tag "minimap2" + tag "minimap2/align" + + test("sarscov2 - fastq, fasta, true, [], false, false") { + + when { + process { + """ + input[0] = [ + [ id:'test', single_end:true ], // meta map + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true) + ] + input[1] = [ + [ id:'test_ref' ], // meta map + file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta', checkIfExists: true) + ] + input[2] = true + input[3] = [] + input[4] = false + input[5] = false + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot( + bam(process.out.bam[0][1]).getHeader(), + bam(process.out.bam[0][1]).getReadsMD5(), + process.out.versions + ).match() } + ) + } + + } + + test("sarscov2 - fastq, fasta, true, 'bai', false, false") { + + when { + process { + """ + input[0] = [ + [ id:'test', single_end:true ], // meta map + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true) + ] + input[1] = [ + [ id:'test_ref' 
], // meta map + file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta', checkIfExists: true) + ] + input[2] = true + input[3] = 'bai' + input[4] = false + input[5] = false + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot( + bam(process.out.bam[0][1]).getHeader(), + bam(process.out.bam[0][1]).getReadsMD5(), + file(process.out.index[0][1]).name, + process.out.versions + ).match() } + ) + } + + } + + test("sarscov2 - [fastq1, fastq2], fasta, true, false, false") { + + when { + process { + """ + input[0] = [ + [ id:'test', single_end:false ], // meta map + [ + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true), + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_2.fastq.gz', checkIfExists: true) + ] + ] + input[1] = [ + [ id:'test_ref' ], // meta map + file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta', checkIfExists: true) + ] + input[2] = true + input[3] = [] + input[4] = false + input[5] = false + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot( + bam(process.out.bam[0][1]).getHeader(), + bam(process.out.bam[0][1]).getReadsMD5(), + process.out.versions + ).match() } + ) + } + + } + + test("sarscov2 - fastq, [], true, false, false") { + + when { + process { + """ + input[0] = [ + [ id:'test', single_end:true ], // meta map + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true), + ] + input[1] = [ + [ id:'test_ref' ], // meta map + [] + ] + input[2] = true + input[3] = [] + input[4] = false + input[5] = false + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot( + bam(process.out.bam[0][1]).getHeader(), + bam(process.out.bam[0][1]).getReadsMD5(), + process.out.versions + ).match() } + ) + } + + } + + test("sarscov2 - bam, fasta, true, [], false, false") { + + when { + process { + """ + input[0] = [ + [ id:'test', single_end:true ], // meta map + file(params.modules_testdata_base_path + 'genomics/homo_sapiens/illumina/bam/test3.single_end.markduplicates.sorted.bam', checkIfExists: true) + ] + input[1] = [ + [ id:'test_ref' ], // meta map + file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta', checkIfExists: true) + ] + input[2] = true + input[3] = [] + input[4] = false + input[5] = false + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot( + bam(process.out.bam[0][1]).getHeader(), + bam(process.out.bam[0][1]).getReadsMD5(), + process.out.versions + ).match() } + ) + } + + } + + test("sarscov2 - bam, fasta, true, 'bai', false, false") { + + when { + process { + """ + input[0] = [ + [ id:'test', single_end:true ], // meta map + file(params.modules_testdata_base_path + 'genomics/homo_sapiens/illumina/bam/test3.single_end.markduplicates.sorted.bam', checkIfExists: true) + ] + input[1] = [ + [ id:'test_ref' ], // meta map + file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta', checkIfExists: true) + ] + input[2] = true + input[3] = 'bai' + input[4] = false + input[5] = false + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot( + bam(process.out.bam[0][1]).getHeader(), + bam(process.out.bam[0][1]).getReadsMD5(), + file(process.out.index[0][1]).name, + process.out.versions + ).match() } + ) + } + + } + + test("sarscov2 - bam, [], true, false, false") { + + 
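+        // Reader note (not part of the upstream module): per the logic shown in main.nf
+        // above, a SAM/BAM/CRAM query combined with an empty reference channel triggers
+        // error("BAM input requires reference"), so this case asserts process.failed
+        // rather than matching a snapshot.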
when { + process { + """ + input[0] = [ + [ id:'test', single_end:true ], // meta map + file(params.modules_testdata_base_path + 'genomics/homo_sapiens/illumina/bam/test3.single_end.markduplicates.sorted.bam', checkIfExists: true) + ] + input[1] = [ + [ id:'test_ref' ], // meta map + [] + ] + input[2] = true + input[3] = [] + input[4] = false + input[5] = false + """ + } + } + + then { + assertAll( + { assert process.failed } + ) + } + + } + + test("sarscov2 - fastq, fasta, true, [], false, false - stub") { + + options "-stub" + + when { + process { + """ + input[0] = [ + [ id:'test', single_end:true ], // meta map + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true) + ] + input[1] = [ + [ id:'test_ref' ], // meta map + file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta', checkIfExists: true) + ] + input[2] = true + input[3] = [] + input[4] = false + input[5] = false + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot(process.out).match() } + ) + } + + } + + test("sarscov2 - fastq, fasta, true, 'bai', false, false - stub") { + + options "-stub" + + when { + process { + """ + input[0] = [ + [ id:'test', single_end:true ], // meta map + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true) + ] + input[1] = [ + [ id:'test_ref' ], // meta map + file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta', checkIfExists: true) + ] + input[2] = true + input[3] = 'bai' + input[4] = false + input[5] = false + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot(process.out).match() } + ) + } + + } + + test("sarscov2 - fastq, fasta, false, [], false, false - stub") { + + options "-stub" + + when { + process { + """ + input[0] = [ + [ id:'test', single_end:true ], // meta map + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true) + ] + input[1] = [ + [ id:'test_ref' ], // meta map + file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta', checkIfExists: true) + ] + input[2] = false + input[3] = [] + input[4] = false + input[5] = false + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot(process.out).match() } + ) + } + + } + + test("sarscov2 - bam, fasta, true, [], false, false - stub") { + + options "-stub" + + when { + process { + """ + input[0] = [ + [ id:'test', single_end:true ], // meta map + file(params.modules_testdata_base_path + 'genomics/homo_sapiens/illumina/bam/test3.single_end.markduplicates.sorted.bam', checkIfExists: true) + ] + input[1] = [ + [ id:'test_ref' ], // meta map + file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta', checkIfExists: true) + ] + input[2] = true + input[3] = [] + input[4] = false + input[5] = false + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot(process.out).match() } + ) + } + + } + + test("sarscov2 - bam, fasta, true, 'bai', false, false - stub") { + + options "-stub" + + when { + process { + """ + input[0] = [ + [ id:'test', single_end:true ], // meta map + file(params.modules_testdata_base_path + 'genomics/homo_sapiens/illumina/bam/test3.single_end.markduplicates.sorted.bam', checkIfExists: true) + ] + input[1] = [ + [ id:'test_ref' ], // meta map + file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta', 
checkIfExists: true) + ] + input[2] = true + input[3] = 'bai' + input[4] = false + input[5] = false + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot(process.out).match() } + ) + } + + } + + test("sarscov2 - bam, [], true, false, false - stub") { + + options "-stub" + + when { + process { + """ + input[0] = [ + [ id:'test', single_end:true ], // meta map + file(params.modules_testdata_base_path + 'genomics/homo_sapiens/illumina/bam/test3.single_end.markduplicates.sorted.bam', checkIfExists: true) + ] + input[1] = [ + [ id:'test_ref' ], // meta map + [] + ] + input[2] = true + input[3] = [] + input[4] = false + input[5] = false + """ + } + } + + then { + assertAll( + { assert process.failed } + ) + } + + } + +} \ No newline at end of file diff --git a/modules/gallvp/minimap2/align/tests/main.nf.test.snap b/modules/gallvp/minimap2/align/tests/main.nf.test.snap new file mode 100644 index 00000000..bafaabd7 --- /dev/null +++ b/modules/gallvp/minimap2/align/tests/main.nf.test.snap @@ -0,0 +1,476 @@ +{ + "sarscov2 - bam, fasta, true, 'bai', false, false": { + "content": [ + [ + "@HD\tVN:1.6\tSO:coordinate", + "@SQ\tSN:MT192765.1\tLN:29829", + "@PG\tID:minimap2\tPN:minimap2\tVN:2.28-r1209\tCL:minimap2 -t 4 -a genome.fasta -", + "@PG\tID:samtools\tPN:samtools\tPP:minimap2\tVN:1.20\tCL:samtools sort -@ 3 -o test.bam##idx##test.bam.bai --write-index" + ], + "5d426b9a5f5b2c54f1d7f1e4c238ae94", + "test.bam.bai", + [ + "versions.yml:md5,3548eeba9066efbf8d78ea99f8d813fd" + ] + ], + "meta": { + "nf-test": "0.9.0", + "nextflow": "24.04.4" + }, + "timestamp": "2024-10-10T20:47:38.40726" + }, + "sarscov2 - bam, fasta, true, 'bai', false, false - stub": { + "content": [ + { + "0": [ + + ], + "1": [ + [ + { + "id": "test", + "single_end": true + }, + "test.bam:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "2": [ + [ + { + "id": "test", + "single_end": true + }, + "test.bam.bai:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "3": [ + "versions.yml:md5,98b8f5f36aa54b82210094f0b0d11938" + ], + "bam": [ + [ + { + "id": "test", + "single_end": true + }, + "test.bam:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "index": [ + [ + { + "id": "test", + "single_end": true + }, + "test.bam.bai:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "paf": [ + + ], + "versions": [ + "versions.yml:md5,98b8f5f36aa54b82210094f0b0d11938" + ] + } + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "24.04.2" + }, + "timestamp": "2024-07-23T11:21:37.92353539" + }, + "sarscov2 - fastq, fasta, true, 'bai', false, false - stub": { + "content": [ + { + "0": [ + + ], + "1": [ + [ + { + "id": "test", + "single_end": true + }, + "test.bam:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "2": [ + [ + { + "id": "test", + "single_end": true + }, + "test.bam.bai:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "3": [ + "versions.yml:md5,98b8f5f36aa54b82210094f0b0d11938" + ], + "bam": [ + [ + { + "id": "test", + "single_end": true + }, + "test.bam:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "index": [ + [ + { + "id": "test", + "single_end": true + }, + "test.bam.bai:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "paf": [ + + ], + "versions": [ + "versions.yml:md5,98b8f5f36aa54b82210094f0b0d11938" + ] + } + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "24.04.2" + }, + "timestamp": "2024-06-03T11:29:44.669021368" + }, + "sarscov2 - fastq, fasta, false, [], false, false - stub": { + "content": [ + { + "0": [ + [ + { + "id": "test", + "single_end": true + }, + 
"test.paf:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "1": [ + + ], + "2": [ + + ], + "3": [ + "versions.yml:md5,98b8f5f36aa54b82210094f0b0d11938" + ], + "bam": [ + + ], + "index": [ + + ], + "paf": [ + [ + { + "id": "test", + "single_end": true + }, + "test.paf:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "versions": [ + "versions.yml:md5,98b8f5f36aa54b82210094f0b0d11938" + ] + } + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "24.04.2" + }, + "timestamp": "2024-06-03T11:15:52.738781039" + }, + "sarscov2 - fastq, fasta, true, [], false, false - stub": { + "content": [ + { + "0": [ + + ], + "1": [ + [ + { + "id": "test", + "single_end": true + }, + "test.bam:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "2": [ + + ], + "3": [ + "versions.yml:md5,98b8f5f36aa54b82210094f0b0d11938" + ], + "bam": [ + [ + { + "id": "test", + "single_end": true + }, + "test.bam:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "index": [ + + ], + "paf": [ + + ], + "versions": [ + "versions.yml:md5,98b8f5f36aa54b82210094f0b0d11938" + ] + } + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "24.04.2" + }, + "timestamp": "2024-06-03T11:15:23.033808223" + }, + "sarscov2 - [fastq1, fastq2], fasta, true, false, false": { + "content": [ + [ + "@HD\tVN:1.6\tSO:coordinate", + "@SQ\tSN:MT192765.1\tLN:29829", + "@PG\tID:minimap2\tPN:minimap2\tVN:2.28-r1209\tCL:minimap2 -t 4 -a genome.fasta test_1.fastq.gz test_2.fastq.gz", + "@PG\tID:samtools\tPN:samtools\tPP:minimap2\tVN:1.20\tCL:samtools sort -@ 3 -o test.bam" + ], + "1bc392244f228bf52cf0b5a8f6a654c9", + [ + "versions.yml:md5,3548eeba9066efbf8d78ea99f8d813fd" + ] + ], + "meta": { + "nf-test": "0.9.0", + "nextflow": "24.04.4" + }, + "timestamp": "2024-10-10T20:47:04.155509" + }, + "sarscov2 - fastq, fasta, true, [], false, false": { + "content": [ + [ + "@HD\tVN:1.6\tSO:coordinate", + "@SQ\tSN:MT192765.1\tLN:29829", + "@PG\tID:minimap2\tPN:minimap2\tVN:2.28-r1209\tCL:minimap2 -t 4 -a genome.fasta test_1.fastq.gz", + "@PG\tID:samtools\tPN:samtools\tPP:minimap2\tVN:1.20\tCL:samtools sort -@ 3 -o test.bam" + ], + "f194745c0ccfcb2a9c0aee094a08750", + [ + "versions.yml:md5,3548eeba9066efbf8d78ea99f8d813fd" + ] + ], + "meta": { + "nf-test": "0.9.0", + "nextflow": "24.04.4" + }, + "timestamp": "2024-10-10T20:46:42.073683" + }, + "sarscov2 - fastq, fasta, true, 'bai', false, false": { + "content": [ + [ + "@HD\tVN:1.6\tSO:coordinate", + "@SQ\tSN:MT192765.1\tLN:29829", + "@PG\tID:minimap2\tPN:minimap2\tVN:2.28-r1209\tCL:minimap2 -t 4 -a genome.fasta test_1.fastq.gz", + "@PG\tID:samtools\tPN:samtools\tPP:minimap2\tVN:1.20\tCL:samtools sort -@ 3 -o test.bam##idx##test.bam.bai --write-index" + ], + "f194745c0ccfcb2a9c0aee094a08750", + "test.bam.bai", + [ + "versions.yml:md5,3548eeba9066efbf8d78ea99f8d813fd" + ] + ], + "meta": { + "nf-test": "0.9.0", + "nextflow": "24.04.4" + }, + "timestamp": "2024-10-10T20:46:53.814566" + }, + "sarscov2 - bam, fasta, true, [], false, false": { + "content": [ + [ + "@HD\tVN:1.6\tSO:coordinate", + "@SQ\tSN:MT192765.1\tLN:29829", + "@PG\tID:minimap2\tPN:minimap2\tVN:2.28-r1209\tCL:minimap2 -t 4 -a genome.fasta -", + "@PG\tID:samtools\tPN:samtools\tPP:minimap2\tVN:1.20\tCL:samtools sort -@ 3 -o test.bam" + ], + "5d426b9a5f5b2c54f1d7f1e4c238ae94", + [ + "versions.yml:md5,3548eeba9066efbf8d78ea99f8d813fd" + ] + ], + "meta": { + "nf-test": "0.9.0", + "nextflow": "24.04.4" + }, + "timestamp": "2024-10-10T20:47:26.993111" + }, + "sarscov2 - bam, fasta, true, [], false, false - stub": { + "content": [ + { + "0": [ + + ], + "1": [ + [ + { + 
"id": "test", + "single_end": true + }, + "test.bam:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "2": [ + + ], + "3": [ + "versions.yml:md5,98b8f5f36aa54b82210094f0b0d11938" + ], + "bam": [ + [ + { + "id": "test", + "single_end": true + }, + "test.bam:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "index": [ + + ], + "paf": [ + + ], + "versions": [ + "versions.yml:md5,98b8f5f36aa54b82210094f0b0d11938" + ] + } + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "24.04.2" + }, + "timestamp": "2024-07-23T11:21:22.162291795" + }, + "sarscov2 - fastq, [], true, false, false": { + "content": [ + [ + "@HD\tVN:1.6\tSO:coordinate", + "@SQ\tSN:ERR5069949.2151832\tLN:150", + "@SQ\tSN:ERR5069949.576388\tLN:77", + "@SQ\tSN:ERR5069949.501486\tLN:146", + "@SQ\tSN:ERR5069949.1331889\tLN:132", + "@SQ\tSN:ERR5069949.2161340\tLN:80", + "@SQ\tSN:ERR5069949.973930\tLN:79", + "@SQ\tSN:ERR5069949.2417063\tLN:150", + "@SQ\tSN:ERR5069949.376959\tLN:151", + "@SQ\tSN:ERR5069949.1088785\tLN:149", + "@SQ\tSN:ERR5069949.1066259\tLN:147", + "@SQ\tSN:ERR5069949.2832676\tLN:139", + "@SQ\tSN:ERR5069949.2953930\tLN:151", + "@SQ\tSN:ERR5069949.324865\tLN:151", + "@SQ\tSN:ERR5069949.2185111\tLN:150", + "@SQ\tSN:ERR5069949.937422\tLN:151", + "@SQ\tSN:ERR5069949.2431709\tLN:150", + "@SQ\tSN:ERR5069949.1246538\tLN:148", + "@SQ\tSN:ERR5069949.1189252\tLN:98", + "@SQ\tSN:ERR5069949.2216307\tLN:147", + "@SQ\tSN:ERR5069949.3273002\tLN:148", + "@SQ\tSN:ERR5069949.3277445\tLN:151", + "@SQ\tSN:ERR5069949.3022231\tLN:147", + "@SQ\tSN:ERR5069949.184542\tLN:151", + "@SQ\tSN:ERR5069949.540529\tLN:149", + "@SQ\tSN:ERR5069949.686090\tLN:150", + "@SQ\tSN:ERR5069949.2787556\tLN:106", + "@SQ\tSN:ERR5069949.2650879\tLN:150", + "@SQ\tSN:ERR5069949.2064910\tLN:149", + "@SQ\tSN:ERR5069949.2328704\tLN:150", + "@SQ\tSN:ERR5069949.1067032\tLN:150", + "@SQ\tSN:ERR5069949.3338256\tLN:151", + "@SQ\tSN:ERR5069949.1412839\tLN:147", + "@SQ\tSN:ERR5069949.1538968\tLN:150", + "@SQ\tSN:ERR5069949.147998\tLN:94", + "@SQ\tSN:ERR5069949.366975\tLN:106", + "@SQ\tSN:ERR5069949.1372331\tLN:151", + "@SQ\tSN:ERR5069949.1709367\tLN:129", + "@SQ\tSN:ERR5069949.2388984\tLN:150", + "@SQ\tSN:ERR5069949.1132353\tLN:150", + "@SQ\tSN:ERR5069949.1151736\tLN:151", + "@SQ\tSN:ERR5069949.479807\tLN:150", + "@SQ\tSN:ERR5069949.2176303\tLN:151", + "@SQ\tSN:ERR5069949.2772897\tLN:151", + "@SQ\tSN:ERR5069949.1020777\tLN:122", + "@SQ\tSN:ERR5069949.465452\tLN:151", + "@SQ\tSN:ERR5069949.1704586\tLN:149", + "@SQ\tSN:ERR5069949.1258508\tLN:151", + "@SQ\tSN:ERR5069949.986441\tLN:119", + "@SQ\tSN:ERR5069949.2674295\tLN:148", + "@SQ\tSN:ERR5069949.885966\tLN:79", + "@SQ\tSN:ERR5069949.2342766\tLN:151", + "@SQ\tSN:ERR5069949.3122970\tLN:127", + "@SQ\tSN:ERR5069949.3279513\tLN:72", + "@SQ\tSN:ERR5069949.309410\tLN:151", + "@SQ\tSN:ERR5069949.532979\tLN:149", + "@SQ\tSN:ERR5069949.2888794\tLN:151", + "@SQ\tSN:ERR5069949.2205229\tLN:150", + "@SQ\tSN:ERR5069949.786562\tLN:151", + "@SQ\tSN:ERR5069949.919671\tLN:151", + "@SQ\tSN:ERR5069949.1328186\tLN:151", + "@SQ\tSN:ERR5069949.870926\tLN:149", + "@SQ\tSN:ERR5069949.2257580\tLN:151", + "@SQ\tSN:ERR5069949.3249622\tLN:77", + "@SQ\tSN:ERR5069949.611123\tLN:125", + "@SQ\tSN:ERR5069949.651338\tLN:142", + "@SQ\tSN:ERR5069949.169513\tLN:92", + "@SQ\tSN:ERR5069949.155944\tLN:150", + "@SQ\tSN:ERR5069949.2033605\tLN:150", + "@SQ\tSN:ERR5069949.2730382\tLN:142", + "@SQ\tSN:ERR5069949.2125592\tLN:150", + "@SQ\tSN:ERR5069949.1062611\tLN:151", + "@SQ\tSN:ERR5069949.1778133\tLN:151", + "@SQ\tSN:ERR5069949.3057020\tLN:95", + 
"@SQ\tSN:ERR5069949.2972968\tLN:141", + "@SQ\tSN:ERR5069949.2734474\tLN:149", + "@SQ\tSN:ERR5069949.856527\tLN:151", + "@SQ\tSN:ERR5069949.2098070\tLN:151", + "@SQ\tSN:ERR5069949.1552198\tLN:150", + "@SQ\tSN:ERR5069949.2385514\tLN:150", + "@SQ\tSN:ERR5069949.2270078\tLN:151", + "@SQ\tSN:ERR5069949.114870\tLN:150", + "@SQ\tSN:ERR5069949.2668880\tLN:147", + "@SQ\tSN:ERR5069949.257821\tLN:139", + "@SQ\tSN:ERR5069949.2243023\tLN:150", + "@SQ\tSN:ERR5069949.2605155\tLN:146", + "@SQ\tSN:ERR5069949.1340552\tLN:151", + "@SQ\tSN:ERR5069949.1561137\tLN:150", + "@SQ\tSN:ERR5069949.2361683\tLN:149", + "@SQ\tSN:ERR5069949.2521353\tLN:150", + "@SQ\tSN:ERR5069949.1261808\tLN:149", + "@SQ\tSN:ERR5069949.2734873\tLN:98", + "@SQ\tSN:ERR5069949.3017828\tLN:107", + "@SQ\tSN:ERR5069949.573706\tLN:150", + "@SQ\tSN:ERR5069949.1980512\tLN:151", + "@SQ\tSN:ERR5069949.1014693\tLN:150", + "@SQ\tSN:ERR5069949.3184655\tLN:150", + "@SQ\tSN:ERR5069949.29668\tLN:89", + "@SQ\tSN:ERR5069949.3258358\tLN:151", + "@SQ\tSN:ERR5069949.1476386\tLN:151", + "@SQ\tSN:ERR5069949.2415814\tLN:150", + "@PG\tID:minimap2\tPN:minimap2\tVN:2.28-r1209\tCL:minimap2 -t 4 -a test_1.fastq.gz test_1.fastq.gz", + "@PG\tID:samtools\tPN:samtools\tPP:minimap2\tVN:1.20\tCL:samtools sort -@ 3 -o test.bam" + ], + "16c1c651f8ec67383bcdee3c55aed94f", + [ + "versions.yml:md5,3548eeba9066efbf8d78ea99f8d813fd" + ] + ], + "meta": { + "nf-test": "0.9.0", + "nextflow": "24.04.4" + }, + "timestamp": "2024-10-10T20:47:14.585958" + } +} \ No newline at end of file diff --git a/modules/gallvp/minimap2/align/tests/tags.yml b/modules/gallvp/minimap2/align/tests/tags.yml new file mode 100644 index 00000000..39dba374 --- /dev/null +++ b/modules/gallvp/minimap2/align/tests/tags.yml @@ -0,0 +1,2 @@ +minimap2/align: + - "modules/nf-core/minimap2/align/**" diff --git a/modules/gallvp/plotsr/meta.yml b/modules/gallvp/plotsr/meta.yml index c728b683..c2bc2993 100644 --- a/modules/gallvp/plotsr/meta.yml +++ b/modules/gallvp/plotsr/meta.yml @@ -1,7 +1,7 @@ ---- # yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/modules/meta-schema.json name: "plotsr" -description: Plotsr generates high-quality visualisation of synteny and structural rearrangements between multiple genomes. +description: Plotsr generates high-quality visualisation of synteny and structural + rearrangements between multiple genomes. keywords: - genomics - synteny @@ -15,64 +15,67 @@ tools: tool_dev_url: "https://github.com/schneebergerlab/plotsr" doi: "10.1093/bioinformatics/btac196" licence: ["MIT"] + identifier: biotools:plotsr input: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. `[ id:'sample1' ]` - - syri: - type: file - description: Structural annotation mappings (syri.out) identified by SyRI - pattern: "*syri.out" - - fastas: - type: list - description: Fasta files in the sequence specified by the `genomes` file - pattern: "*.{fasta,fa,fsa,faa}" - - genomes: - type: string - description: | - String containing the genomes.txt file contents including the header, but excluding the path to genome fasta files. - The path to staged genome files is automatically added by the process script. For example, see the included nf-test. 
- pattern: "*.txt" - - bedpe: - type: file - description: Structural annotation mappings in BEDPE format - pattern: "*.bedpe" - - markers: - type: file - description: File containing path to markers - pattern: "*.bed" - - tracks: - type: file - description: File listing paths and details for all tracks to be plotted - pattern: "*.txt" - - chrord: - type: file - description: | - File containing reference (first genome) chromosome IDs in the order in which they are to be plotted. - File requires one chromosome ID per line. Not compatible with --chr - pattern: "*.txt" - - chrname: - type: file - description: | - File containing reference (first genome) chromosome names to be used in the plot. - File needs to be a TSV with the chromosome ID in first column and chromosome name in the second. - pattern: "*.txt" + - - meta: + type: map + description: | + Groovy Map containing sample information + e.g. `[ id:'sample1' ]` + - syri: + type: file + description: Structural annotation mappings (syri.out) identified by SyRI + pattern: "*syri.out" + - - fastas: + type: list + description: Fasta files in the sequence specified by the `genomes` file + pattern: "*.{fasta,fa,fsa,faa}" + - - genomes: + type: string + description: | + String containing the genomes.txt file contents including the header, but excluding the path to genome fasta files. + The path to staged genome files is automatically added by the process script. For example, see the included nf-test. + pattern: "*.txt" + - - bedpe: + type: file + description: Structural annotation mappings in BEDPE format + pattern: "*.bedpe" + - - markers: + type: file + description: File containing path to markers + pattern: "*.bed" + - - tracks: + type: file + description: File listing paths and details for all tracks to be plotted + pattern: "*.txt" + - - chrord: + type: file + description: | + File containing reference (first genome) chromosome IDs in the order in which they are to be plotted. + File requires one chromosome ID per line. Not compatible with --chr + pattern: "*.txt" + - - chrname: + type: file + description: | + File containing reference (first genome) chromosome names to be used in the plot. + File needs to be a TSV with the chromosome ID in first column and chromosome name in the second. + pattern: "*.txt" output: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. `[ id:'sample1' ]` - - versions: - type: file - description: File containing software versions - pattern: "versions.yml" - png: - type: file - description: Synteny plot - pattern: "*.png" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. `[ id:'sample1' ]` + - "*.png": + type: file + description: Synteny plot + pattern: "*.png" + - versions: + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" authors: - "@GallVp" maintainers: diff --git a/modules/gallvp/samblaster/meta.yml b/modules/gallvp/samblaster/meta.yml index 5c1e5a97..5faf3a6c 100644 --- a/modules/gallvp/samblaster/meta.yml +++ b/modules/gallvp/samblaster/meta.yml @@ -23,30 +23,33 @@ tools: tool_dev_url: https://github.com/GregoryFaust/samblaster doi: "10.1093/bioinformatics/btu314" licence: ["MIT"] + identifier: biotools:samblaster input: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. 
[ id:'test', single_end:false ] - - bam: - type: file - description: BAM file - pattern: "*.bam" + - - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - bam: + type: file + description: BAM file + pattern: "*.bam" output: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. [ id:'test', single_end:false ] - - versions: - type: file - description: File containing software versions - pattern: "versions.yml" - bam: - type: file - description: Tagged or filtered BAM file - pattern: "*.bam" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*.bam": + type: file + description: Tagged or filtered BAM file + pattern: "*.bam" + - versions: + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" authors: - "@lescai" maintainers: diff --git a/modules/gallvp/samtools/faidx/environment.yml b/modules/gallvp/samtools/faidx/environment.yml index b98cbb99..2bcd47ee 100644 --- a/modules/gallvp/samtools/faidx/environment.yml +++ b/modules/gallvp/samtools/faidx/environment.yml @@ -3,5 +3,5 @@ channels: - bioconda dependencies: - - bioconda::htslib=1.20 - - bioconda::samtools=1.20 + - bioconda::htslib=1.21 + - bioconda::samtools=1.21 diff --git a/modules/gallvp/samtools/faidx/main.nf b/modules/gallvp/samtools/faidx/main.nf index bdcdbc95..28c0a81c 100644 --- a/modules/gallvp/samtools/faidx/main.nf +++ b/modules/gallvp/samtools/faidx/main.nf @@ -4,8 +4,8 @@ process SAMTOOLS_FAIDX { conda "${moduleDir}/environment.yml" container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? - 'https://depot.galaxyproject.org/singularity/samtools:1.20--h50ea8bc_0' : - 'biocontainers/samtools:1.20--h50ea8bc_0' }" + 'https://depot.galaxyproject.org/singularity/samtools:1.21--h50ea8bc_0' : + 'biocontainers/samtools:1.21--h50ea8bc_0' }" input: tuple val(meta), path(fasta) diff --git a/modules/gallvp/samtools/faidx/meta.yml b/modules/gallvp/samtools/faidx/meta.yml index f3c25de2..6721b2cb 100644 --- a/modules/gallvp/samtools/faidx/meta.yml +++ b/modules/gallvp/samtools/faidx/meta.yml @@ -14,47 +14,62 @@ tools: documentation: http://www.htslib.org/doc/samtools.html doi: 10.1093/bioinformatics/btp352 licence: ["MIT"] + identifier: biotools:samtools input: - - meta: - type: map - description: | - Groovy Map containing reference information - e.g. [ id:'test' ] - - fasta: - type: file - description: FASTA file - pattern: "*.{fa,fasta}" - - meta2: - type: map - description: | - Groovy Map containing reference information - e.g. [ id:'test' ] - - fai: - type: file - description: FASTA index file - pattern: "*.{fai}" + - - meta: + type: map + description: | + Groovy Map containing reference information + e.g. [ id:'test' ] + - fasta: + type: file + description: FASTA file + pattern: "*.{fa,fasta}" + - - meta2: + type: map + description: | + Groovy Map containing reference information + e.g. [ id:'test' ] + - fai: + type: file + description: FASTA index file + pattern: "*.{fai}" output: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. [ id:'test', single_end:false ] - fa: - type: file - description: FASTA file - pattern: "*.{fa}" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. 
[ id:'test', single_end:false ] + - "*.{fa,fasta}": + type: file + description: FASTA file + pattern: "*.{fa}" - fai: - type: file - description: FASTA index file - pattern: "*.{fai}" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*.fai": + type: file + description: FASTA index file + pattern: "*.{fai}" - gzi: - type: file - description: Optional gzip index file for compressed inputs - pattern: "*.gzi" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*.gzi": + type: file + description: Optional gzip index file for compressed inputs + pattern: "*.gzi" - versions: - type: file - description: File containing software versions - pattern: "versions.yml" + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" authors: - "@drpatelh" - "@ewels" diff --git a/modules/gallvp/samtools/faidx/tests/main.nf.test.snap b/modules/gallvp/samtools/faidx/tests/main.nf.test.snap index 3223b72b..1bbb3ec2 100644 --- a/modules/gallvp/samtools/faidx/tests/main.nf.test.snap +++ b/modules/gallvp/samtools/faidx/tests/main.nf.test.snap @@ -18,7 +18,7 @@ ], "3": [ - "versions.yml:md5,2db78952923a61e05d50b95518b21856" + "versions.yml:md5,6bbe80a2e14bd61202ca63e12d66027f" ], "fa": [ @@ -36,15 +36,15 @@ ], "versions": [ - "versions.yml:md5,2db78952923a61e05d50b95518b21856" + "versions.yml:md5,6bbe80a2e14bd61202ca63e12d66027f" ] } ], "meta": { - "nf-test": "0.8.4", - "nextflow": "23.10.1" + "nf-test": "0.9.0", + "nextflow": "24.04.4" }, - "timestamp": "2024-05-28T15:42:14.779784761" + "timestamp": "2024-09-16T07:57:47.450887871" }, "test_samtools_faidx_bgzip": { "content": [ @@ -71,7 +71,7 @@ ] ], "3": [ - "versions.yml:md5,2db78952923a61e05d50b95518b21856" + "versions.yml:md5,6bbe80a2e14bd61202ca63e12d66027f" ], "fa": [ @@ -95,15 +95,15 @@ ] ], "versions": [ - "versions.yml:md5,2db78952923a61e05d50b95518b21856" + "versions.yml:md5,6bbe80a2e14bd61202ca63e12d66027f" ] } ], "meta": { - "nf-test": "0.8.4", - "nextflow": "23.10.1" + "nf-test": "0.9.0", + "nextflow": "24.04.4" }, - "timestamp": "2024-05-28T15:42:20.256633877" + "timestamp": "2024-09-16T07:58:04.804905659" }, "test_samtools_faidx_fasta": { "content": [ @@ -124,7 +124,7 @@ ], "3": [ - "versions.yml:md5,2db78952923a61e05d50b95518b21856" + "versions.yml:md5,6bbe80a2e14bd61202ca63e12d66027f" ], "fa": [ [ @@ -142,15 +142,15 @@ ], "versions": [ - "versions.yml:md5,2db78952923a61e05d50b95518b21856" + "versions.yml:md5,6bbe80a2e14bd61202ca63e12d66027f" ] } ], "meta": { - "nf-test": "0.8.4", - "nextflow": "23.10.1" + "nf-test": "0.9.0", + "nextflow": "24.04.4" }, - "timestamp": "2024-05-28T15:42:25.632577273" + "timestamp": "2024-09-16T07:58:23.831268154" }, "test_samtools_faidx_stub_fasta": { "content": [ @@ -171,7 +171,7 @@ ], "3": [ - "versions.yml:md5,2db78952923a61e05d50b95518b21856" + "versions.yml:md5,6bbe80a2e14bd61202ca63e12d66027f" ], "fa": [ [ @@ -189,15 +189,15 @@ ], "versions": [ - "versions.yml:md5,2db78952923a61e05d50b95518b21856" + "versions.yml:md5,6bbe80a2e14bd61202ca63e12d66027f" ] } ], "meta": { - "nf-test": "0.8.4", - "nextflow": "23.10.1" + "nf-test": "0.9.0", + "nextflow": "24.04.4" }, - "timestamp": "2024-05-28T15:42:31.058424849" + "timestamp": "2024-09-16T07:58:35.600243706" }, "test_samtools_faidx_stub_fai": { "content": [ @@ -218,7 +218,7 @@ ], "3": [ - "versions.yml:md5,2db78952923a61e05d50b95518b21856" + 
"versions.yml:md5,6bbe80a2e14bd61202ca63e12d66027f" ], "fa": [ @@ -236,14 +236,14 @@ ], "versions": [ - "versions.yml:md5,2db78952923a61e05d50b95518b21856" + "versions.yml:md5,6bbe80a2e14bd61202ca63e12d66027f" ] } ], "meta": { - "nf-test": "0.8.4", - "nextflow": "23.10.1" + "nf-test": "0.9.0", + "nextflow": "24.04.4" }, - "timestamp": "2024-05-28T15:42:36.479929617" + "timestamp": "2024-09-16T07:58:54.705460167" } } \ No newline at end of file diff --git a/modules/gallvp/seqkit/seq/meta.yml b/modules/gallvp/seqkit/seq/meta.yml index 8d4e2b16..7d32aba5 100644 --- a/modules/gallvp/seqkit/seq/meta.yml +++ b/modules/gallvp/seqkit/seq/meta.yml @@ -1,7 +1,7 @@ ---- # yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/modules/meta-schema.json name: "seqkit_seq" -description: Transforms sequences (extract ID, filter by length, remove gaps, reverse complement...) +description: Transforms sequences (extract ID, filter by length, remove gaps, reverse + complement...) keywords: - genomics - fasta @@ -18,30 +18,33 @@ tools: tool_dev_url: "https://github.com/shenwei356/seqkit" doi: "10.1371/journal.pone.0163962" licence: ["MIT"] + identifier: biotools:seqkit input: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. `[ id:'sample1' ]` - - fastx: - type: file - description: Input fasta/fastq file - pattern: "*.{fsa,fas,fa,fasta,fastq,fq,fsa.gz,fas.gz,fa.gz,fasta.gz,fastq.gz,fq.gz}" + - - meta: + type: map + description: | + Groovy Map containing sample information + e.g. `[ id:'sample1' ]` + - fastx: + type: file + description: Input fasta/fastq file + pattern: "*.{fsa,fas,fa,fasta,fastq,fq,fsa.gz,fas.gz,fa.gz,fasta.gz,fastq.gz,fq.gz}" output: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. `[ id:'sample1' ]` - fastx: - type: file - description: Output fasta/fastq file - pattern: "*.{fasta,fasta.gz,fastq,fastq.gz}" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. `[ id:'sample1' ]` + - ${prefix}.*: + type: file + description: Output fasta/fastq file + pattern: "*.{fasta,fasta.gz,fastq,fastq.gz}" - versions: - type: file - description: File containing software versions - pattern: "versions.yml" + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" authors: - "@GallVp" maintainers: diff --git a/modules/gallvp/syri/meta.yml b/modules/gallvp/syri/meta.yml index d7801532..9699bacd 100644 --- a/modules/gallvp/syri/meta.yml +++ b/modules/gallvp/syri/meta.yml @@ -1,7 +1,7 @@ ---- # yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/modules/meta-schema.json name: "syri" -description: Syri compares alignments between two chromosome-level assemblies and identifies synteny and structural rearrangements. +description: Syri compares alignments between two chromosome-level assemblies and + identifies synteny and structural rearrangements. keywords: - genomics - synteny @@ -15,49 +15,57 @@ tools: tool_dev_url: "https://github.com/schneebergerlab/syri" doi: "10.1186/s13059-019-1911-0" licence: ["MIT License"] + identifier: biotools:SyRI input: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. 
`[ id:'sample1' ]` - - infile: - type: file - description: File containing alignment coordinates - pattern: "*.{table, sam, bam, paf}" - - query_fasta: - type: file - description: Query genome for the alignments - pattern: "*.fasta" - - reference_fasta: - type: file - description: Reference genome for the alignments - pattern: "*.fasta" - - file_type: - type: string - description: | - Input file type which is one of T: Table, S: SAM, B: BAM, P: PAF - + - - meta: + type: map + description: | + Groovy Map containing sample information + e.g. `[ id:'sample1' ]` + - infile: + type: file + description: File containing alignment coordinates + pattern: "*.{table, sam, bam, paf}" + - - query_fasta: + type: file + description: Query genome for the alignments + pattern: "*.fasta" + - - reference_fasta: + type: file + description: Reference genome for the alignments + pattern: "*.fasta" + - - file_type: + type: string + description: | + Input file type which is one of T: Table, S: SAM, B: BAM, P: PAF output: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. `[ id:'sample1' ]` - - versions: - type: file - description: File containing software versions - pattern: "versions.yml" - syri: - type: file - description: Syri output file - pattern: "*syri.out" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. `[ id:'sample1' ]` + - "*syri.out": + type: file + description: Syri output file + pattern: "*syri.out" - error: - type: file - description: | - Error log if syri fails. This error log enables the pipeline to detect if syri has failed due to one of its - known limitations and pass the information to the user in a user-friendly manner such as a HTML report - pattern: "*.error.log" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. `[ id:'sample1' ]` + - "*.error.log": + type: file + description: | + Error log if syri fails. 
This error log enables the pipeline to detect if syri has failed due to one of its + known limitations and pass the information to the user in a user-friendly manner such as a HTML report + pattern: "*.error.log" + - versions: + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" authors: - "@GallVp" maintainers: diff --git a/modules/local/createreport.nf b/modules/local/createreport.nf index fd53edc0..319d2b05 100644 --- a/modules/local/createreport.nf +++ b/modules/local/createreport.nf @@ -10,6 +10,7 @@ process CREATEREPORT { path ncbi_fcs_adaptor_reports , stageAs: 'ncbi_fcs_adaptor_reports/*' path fcs_gx_reports , stageAs: 'fcs_gx_reports/*' path assemblathon_stats , stageAs: 'assemblathon_stats/*' + path gfastats , stageAs: 'gfastats/*' path genometools_gt_stats , stageAs: 'genometools_gt_stat/*' path busco_outputs , stageAs: 'busco_outputs/*' path busco_gff_outputs , stageAs: 'busco_gff_outputs/*' @@ -19,6 +20,7 @@ process CREATEREPORT { path hic_outputs , stageAs: 'hic_outputs/*' path synteny_outputs , stageAs: 'synteny_outputs/*' path merqury_outputs , stageAs: 'merqury_outputs/*' + path orthofinder_outputs , stageAs: 'orthofinder_outputs/*' path versions val params_json val params_summary_json diff --git a/modules/local/generatekaryotype.nf b/modules/local/generatekaryotype.nf index e3771fe6..94cd4f64 100644 --- a/modules/local/generatekaryotype.nf +++ b/modules/local/generatekaryotype.nf @@ -35,8 +35,7 @@ process GENERATEKARYOTYPE { exit 0 fi - tmp_file=\$(mktemp) - printf '%s\\n' "\${ref_seqs[@]}" > "\$tmp_file" + printf '%s\\n' "\${ref_seqs[@]}" > ${target_on_ref}.${seq_tag}.tmp if [[ $seq_tag = "all" ]];then cat $target_seq_len > filtered.target.seq.len @@ -45,7 +44,7 @@ process GENERATEKARYOTYPE { fi cat filtered.target.seq.len | awk '{print \$1,\$2,"grey"}' OFS="\\t" > colored.filtered.target.seq.len - grep -w -f "\$tmp_file" $ref_seq_len > filtered.ref.seq.len + grep -w -f ${target_on_ref}.${seq_tag}.tmp $ref_seq_len > filtered.ref.seq.len cat filtered.ref.seq.len | awk '{print \$1,\$2,"black"}' OFS="\\t" > colored.filtered.ref.seq.len cat colored.filtered.ref.seq.len | sort -k1V > merged.seq.lengths @@ -67,8 +66,6 @@ process GENERATEKARYOTYPE { | sed '/^\$/d' \ | awk '{print "chr -",\$1,\$1,"0",\$2-1,\$3}' OFS="\\t" \ > karyotype_target.tsv - - rm "\$tmp_file" """ stub: diff --git a/modules/local/runassemblyvisualizer.nf b/modules/local/runassemblyvisualizer.nf index 5452dfd0..e4ca50a4 100644 --- a/modules/local/runassemblyvisualizer.nf +++ b/modules/local/runassemblyvisualizer.nf @@ -21,14 +21,10 @@ process RUNASSEMBLYVISUALIZER { assembly_tag=\$(echo $sample_id_on_tag | sed 's/.*\\.on\\.//g') file_name="${agp_assembly_file}" - cp -r /usr/src/3d-dna/ \\ - 3d-dna + mkdir user_home + export _JAVA_OPTIONS="-Djava.util.prefs.userRoot=user_prefs -Duser.home=user_home -Xms${avail_mem}g -Xmx${avail_mem}g" - sed -i \\ - 's/-Xms49152m -Xmx49152m/-Xms${avail_mem}g -Xmx${avail_mem}g/1' \\ - 3d-dna/visualize/juicebox_tools.sh - - 3d-dna/visualize/run-assembly-visualizer.sh \\ + /usr/src/3d-dna/visualize/run-assembly-visualizer.sh \\ -p false \\ $agp_assembly_file $sorted_links_txt_file @@ -42,7 +38,6 @@ process RUNASSEMBLYVISUALIZER { stub: if ( !task.memory ) { error '[RUNASSEMBLYVISUALIZER] Available memory not known. Specify process memory requirements to fix this.' 
} - def avail_mem = (task.memory.giga*0.8).intValue() """ assembly_tag=\$(echo $sample_id_on_tag | sed 's/.*\\.on\\.//g') touch "\${assembly_tag}.hic" diff --git a/modules/nf-core/custom/sratoolsncbisettings/meta.yml b/modules/nf-core/custom/sratoolsncbisettings/meta.yml index 46a6cd32..2938a35d 100644 --- a/modules/nf-core/custom/sratoolsncbisettings/meta.yml +++ b/modules/nf-core/custom/sratoolsncbisettings/meta.yml @@ -1,5 +1,6 @@ name: "custom_sratoolsncbisettings" -description: Test for the presence of suitable NCBI settings or create them on the fly. +description: Test for the presence of suitable NCBI settings or create them on the + fly. keywords: - NCBI - settings @@ -13,15 +14,18 @@ tools: documentation: https://github.com/ncbi/sra-tools/wiki tool_dev_url: https://github.com/ncbi/sra-tools licence: ["Public Domain"] + identifier: "" output: - - versions: - type: file - description: File containing software versions - pattern: "versions.yml" - ncbi_settings: - type: file - description: An NCBI user settings file. - pattern: "*.mkfg" + - "*.mkfg": + type: file + description: An NCBI user settings file. + pattern: "*.mkfg" + - versions: + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" authors: - "@Midnighter" maintainers: diff --git a/modules/nf-core/fastavalidator/meta.yml b/modules/nf-core/fastavalidator/meta.yml index c5c4371c..94198e62 100644 --- a/modules/nf-core/fastavalidator/meta.yml +++ b/modules/nf-core/fastavalidator/meta.yml @@ -1,4 +1,3 @@ ---- # yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/modules/meta-schema.json name: "fastavalidator" description: | @@ -19,34 +18,43 @@ tools: tool_dev_url: "https://github.com/linsalrob/py_fasta_validator" doi: "10.5281/zenodo.5002710" licence: ["MIT"] + identifier: "" input: - - meta: - type: map - description: | - Groovy Map containing file information - e.g. [ id:'test' ] - - fasta: - type: file - description: Input fasta file - pattern: "*.fasta" + - - meta: + type: map + description: | + Groovy Map containing file information + e.g. [ id:'test' ] + - fasta: + type: file + description: Input fasta file + pattern: "*.fasta" output: - - meta: - type: map - description: | - Groovy Map containing file information - e.g. [ id:'test' ] - success_log: - type: file - description: Log file for successful validation - pattern: "*.success.log" + - meta: + type: map + description: | + Groovy Map containing file information + e.g. [ id:'test' ] + - "*.success.log": + type: file + description: Log file for successful validation + pattern: "*.success.log" - error_log: - type: file - description: Log file for failed validation - pattern: "*.error.log" + - meta: + type: map + description: | + Groovy Map containing file information + e.g. 
[ id:'test' ] + - "*.error.log": + type: file + description: Log file for failed validation + pattern: "*.error.log" - versions: - type: file - description: File containing software versions - pattern: "versions.yml" + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" authors: - "@gallvp" maintainers: diff --git a/modules/nf-core/fastp/meta.yml b/modules/nf-core/fastp/meta.yml index 8dfecc18..159404d0 100644 --- a/modules/nf-core/fastp/meta.yml +++ b/modules/nf-core/fastp/meta.yml @@ -11,66 +11,100 @@ tools: documentation: https://github.com/OpenGene/fastp doi: 10.1093/bioinformatics/bty560 licence: ["MIT"] + identifier: biotools:fastp input: - - meta: - type: map - description: | - Groovy Map containing sample information. Use 'single_end: true' to specify single ended or interleaved FASTQs. Use 'single_end: false' for paired-end reads. - e.g. [ id:'test', single_end:false ] - - reads: - type: file - description: | - List of input FastQ files of size 1 and 2 for single-end and paired-end data, - respectively. If you wish to run interleaved paired-end data, supply as single-end data - but with `--interleaved_in` in your `modules.conf`'s `ext.args` for the module. - - adapter_fasta: - type: file - description: File in FASTA format containing possible adapters to remove. - pattern: "*.{fasta,fna,fas,fa}" - - discard_trimmed_pass: - type: boolean - description: Specify true to not write any reads that pass trimming thresholds. | - This can be used to use fastp for the output report only. - - save_trimmed_fail: - type: boolean - description: Specify true to save files that failed to pass trimming thresholds ending in `*.fail.fastq.gz` - - save_merged: - type: boolean - description: Specify true to save all merged reads to a file ending in `*.merged.fastq.gz` + - - meta: + type: map + description: | + Groovy Map containing sample information. Use 'single_end: true' to specify single ended or interleaved FASTQs. Use 'single_end: false' for paired-end reads. + e.g. [ id:'test', single_end:false ] + - reads: + type: file + description: | + List of input FastQ files of size 1 and 2 for single-end and paired-end data, + respectively. If you wish to run interleaved paired-end data, supply as single-end data + but with `--interleaved_in` in your `modules.conf`'s `ext.args` for the module. + - - adapter_fasta: + type: file + description: File in FASTA format containing possible adapters to remove. + pattern: "*.{fasta,fna,fas,fa}" + - - discard_trimmed_pass: + type: boolean + description: Specify true to not write any reads that pass trimming thresholds. + | This can be used to use fastp for the output report only. + - - save_trimmed_fail: + type: boolean + description: Specify true to save files that failed to pass trimming thresholds + ending in `*.fail.fastq.gz` + - - save_merged: + type: boolean + description: Specify true to save all merged reads to a file ending in `*.merged.fastq.gz` output: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. [ id:'test', single_end:false ] - reads: - type: file - description: The trimmed/modified/unmerged fastq reads - pattern: "*fastp.fastq.gz" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. 
[ id:'test', single_end:false ] + - "*.fastp.fastq.gz": + type: file + description: The trimmed/modified/unmerged fastq reads + pattern: "*fastp.fastq.gz" - json: - type: file - description: Results in JSON format - pattern: "*.json" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*.json": + type: file + description: Results in JSON format + pattern: "*.json" - html: - type: file - description: Results in HTML format - pattern: "*.html" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*.html": + type: file + description: Results in HTML format + pattern: "*.html" - log: - type: file - description: fastq log file - pattern: "*.log" - - versions: - type: file - description: File containing software versions - pattern: "versions.yml" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*.log": + type: file + description: fastq log file + pattern: "*.log" - reads_fail: - type: file - description: Reads the failed the preprocessing - pattern: "*fail.fastq.gz" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*.fail.fastq.gz": + type: file + description: Reads the failed the preprocessing + pattern: "*fail.fastq.gz" - reads_merged: - type: file - description: Reads that were successfully merged - pattern: "*.{merged.fastq.gz}" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*.merged.fastq.gz": + type: file + description: Reads that were successfully merged + pattern: "*.{merged.fastq.gz}" + - versions: + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" authors: - "@drpatelh" - "@kevinmenden" diff --git a/modules/nf-core/fastp/tests/main.nf.test.snap b/modules/nf-core/fastp/tests/main.nf.test.snap index 54be7e45..5561ac62 100644 --- a/modules/nf-core/fastp/tests/main.nf.test.snap +++ b/modules/nf-core/fastp/tests/main.nf.test.snap @@ -96,7 +96,7 @@ "id": "test", "single_end": false }, - "test.fastp.json:md5,1e0f8e27e71728e2b63fc64086be95cd" + "test.fastp.json:md5,63273f642b5a4495ce7ccba2d4419edf" ] ], [ @@ -122,10 +122,10 @@ ] ], "meta": { - "nf-test": "0.8.4", - "nextflow": "24.04.2" + "nf-test": "0.9.0", + "nextflow": "24.04.4" }, - "timestamp": "2024-07-05T13:43:28.665779" + "timestamp": "2024-10-10T20:59:33.270219" }, "test_fastp_paired_end_merged_adapterlist": { "content": [ @@ -135,7 +135,7 @@ "id": "test", "single_end": false }, - "test.fastp.json:md5,5914ca3f21ce162123a824e33e8564f6" + "test.fastp.json:md5,7412a2cca178cc149757b229d2730d24" ] ], [ @@ -167,10 +167,10 @@ ] ], "meta": { - "nf-test": "0.8.4", - "nextflow": "24.04.2" + "nf-test": "0.9.0", + "nextflow": "24.04.4" }, - "timestamp": "2024-07-05T13:44:18.210375" + "timestamp": "2024-10-10T21:00:25.951101" }, "test_fastp_single_end_qc_only": { "content": [ @@ -180,7 +180,7 @@ "id": "test", "single_end": true }, - "test.fastp.json:md5,5cc5f01e449309e0e689ed6f51a2294a" + "test.fastp.json:md5,8117ddbdb1ae856821eb66f40b14ba05" ] ], [ @@ -206,10 +206,10 @@ ] ], "meta": { - "nf-test": "0.8.4", - "nextflow": "24.04.2" + "nf-test": "0.9.0", + "nextflow": "24.04.4" }, - "timestamp": "2024-07-05T13:44:27.380974" + "timestamp": "2024-10-10T21:00:36.271913" }, "test_fastp_paired_end_trim_fail": { "content": [ @@ 
-247,7 +247,7 @@ "id": "test", "single_end": false }, - "test.fastp.json:md5,4c3268ddb50ea5b33125984776aa3519" + "test.fastp.json:md5,5b7664268c0537423ffaa162701dae70" ] ], [ @@ -255,10 +255,10 @@ ] ], "meta": { - "nf-test": "0.8.4", - "nextflow": "24.04.2" + "nf-test": "0.9.0", + "nextflow": "24.04.4" }, - "timestamp": "2024-07-05T13:43:58.749589" + "timestamp": "2024-10-10T21:00:04.189799" }, "fastp - stub test_fastp_interleaved": { "content": [ @@ -708,7 +708,7 @@ "id": "test", "single_end": false }, - "test.fastp.json:md5,b712fd68ed0322f4bec49ff2a5237fcc" + "test.fastp.json:md5,defba10ab9bb3e4235a86d90a51a2e79" ] ], [ @@ -740,10 +740,10 @@ ] ], "meta": { - "nf-test": "0.8.4", - "nextflow": "24.04.2" + "nf-test": "0.9.0", + "nextflow": "24.04.4" }, - "timestamp": "2024-07-05T13:44:08.68476" + "timestamp": "2024-10-10T21:00:13.75505" }, "test_fastp_paired_end - stub": { "content": [ @@ -860,7 +860,7 @@ "id": "test", "single_end": true }, - "test.fastp.json:md5,c852d7a6dba5819e4ac8d9673bedcacc" + "test.fastp.json:md5,898dafe80449209d752985698a8cded0" ] ], [ @@ -883,10 +883,10 @@ ] ], "meta": { - "nf-test": "0.8.4", - "nextflow": "24.04.2" + "nf-test": "0.9.0", + "nextflow": "24.04.4" }, - "timestamp": "2024-07-05T13:43:18.834322" + "timestamp": "2024-10-10T20:59:23.552733" }, "test_fastp_single_end_trim_fail - stub": { "content": [ @@ -1145,7 +1145,7 @@ "id": "test", "single_end": true }, - "test.fastp.json:md5,b24e0624df5cc0b11cd5ba21b726fb22" + "test.fastp.json:md5,d09c3ec938c4a7bd04bd6398f42fa9ca" ] ], [ @@ -1153,10 +1153,10 @@ ] ], "meta": { - "nf-test": "0.8.4", - "nextflow": "24.04.2" + "nf-test": "0.9.0", + "nextflow": "24.04.4" }, - "timestamp": "2024-07-05T13:43:38.910832" + "timestamp": "2024-10-10T20:59:41.725562" }, "test_fastp_single_end_trim_fail": { "content": [ @@ -1166,7 +1166,7 @@ "id": "test", "single_end": true }, - "test.fastp.json:md5,9a7ee180f000e8d00c7fb67f06293eb5" + "test.fastp.json:md5,4415a44b64ff54a7a450b0acec32bc46" ] ], [ @@ -1195,10 +1195,10 @@ ] ], "meta": { - "nf-test": "0.8.4", - "nextflow": "24.04.2" + "nf-test": "0.9.0", + "nextflow": "24.04.4" }, - "timestamp": "2024-07-05T13:43:48.22378" + "timestamp": "2024-10-10T20:59:52.36865" }, "test_fastp_paired_end_qc_only": { "content": [ @@ -1208,7 +1208,7 @@ "id": "test", "single_end": false }, - "test.fastp.json:md5,623064a45912dac6f2b64e3f2e9901df" + "test.fastp.json:md5,6ad01e6f6a98daf02410273af0b8ecea" ] ], [ @@ -1234,10 +1234,10 @@ ] ], "meta": { - "nf-test": "0.8.4", - "nextflow": "24.04.2" + "nf-test": "0.9.0", + "nextflow": "24.04.4" }, - "timestamp": "2024-07-05T13:44:36.334938" + "timestamp": "2024-10-10T21:00:46.609595" }, "test_fastp_paired_end_qc_only - stub": { "content": [ diff --git a/modules/nf-core/fastqc/meta.yml b/modules/nf-core/fastqc/meta.yml index ee5507e0..4827da7a 100644 --- a/modules/nf-core/fastqc/meta.yml +++ b/modules/nf-core/fastqc/meta.yml @@ -16,35 +16,44 @@ tools: homepage: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/ documentation: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/ licence: ["GPL-2.0-only"] + identifier: biotools:fastqc input: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. [ id:'test', single_end:false ] - - reads: - type: file - description: | - List of input FastQ files of size 1 and 2 for single-end and paired-end data, - respectively. + - - meta: + type: map + description: | + Groovy Map containing sample information + e.g. 
[ id:'test', single_end:false ] + - reads: + type: file + description: | + List of input FastQ files of size 1 and 2 for single-end and paired-end data, + respectively. output: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. [ id:'test', single_end:false ] - html: - type: file - description: FastQC report - pattern: "*_{fastqc.html}" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*.html": + type: file + description: FastQC report + pattern: "*_{fastqc.html}" - zip: - type: file - description: FastQC report archive - pattern: "*_{fastqc.zip}" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*.zip": + type: file + description: FastQC report archive + pattern: "*_{fastqc.zip}" - versions: - type: file - description: File containing software versions - pattern: "versions.yml" + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" authors: - "@drpatelh" - "@grst" diff --git a/modules/nf-core/fcs/fcsadaptor/meta.yml b/modules/nf-core/fcs/fcsadaptor/meta.yml index 54fca1bb..83cae5b7 100644 --- a/modules/nf-core/fcs/fcsadaptor/meta.yml +++ b/modules/nf-core/fcs/fcsadaptor/meta.yml @@ -18,45 +18,72 @@ tools: documentation: "https://github.com/ncbi/fcs/wiki/FCS-adaptor" tool_dev_url: "https://github.com/ncbi/fcs" licence: ["United States Government Work"] + identifier: "" input: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. [ id:'test', single_end:false ] - - assembly: - type: file - description: assembly fasta file + - - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - assembly: + type: file + description: assembly fasta file output: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. [ id:'test', single_end:false ] - - versions: - type: file - description: File containing software versions - pattern: "versions.yml" - cleaned_assembly: - type: file - description: Cleaned assembly in fasta format - pattern: "*.{cleaned_sequences.fa.gz}" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*.cleaned_sequences.fa.gz": + type: file + description: Cleaned assembly in fasta format + pattern: "*.{cleaned_sequences.fa.gz}" - adaptor_report: - type: file - description: Report of identified adaptors - pattern: "*.{fcs_adaptor_report.txt}" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*.fcs_adaptor_report.txt": + type: file + description: Report of identified adaptors + pattern: "*.{fcs_adaptor_report.txt}" - log: - type: file - description: Log file - pattern: "*.{fcs_adaptor.log}" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*.fcs_adaptor.log": + type: file + description: Log file + pattern: "*.{fcs_adaptor.log}" - pipeline_args: - type: file - description: Run arguments - pattern: "*.{pipeline_args.yaml}" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. 
[ id:'test', single_end:false ] + - "*.pipeline_args.yaml": + type: file + description: Run arguments + pattern: "*.{pipeline_args.yaml}" - skipped_trims: - type: file - description: Skipped trim information - pattern: "*.{skipped_trims.jsonl}" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*.skipped_trims.jsonl": + type: file + description: Skipped trim information + pattern: "*.{skipped_trims.jsonl}" + - versions: + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" authors: - "@d4straub" maintainers: diff --git a/modules/nf-core/gfastats/environment.yml b/modules/nf-core/gfastats/environment.yml new file mode 100644 index 00000000..b47bbdbb --- /dev/null +++ b/modules/nf-core/gfastats/environment.yml @@ -0,0 +1,5 @@ +channels: + - conda-forge + - bioconda +dependencies: + - bioconda::gfastats=1.3.6 diff --git a/modules/nf-core/gfastats/main.nf b/modules/nf-core/gfastats/main.nf new file mode 100644 index 00000000..8db239ad --- /dev/null +++ b/modules/nf-core/gfastats/main.nf @@ -0,0 +1,66 @@ +process GFASTATS { + tag "$meta.id" + label 'process_low' + + conda "${moduleDir}/environment.yml" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/gfastats:1.3.6--hdcf5f25_3': + 'biocontainers/gfastats:1.3.6--hdcf5f25_3' }" + + input: + tuple val(meta), path(assembly) // input.[fasta|fastq|gfa][.gz] + val out_fmt // output format (fasta/fastq/gfa) + val genome_size // estimated genome size for NG* statistics (optional). + val target // target specific sequence by header, optionally with coordinates (optional). + path agpfile // -a --agp-to-path converts input agp to path and replaces existing paths. + path include_bed // -i --include-bed generates output on a subset list of headers or coordinates in 0-based bed format. + path exclude_bed // -e --exclude-bed opposite of --include-bed. They can be combined (no coordinates). + path instructions // -k --swiss-army-knife set of instructions provided as an ordered list. + + output: + tuple val(meta), path("*.assembly_summary"), emit: assembly_summary + tuple val(meta), path("*.${out_fmt}.gz") , emit: assembly + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + def agp = agpfile ? "--agp-to-path $agp" : "" + def ibed = include_bed ? "--include-bed $include_bed" : "" + def ebed = exclude_bed ? "--exclude-bed $exclude_bed" : "" + def sak = instructions ? 
"--swiss-army-knife $instructions" : "" + """ + gfastats \\ + $args \\ + --threads $task.cpus \\ + $agp \\ + $ibed \\ + $ebed \\ + $sak \\ + --out-format ${prefix}.${out_fmt}.gz \\ + $assembly \\ + $genome_size \\ + $target \\ + > ${prefix}.assembly_summary + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + gfastats: \$( gfastats -v | sed '1!d;s/.*v//' ) + END_VERSIONS + """ + + stub: + def prefix = task.ext.prefix ?: "${meta.id}" + """ + touch ${prefix}.${out_fmt}.gz + touch ${prefix}.assembly_summary + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + gfastats: \$( gfastats -v | sed '1!d;s/.*v//' ) + END_VERSIONS + """ +} diff --git a/modules/nf-core/gfastats/meta.yml b/modules/nf-core/gfastats/meta.yml new file mode 100644 index 00000000..a6213433 --- /dev/null +++ b/modules/nf-core/gfastats/meta.yml @@ -0,0 +1,83 @@ +name: "gfastats" +description: | + A single fast and exhaustive tool for summary statistics and simultaneous *fa* + (fasta, fastq, gfa [.gz]) genome assembly file manipulation. +keywords: + - gfastats + - fasta + - genome assembly + - genome summary + - genome manipulation + - genome statistics +tools: + - "gfastats": + description: "The swiss army knife for genome assembly." + homepage: "https://github.com/vgl-hub/gfastats" + documentation: "https://github.com/vgl-hub/gfastats/tree/main/instructions" + tool_dev_url: "https://github.com/vgl-hub/gfastats" + doi: "10.1093/bioinformatics/btac460" + licence: ["MIT"] + identifier: biotools:gfastats +input: + - - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - assembly: + type: file + description: Draft assembly file + pattern: "*.{fasta,fastq,gfa}(.gz)?" + - - out_fmt: + type: string + description: Output format (fasta, fastq, gfa) + - - genome_size: + type: integer + description: estimated genome size (bp) for NG* statistics (optional). + - - target: + type: string + description: target specific sequence by header, optionally with coordinates + (optional). + - - agpfile: + type: file + description: converts input agp to path and replaces existing paths. + - - include_bed: + type: file + description: generates output on a subset list of headers or coordinates in + 0-based bed format. + - - exclude_bed: + type: file + description: opposite of --include-bed. They can be combined (no coordinates). + - - instructions: + type: file + description: set of instructions provided as an ordered list. +output: + - assembly_summary: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*.assembly_summary": + type: file + description: Assembly summary statistics file + pattern: "*.assembly_summary" + - assembly: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. 
[ id:'test', single_end:false ] + - "*.${out_fmt}.gz": + type: file + description: The assembly as modified by gfastats + pattern: "*.{fasta,fastq,gfa}.gz" + - versions: + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" +authors: + - "@mahesh-panchal" +maintainers: + - "@mahesh-panchal" diff --git a/modules/nf-core/gffread/environment.yml b/modules/nf-core/gffread/environment.yml new file mode 100644 index 00000000..ee239841 --- /dev/null +++ b/modules/nf-core/gffread/environment.yml @@ -0,0 +1,5 @@ +channels: + - conda-forge + - bioconda +dependencies: + - bioconda::gffread=0.12.7 diff --git a/modules/nf-core/gffread/main.nf b/modules/nf-core/gffread/main.nf new file mode 100644 index 00000000..da55cbab --- /dev/null +++ b/modules/nf-core/gffread/main.nf @@ -0,0 +1,60 @@ +process GFFREAD { + tag "$meta.id" + label 'process_low' + + conda "${moduleDir}/environment.yml" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/gffread:0.12.7--hdcf5f25_4' : + 'biocontainers/gffread:0.12.7--hdcf5f25_4' }" + + input: + tuple val(meta), path(gff) + path fasta + + output: + tuple val(meta), path("*.gtf") , emit: gtf , optional: true + tuple val(meta), path("*.gff3") , emit: gffread_gff , optional: true + tuple val(meta), path("*.fasta"), emit: gffread_fasta , optional: true + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + def extension = args.contains("-T") ? 'gtf' : ( ( ['-w', '-x', '-y' ].any { args.contains(it) } ) ? 'fasta' : 'gff3' ) + def fasta_arg = fasta ? "-g $fasta" : '' + def output_name = "${prefix}.${extension}" + def output = extension == "fasta" ? "$output_name" : "-o $output_name" + def args_sorted = args.replaceAll(/(.*)(-[wxy])(.*)/) { all, pre, param, post -> "$pre $post $param" }.trim() + // args_sorted = Move '-w', '-x', and '-y' to the end of the args string as gffread expects the file name after these parameters + if ( "$output_name" in [ "$gff", "$fasta" ] ) error "Input and output names are the same, use \"task.ext.prefix\" to disambiguate!" + """ + gffread \\ + $gff \\ + $fasta_arg \\ + $args_sorted \\ + $output + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + gffread: \$(gffread --version 2>&1) + END_VERSIONS + """ + + stub: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + def extension = args.contains("-T") ? 'gtf' : ( ( ['-w', '-x', '-y' ].any { args.contains(it) } ) ? 'fasta' : 'gff3' ) + def output_name = "${prefix}.${extension}" + if ( "$output_name" in [ "$gff", "$fasta" ] ) error "Input and output names are the same, use \"task.ext.prefix\" to disambiguate!" + """ + touch $output_name + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + gffread: \$(gffread --version 2>&1) + END_VERSIONS + """ +} diff --git a/modules/nf-core/gffread/meta.yml b/modules/nf-core/gffread/meta.yml new file mode 100644 index 00000000..bebe7f57 --- /dev/null +++ b/modules/nf-core/gffread/meta.yml @@ -0,0 +1,75 @@ +name: gffread +description: Validate, filter, convert and perform various other operations on GFF + files +keywords: + - gff + - conversion + - validation +tools: + - gffread: + description: GFF/GTF utility providing format conversions, region filtering, FASTA + sequence extraction and more. 
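Note on the newly vendored GFASTATS module above: the process takes one meta/assembly tuple plus seven positional inputs, most of which are optional. The following is a minimal usage sketch only, assuming an input channel of shape [ meta, fasta ], FASTA output, and empty placeholders ('' / []) for the unused optional inputs; the include path and the surrounding workflow name are illustrative assumptions, not the pipeline's actual wiring.

    include { GFASTATS } from '../modules/nf-core/gfastats/main' // path relative to the including script

    workflow ASSEMBLY_GFASTATS {
        take:
        ch_assembly                 // channel: [ val(meta), path(fasta) ]

        main:
        GFASTATS (
            ch_assembly,            // tuple val(meta), path(assembly)
            'fasta',                // out_fmt: emit the assembly back as FASTA
            '',                     // genome_size: skip NG* statistics
            '',                     // target: no specific sequence targeted
            [],                     // agpfile: not used
            [],                     // include_bed: not used
            [],                     // exclude_bed: not used
            []                      // instructions: no swiss-army-knife script
        )

        emit:
        assembly_summary = GFASTATS.out.assembly_summary // channel: [ val(meta), path(assembly_summary) ]
        versions         = GFASTATS.out.versions         // channel: [ path(versions.yml) ]
    }

Passing [] for unused path inputs and '' for unused val inputs follows the usual nf-core convention for optional module inputs; per-run options such as extra gfastats flags would normally be supplied via task.ext.args in a modules config rather than hard-coded in the call.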
+ homepage: http://ccb.jhu.edu/software/stringtie/gff.shtml#gffread + documentation: http://ccb.jhu.edu/software/stringtie/gff.shtml#gffread + tool_dev_url: https://github.com/gpertea/gffread + doi: 10.12688/f1000research.23297.1 + licence: ["MIT"] + identifier: biotools:gffread +input: + - - meta: + type: map + description: | + Groovy Map containing meta data + e.g. [ id:'test' ] + - gff: + type: file + description: A reference file in either the GFF3, GFF2 or GTF format. + pattern: "*.{gff, gtf}" + - - fasta: + type: file + description: A multi-fasta file with the genomic sequences + pattern: "*.{fasta,fa,faa,fas,fsa}" +output: + - gtf: + - meta: + type: map + description: | + Groovy Map containing meta data + e.g. [ id:'test' ] + - "*.gtf": + type: file + description: GTF file resulting from the conversion of the GFF input file if + '-T' argument is present + pattern: "*.{gtf}" + - gffread_gff: + - meta: + type: map + description: | + Groovy Map containing meta data + e.g. [ id:'test' ] + - "*.gff3": + type: file + description: GFF3 file resulting from the conversion of the GFF input file if + '-T' argument is absent + pattern: "*.gff3" + - gffread_fasta: + - meta: + type: map + description: | + Groovy Map containing meta data + e.g. [ id:'test' ] + - "*.fasta": + type: file + description: Fasta file produced when either of '-w', '-x', '-y' parameters + is present + pattern: "*.fasta" + - versions: + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" +authors: + - "@edmundmiller" +maintainers: + - "@edmundmiller" + - "@gallvp" diff --git a/modules/nf-core/gffread/tests/main.nf.test b/modules/nf-core/gffread/tests/main.nf.test new file mode 100644 index 00000000..4cd13dcd --- /dev/null +++ b/modules/nf-core/gffread/tests/main.nf.test @@ -0,0 +1,223 @@ +nextflow_process { + + name "Test Process GFFREAD" + script "../main.nf" + process "GFFREAD" + + tag "gffread" + tag "modules_nfcore" + tag "modules" + + test("sarscov2-gff3-gtf") { + + config "./nextflow.config" + + when { + params { + outdir = "$outputDir" + } + process { + """ + input[0] = [ + [id: 'test'], + file(params.modules_testdata_base_path + "genomics/sarscov2/genome/genome.gff3", checkIfExists: true) + ] + input[1] = [] + """ + } + } + + then { + assertAll ( + { assert process.success }, + { assert snapshot(process.out).match() }, + { assert process.out.gffread_gff == [] }, + { assert process.out.gffread_fasta == [] } + ) + } + + } + + test("sarscov2-gff3-gtf-stub") { + + options '-stub' + config "./nextflow.config" + + when { + params { + outdir = "$outputDir" + } + process { + """ + input[0] = [ + [id: 'test'], + file(params.modules_testdata_base_path + "genomics/sarscov2/genome/genome.gff3", checkIfExists: true) + ] + input[1] = [] + """ + } + } + + then { + assertAll ( + { assert process.success }, + { assert snapshot(process.out).match() }, + { assert process.out.gffread_gff == [] }, + { assert process.out.gffread_fasta == [] } + ) + } + + } + + test("sarscov2-gff3-gff3") { + + config "./nextflow-gff3.config" + + when { + params { + outdir = "$outputDir" + } + process { + """ + input[0] = [ + [id: 'test'], + file(params.modules_testdata_base_path + "genomics/sarscov2/genome/genome.gff3", checkIfExists: true) + ] + input[1] = [] + """ + } + } + + then { + assertAll ( + { assert process.success }, + { assert snapshot(process.out).match() }, + { assert process.out.gtf == [] }, + { assert process.out.gffread_fasta == [] } + ) + } + + } + + test("sarscov2-gff3-gff3-stub") { + 
+ options '-stub' + config "./nextflow-gff3.config" + + when { + params { + outdir = "$outputDir" + } + process { + """ + input[0] = [ + [id: 'test'], + file(params.modules_testdata_base_path + "genomics/sarscov2/genome/genome.gff3", checkIfExists: true) + ] + input[1] = [] + """ + } + } + + then { + assertAll ( + { assert process.success }, + { assert snapshot(process.out).match() }, + { assert process.out.gtf == [] }, + { assert process.out.gffread_fasta == [] } + ) + } + + } + + test("sarscov2-gff3-fasta") { + + config "./nextflow-fasta.config" + + when { + params { + outdir = "$outputDir" + } + process { + """ + input[0] = [ + [id: 'test'], + file(params.modules_testdata_base_path + "genomics/sarscov2/genome/genome.gff3", checkIfExists: true) + ] + input[1] = file(params.modules_testdata_base_path + "genomics/sarscov2/genome/genome.fasta", checkIfExists: true) + """ + } + } + + then { + assertAll ( + { assert process.success }, + { assert snapshot(process.out).match() }, + { assert process.out.gtf == [] }, + { assert process.out.gffread_gff == [] } + ) + } + + } + + test("sarscov2-gff3-fasta-stub") { + + options '-stub' + config "./nextflow-fasta.config" + + when { + params { + outdir = "$outputDir" + } + process { + """ + input[0] = [ + [id: 'test'], + file(params.modules_testdata_base_path + "genomics/sarscov2/genome/genome.gff3", checkIfExists: true) + ] + input[1] = file(params.modules_testdata_base_path + "genomics/sarscov2/genome/genome.fasta", checkIfExists: true) + """ + } + } + + then { + assertAll ( + { assert process.success }, + { assert snapshot(process.out).match() }, + { assert process.out.gtf == [] }, + { assert process.out.gffread_gff == [] } + ) + } + + } + + test("sarscov2-gff3-fasta-fail-catch") { + + options '-stub' + config "./nextflow-fasta.config" + + when { + params { + outdir = "$outputDir" + } + process { + """ + input[0] = [ + [id: 'genome'], + file(params.modules_testdata_base_path + "genomics/sarscov2/genome/genome.gff3", checkIfExists: true) + ] + input[1] = file(params.modules_testdata_base_path + "genomics/sarscov2/genome/genome.fasta", checkIfExists: true) + """ + } + } + + then { + assertAll ( + { assert ! 
process.success }, + { assert process.stdout.toString().contains("Input and output names are the same") } + ) + } + + } + +} diff --git a/modules/nf-core/gffread/tests/main.nf.test.snap b/modules/nf-core/gffread/tests/main.nf.test.snap new file mode 100644 index 00000000..15262320 --- /dev/null +++ b/modules/nf-core/gffread/tests/main.nf.test.snap @@ -0,0 +1,272 @@ +{ + "sarscov2-gff3-gtf": { + "content": [ + { + "0": [ + [ + { + "id": "test" + }, + "test.gtf:md5,1ea0ae98d3388e0576407dc4a24ef428" + ] + ], + "1": [ + + ], + "2": [ + + ], + "3": [ + "versions.yml:md5,05f671c6c6e530acedad0af0a5948dbd" + ], + "gffread_fasta": [ + + ], + "gffread_gff": [ + + ], + "gtf": [ + [ + { + "id": "test" + }, + "test.gtf:md5,1ea0ae98d3388e0576407dc4a24ef428" + ] + ], + "versions": [ + "versions.yml:md5,05f671c6c6e530acedad0af0a5948dbd" + ] + } + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.10.1" + }, + "timestamp": "2024-04-09T10:48:56.496187" + }, + "sarscov2-gff3-gff3": { + "content": [ + { + "0": [ + + ], + "1": [ + [ + { + "id": "test" + }, + "test.gff3:md5,c4e5da6267c6bee5899a2c204ae1ad91" + ] + ], + "2": [ + + ], + "3": [ + "versions.yml:md5,05f671c6c6e530acedad0af0a5948dbd" + ], + "gffread_fasta": [ + + ], + "gffread_gff": [ + [ + { + "id": "test" + }, + "test.gff3:md5,c4e5da6267c6bee5899a2c204ae1ad91" + ] + ], + "gtf": [ + + ], + "versions": [ + "versions.yml:md5,05f671c6c6e530acedad0af0a5948dbd" + ] + } + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.10.1" + }, + "timestamp": "2024-04-09T10:49:00.892782" + }, + "sarscov2-gff3-gtf-stub": { + "content": [ + { + "0": [ + [ + { + "id": "test" + }, + "test.gtf:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "1": [ + + ], + "2": [ + + ], + "3": [ + "versions.yml:md5,05f671c6c6e530acedad0af0a5948dbd" + ], + "gffread_fasta": [ + + ], + "gffread_gff": [ + + ], + "gtf": [ + [ + { + "id": "test" + }, + "test.gtf:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "versions": [ + "versions.yml:md5,05f671c6c6e530acedad0af0a5948dbd" + ] + } + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.10.1" + }, + "timestamp": "2024-04-09T11:11:26.975666" + }, + "sarscov2-gff3-fasta-stub": { + "content": [ + { + "0": [ + + ], + "1": [ + + ], + "2": [ + [ + { + "id": "test" + }, + "test.fasta:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "3": [ + "versions.yml:md5,05f671c6c6e530acedad0af0a5948dbd" + ], + "gffread_fasta": [ + [ + { + "id": "test" + }, + "test.fasta:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "gffread_gff": [ + + ], + "gtf": [ + + ], + "versions": [ + "versions.yml:md5,05f671c6c6e530acedad0af0a5948dbd" + ] + } + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.10.1" + }, + "timestamp": "2024-04-09T11:11:44.34792" + }, + "sarscov2-gff3-gff3-stub": { + "content": [ + { + "0": [ + + ], + "1": [ + [ + { + "id": "test" + }, + "test.gff3:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "2": [ + + ], + "3": [ + "versions.yml:md5,05f671c6c6e530acedad0af0a5948dbd" + ], + "gffread_fasta": [ + + ], + "gffread_gff": [ + [ + { + "id": "test" + }, + "test.gff3:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "gtf": [ + + ], + "versions": [ + "versions.yml:md5,05f671c6c6e530acedad0af0a5948dbd" + ] + } + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.10.1" + }, + "timestamp": "2024-04-09T11:11:35.221671" + }, + "sarscov2-gff3-fasta": { + "content": [ + { + "0": [ + + ], + "1": [ + + ], + "2": [ + [ + { + "id": "test" + }, + "test.fasta:md5,5f8108fb51739a0588ccf0a251de919a" + ] + ], + "3": [ + 
"versions.yml:md5,05f671c6c6e530acedad0af0a5948dbd" + ], + "gffread_fasta": [ + [ + { + "id": "test" + }, + "test.fasta:md5,5f8108fb51739a0588ccf0a251de919a" + ] + ], + "gffread_gff": [ + + ], + "gtf": [ + + ], + "versions": [ + "versions.yml:md5,05f671c6c6e530acedad0af0a5948dbd" + ] + } + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.10.1" + }, + "timestamp": "2024-04-09T10:54:02.88143" + } +} \ No newline at end of file diff --git a/modules/nf-core/gffread/tests/nextflow-fasta.config b/modules/nf-core/gffread/tests/nextflow-fasta.config new file mode 100644 index 00000000..ac6cb148 --- /dev/null +++ b/modules/nf-core/gffread/tests/nextflow-fasta.config @@ -0,0 +1,5 @@ +process { + withName: GFFREAD { + ext.args = '-w -S' + } +} diff --git a/modules/nf-core/gffread/tests/nextflow-gff3.config b/modules/nf-core/gffread/tests/nextflow-gff3.config new file mode 100644 index 00000000..afe0830e --- /dev/null +++ b/modules/nf-core/gffread/tests/nextflow-gff3.config @@ -0,0 +1,5 @@ +process { + withName: GFFREAD { + ext.args = '' + } +} diff --git a/modules/nf-core/gffread/tests/nextflow.config b/modules/nf-core/gffread/tests/nextflow.config new file mode 100644 index 00000000..74b25094 --- /dev/null +++ b/modules/nf-core/gffread/tests/nextflow.config @@ -0,0 +1,5 @@ +process { + withName: GFFREAD { + ext.args = '-T' + } +} diff --git a/modules/nf-core/gffread/tests/tags.yml b/modules/nf-core/gffread/tests/tags.yml new file mode 100644 index 00000000..05576065 --- /dev/null +++ b/modules/nf-core/gffread/tests/tags.yml @@ -0,0 +1,2 @@ +gffread: + - modules/nf-core/gffread/** diff --git a/modules/nf-core/gunzip/meta.yml b/modules/nf-core/gunzip/meta.yml index f32973a0..9066c035 100644 --- a/modules/nf-core/gunzip/meta.yml +++ b/modules/nf-core/gunzip/meta.yml @@ -10,25 +10,32 @@ tools: gzip is a file format and a software application used for file compression and decompression. documentation: https://www.gnu.org/software/gzip/manual/gzip.html licence: ["GPL-3.0-or-later"] + identifier: "" input: - - meta: - type: map - description: | - Optional groovy Map containing meta information - e.g. [ id:'test', single_end:false ] - - archive: - type: file - description: File to be compressed/uncompressed - pattern: "*.*" + - - meta: + type: map + description: | + Optional groovy Map containing meta information + e.g. 
[ id:'test', single_end:false ] + - archive: + type: file + description: File to be compressed/uncompressed + pattern: "*.*" output: - gunzip: - type: file - description: Compressed/uncompressed file - pattern: "*.*" + - meta: + type: file + description: Compressed/uncompressed file + pattern: "*.*" + - $gunzip: + type: file + description: Compressed/uncompressed file + pattern: "*.*" - versions: - type: file - description: File containing software versions - pattern: "versions.yml" + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" authors: - "@joseespinosa" - "@drpatelh" diff --git a/modules/nf-core/merqury/hapmers/meta.yml b/modules/nf-core/merqury/hapmers/meta.yml index 6693f540..2c37d3ac 100644 --- a/modules/nf-core/merqury/hapmers/meta.yml +++ b/modules/nf-core/merqury/hapmers/meta.yml @@ -1,4 +1,3 @@ ---- # yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/modules/meta-schema.json name: "merqury_hapmers" description: A script to generate hap-mer dbs for trios @@ -13,57 +12,85 @@ tools: tool_dev_url: "https://github.com/marbl/merqury" doi: "10.1186/s13059-020-02134-9" licence: ["United States Government Work"] + identifier: biotools:merqury input: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. `[ id:'sample1' ]` - - child_meryl: - type: directory - description: Childs' k-mers (all, from WGS reads) - pattern: "*.meryl" - - maternal_meryl: - type: directory - description: Haplotype1 k-mers (all, ex. maternal) - pattern: "*.meryl" - - paternal_meryl: - type: directory - description: Haplotype2 k-mers (all, ex. paternal) - pattern: "*.meryl" - + - - meta: + type: map + description: | + Groovy Map containing sample information + e.g. `[ id:'sample1' ]` + - child_meryl: + type: directory + description: Childs' k-mers (all, from WGS reads) + pattern: "*.meryl" + - - maternal_meryl: + type: directory + description: Haplotype1 k-mers (all, ex. maternal) + pattern: "*.meryl" + - - paternal_meryl: + type: directory + description: Haplotype2 k-mers (all, ex. paternal) + pattern: "*.meryl" output: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. `[ id:'sample1' ]` - - versions: - type: file - description: File containing software versions - pattern: "versions.yml" - mat_hapmer_meryl: - type: directory - description: Inherited maternal hap-mer dbs - pattern: "*_mat.hapmer.meryl" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. `[ id:'sample1' ]` + - "*_mat.hapmer.meryl": + type: directory + description: Inherited maternal hap-mer dbs + pattern: "*_mat.hapmer.meryl" - pat_hapmer_meryl: - type: directory - description: Inherited paternal hap-mer dbs - pattern: "*_pat.hapmer.meryl" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. `[ id:'sample1' ]` + - "*_pat.hapmer.meryl": + type: directory + description: Inherited paternal hap-mer dbs + pattern: "*_pat.hapmer.meryl" - inherited_hapmers_fl_png: - type: file - description: k-mer distribution of the inherited dbs and cutoffs used to generate hap-mer dbs - pattern: "*_inherited_hapmers.fl.png" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. 
`[ id:'sample1' ]` + - "*_inherited_hapmers.fl.png": + type: file + description: k-mer distribution of the inherited dbs and cutoffs used to generate + hap-mer dbs + pattern: "*_inherited_hapmers.fl.png" - inherited_hapmers_ln_png: - type: file - description: k-mer distribution of the inherited dbs and cutoffs used to generate hap-mer dbs - pattern: "*_inherited_hapmers.ln.png" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. `[ id:'sample1' ]` + - "*_inherited_hapmers.ln.png": + type: file + description: k-mer distribution of the inherited dbs and cutoffs used to generate + hap-mer dbs + pattern: "*_inherited_hapmers.ln.png" - inherited_hapmers_st_png: - type: file - description: k-mer distribution of the inherited dbs and cutoffs used to generate hap-mer dbs - pattern: "*_inherited_hapmers.st.png" - + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. `[ id:'sample1' ]` + - "*_inherited_hapmers.st.png": + type: file + description: k-mer distribution of the inherited dbs and cutoffs used to generate + hap-mer dbs + pattern: "*_inherited_hapmers.st.png" + - versions: + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" authors: - "@GallVp" maintainers: diff --git a/modules/nf-core/merqury/merqury/meta.yml b/modules/nf-core/merqury/merqury/meta.yml index 19cb11b3..7e8d875a 100644 --- a/modules/nf-core/merqury/merqury/meta.yml +++ b/modules/nf-core/merqury/merqury/meta.yml @@ -10,92 +10,187 @@ tools: tool_dev_url: "https://github.com/marbl/merqury" doi: "10.1186/s13059-020-02134-9" licence: ["PUBLIC DOMAIN"] + identifier: biotools:merqury input: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. [ id:'test', single_end:false ] - - meryl_db: - type: file - description: "Meryl read database" - - assembly: - type: file - description: FASTA assembly file + - - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - meryl_db: + type: file + description: "Meryl read database" + - assembly: + type: file + description: FASTA assembly file output: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. [ id:'test', single_end:false ] - - versions: - type: file - description: File containing software versions - pattern: "versions.yml" - assembly_only_kmers_bed: - type: file - description: "The positions of the k-mers found only in an assembly for further investigation in .bed" - pattern: "*_only.bed" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*_only.bed": + type: file + description: "The positions of the k-mers found only in an assembly for further + investigation in .bed" + pattern: "*_only.bed" - assembly_only_kmers_wig: - type: file - description: "The positions of the k-mers found only in an assembly for further investigation in .wig" - pattern: "*_only.wig" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*_only.wig": + type: file + description: "The positions of the k-mers found only in an assembly for further + investigation in .wig" + pattern: "*_only.wig" - stats: - type: file - description: Assembly statistics file - pattern: "*.completeness.stats" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. 
[ id:'test', single_end:false ] + - "*.completeness.stats": + type: file + description: Assembly statistics file + pattern: "*.completeness.stats" - dist_hist: - type: file - description: Histogram - pattern: "*.dist_only.hist" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*.dist_only.hist": + type: file + description: Histogram + pattern: "*.dist_only.hist" - spectra_cn_fl_png: - type: file - description: "Unstacked copy number spectra filled plot in PNG format" - pattern: "*.spectra-cn.fl.png" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*.spectra-cn.fl.png": + type: file + description: "Unstacked copy number spectra filled plot in PNG format" + pattern: "*.spectra-cn.fl.png" + - spectra_cn_hist: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*.spectra-cn.hist": + type: file + description: "Copy number spectra histogram" + pattern: "*.spectra-cn.hist" - spectra_cn_ln_png: - type: file - description: "Unstacked copy number spectra line plot in PNG format" - pattern: "*.spectra-cn.ln.png" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*.spectra-cn.ln.png": + type: file + description: "Unstacked copy number spectra line plot in PNG format" + pattern: "*.spectra-cn.ln.png" - spectra_cn_st_png: - type: file - description: "Stacked copy number spectra line plot in PNG format" - pattern: "*.spectra-cn.st.png" - - spectra_cn_hist: - type: file - description: "Copy number spectra histogram" - pattern: "*.spectra-cn.hist" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*.spectra-cn.st.png": + type: file + description: "Stacked copy number spectra line plot in PNG format" + pattern: "*.spectra-cn.st.png" - spectra_asm_fl_png: - type: file - description: "Unstacked assembly spectra filled plot in PNG format" - pattern: "*.spectra-asm.fl.png" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*.spectra-asm.fl.png": + type: file + description: "Unstacked assembly spectra filled plot in PNG format" + pattern: "*.spectra-asm.fl.png" + - spectra_asm_hist: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*.spectra-asm.hist": + type: file + description: "Assembly spectra histogram" + pattern: "*.spectra-asm.hist" - spectra_asm_ln_png: - type: file - description: "Unstacked assembly spectra line plot in PNG format" - pattern: "*.spectra-asm.ln.png" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*.spectra-asm.ln.png": + type: file + description: "Unstacked assembly spectra line plot in PNG format" + pattern: "*.spectra-asm.ln.png" - spectra_asm_st_png: - type: file - description: "Stacked assembly spectra line plot in PNG format" - pattern: "*.spectra-asm.st.png" - - spectra_asm_hist: - type: file - description: "Assembly spectra histogram" - pattern: "*.spectra-asm.hist" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. 
[ id:'test', single_end:false ] + - "*.spectra-asm.st.png": + type: file + description: "Stacked assembly spectra line plot in PNG format" + pattern: "*.spectra-asm.st.png" - assembly_qv: - type: file - description: "Assembly consensus quality estimation" - pattern: "*.qv" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - ${prefix}.qv: + type: file + description: "Assembly consensus quality estimation" + pattern: "*.qv" - scaffold_qv: - type: file - description: "Scaffold consensus quality estimation" - pattern: "*.qv" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - ${prefix}.*.qv: + type: file + description: "Scaffold consensus quality estimation" + pattern: "*.qv" - read_ploidy: - type: file - description: "Ploidy estimate from read k-mer database" - pattern: "*.hist.ploidy" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*.hist.ploidy": + type: file + description: "Ploidy estimate from read k-mer database" + pattern: "*.hist.ploidy" - hapmers_blob_png: - type: file - description: "Hap-mer blob plot" - pattern: "*.hapmers.blob.png" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*.hapmers.blob.png": + type: file + description: "Hap-mer blob plot" + pattern: "*.hapmers.blob.png" + - versions: + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" authors: - "@mahesh-panchal" maintainers: diff --git a/modules/nf-core/meryl/count/meta.yml b/modules/nf-core/meryl/count/meta.yml index 809a32fe..a110a610 100644 --- a/modules/nf-core/meryl/count/meta.yml +++ b/modules/nf-core/meryl/count/meta.yml @@ -11,34 +11,37 @@ tools: documentation: "https://meryl.readthedocs.io/en/latest/quick-start.html" tool_dev_url: "https://github.com/marbl/meryl" licence: ["GPL"] + identifier: biotools:meryl input: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. [ id:'test', single_end:false ] - - reads: - type: file - description: | - List of input FastQ files of size 1 and 2 for single-end and paired-end data, - respectively. - - kvalue: - type: integer - description: An integer value of k to use as the k-mer value. + - - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - reads: + type: file + description: | + List of input FastQ files of size 1 and 2 for single-end and paired-end data, + respectively. + - - kvalue: + type: integer + description: An integer value of k to use as the k-mer value. output: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. [ id:'test', single_end:false ] - - versions: - type: file - description: File containing software versions - pattern: "versions.yml" - meryl_db: - type: directory - description: A Meryl k-mer database - pattern: "*.meryl" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. 
[ id:'test', single_end:false ] + - "*.meryl": + type: directory + description: A Meryl k-mer database + pattern: "*.meryl" + - versions: + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" authors: - "@mahesh-panchal" maintainers: diff --git a/modules/nf-core/meryl/unionsum/meta.yml b/modules/nf-core/meryl/unionsum/meta.yml index 77d0784c..e9e13051 100644 --- a/modules/nf-core/meryl/unionsum/meta.yml +++ b/modules/nf-core/meryl/unionsum/meta.yml @@ -11,32 +11,35 @@ tools: documentation: "https://meryl.readthedocs.io/en/latest/quick-start.html" tool_dev_url: "https://github.com/marbl/meryl" licence: ["GPL"] + identifier: biotools:meryl input: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. [ id:'test', single_end:false ] - - meryl_dbs: - type: directory - description: Meryl k-mer databases - - kvalue: - type: integer - description: An integer value of k to use as the k-mer value. + - - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - meryl_dbs: + type: directory + description: Meryl k-mer databases + - - kvalue: + type: integer + description: An integer value of k to use as the k-mer value. output: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. [ id:'test', single_end:false ] - - versions: - type: file - description: File containing software versions - pattern: "versions.yml" - meryl_db: - type: directory - description: A Meryl k-mer database that is the union sum of the input databases - pattern: "*.unionsum.meryl" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*.unionsum.meryl": + type: directory + description: A Meryl k-mer database that is the union sum of the input databases + pattern: "*.unionsum.meryl" + - versions: + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" authors: - "@mahesh-panchal" maintainers: diff --git a/modules/nf-core/minimap2/align/meta.yml b/modules/nf-core/minimap2/align/meta.yml index 8996f881..a4cfc891 100644 --- a/modules/nf-core/minimap2/align/meta.yml +++ b/modules/nf-core/minimap2/align/meta.yml @@ -14,62 +14,77 @@ tools: homepage: https://github.com/lh3/minimap2 documentation: https://github.com/lh3/minimap2#uguide licence: ["MIT"] + identifier: "" input: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. [ id:'test', single_end:false ] - - reads: - type: file - description: | - List of input FASTA or FASTQ files of size 1 and 2 for single-end - and paired-end data, respectively. - - meta2: - type: map - description: | - Groovy Map containing reference information - e.g. [ id:'test_ref'] - - reference: - type: file - description: | - Reference database in FASTA format. - - bam_format: - type: boolean - description: Specify that output should be in BAM format - - bam_index_extension: - type: string - description: BAM alignment index extension (e.g. "bai") - - cigar_paf_format: - type: boolean - description: Specify that output CIGAR should be in PAF format - - cigar_bam: - type: boolean - description: | - Write CIGAR with >65535 ops at the CG tag. This is recommended when - doing XYZ (https://github.com/lh3/minimap2#working-with-65535-cigar-operations) + - - meta: + type: map + description: | + Groovy Map containing sample information + e.g. 
[ id:'test', single_end:false ] + - reads: + type: file + description: | + List of input FASTA or FASTQ files of size 1 and 2 for single-end + and paired-end data, respectively. + - - meta2: + type: map + description: | + Groovy Map containing reference information + e.g. [ id:'test_ref'] + - reference: + type: file + description: | + Reference database in FASTA format. + - - bam_format: + type: boolean + description: Specify that output should be in BAM format + - - bam_index_extension: + type: string + description: BAM alignment index extension (e.g. "bai") + - - cigar_paf_format: + type: boolean + description: Specify that output CIGAR should be in PAF format + - - cigar_bam: + type: boolean + description: | + Write CIGAR with >65535 ops at the CG tag. This is recommended when + doing XYZ (https://github.com/lh3/minimap2#working-with-65535-cigar-operations) output: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. [ id:'test', single_end:false ] - paf: - type: file - description: Alignment in PAF format - pattern: "*.paf" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*.paf": + type: file + description: Alignment in PAF format + pattern: "*.paf" - bam: - type: file - description: Alignment in BAM format - pattern: "*.bam" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*.bam": + type: file + description: Alignment in BAM format + pattern: "*.bam" - index: - type: file - description: BAM alignment index - pattern: "*.bam.*" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*.bam.${bam_index_extension}": + type: file + description: BAM alignment index + pattern: "*.bam.*" - versions: - type: file - description: File containing software versions - pattern: "versions.yml" + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" authors: - "@heuermh" - "@sofstam" diff --git a/modules/nf-core/minimap2/align/tests/main.nf.test.snap b/modules/nf-core/minimap2/align/tests/main.nf.test.snap index 12264a85..bafaabd7 100644 --- a/modules/nf-core/minimap2/align/tests/main.nf.test.snap +++ b/modules/nf-core/minimap2/align/tests/main.nf.test.snap @@ -4,8 +4,8 @@ [ "@HD\tVN:1.6\tSO:coordinate", "@SQ\tSN:MT192765.1\tLN:29829", - "@PG\tID:minimap2\tPN:minimap2\tVN:2.28-r1209\tCL:minimap2 -t 2 -a genome.fasta -", - "@PG\tID:samtools\tPN:samtools\tPP:minimap2\tVN:1.20\tCL:samtools sort -@ 1 -o test.bam##idx##test.bam.bai --write-index" + "@PG\tID:minimap2\tPN:minimap2\tVN:2.28-r1209\tCL:minimap2 -t 4 -a genome.fasta -", + "@PG\tID:samtools\tPN:samtools\tPP:minimap2\tVN:1.20\tCL:samtools sort -@ 3 -o test.bam##idx##test.bam.bai --write-index" ], "5d426b9a5f5b2c54f1d7f1e4c238ae94", "test.bam.bai", @@ -14,10 +14,10 @@ ] ], "meta": { - "nf-test": "0.8.4", - "nextflow": "24.04.2" + "nf-test": "0.9.0", + "nextflow": "24.04.4" }, - "timestamp": "2024-07-25T09:03:00.827260362" + "timestamp": "2024-10-10T20:47:38.40726" }, "sarscov2 - bam, fasta, true, 'bai', false, false - stub": { "content": [ @@ -236,8 +236,8 @@ [ "@HD\tVN:1.6\tSO:coordinate", "@SQ\tSN:MT192765.1\tLN:29829", - "@PG\tID:minimap2\tPN:minimap2\tVN:2.28-r1209\tCL:minimap2 -t 2 -a genome.fasta test_1.fastq.gz test_2.fastq.gz", - "@PG\tID:samtools\tPN:samtools\tPP:minimap2\tVN:1.20\tCL:samtools sort -@ 1 -o test.bam" + 
"@PG\tID:minimap2\tPN:minimap2\tVN:2.28-r1209\tCL:minimap2 -t 4 -a genome.fasta test_1.fastq.gz test_2.fastq.gz", + "@PG\tID:samtools\tPN:samtools\tPP:minimap2\tVN:1.20\tCL:samtools sort -@ 3 -o test.bam" ], "1bc392244f228bf52cf0b5a8f6a654c9", [ @@ -245,18 +245,18 @@ ] ], "meta": { - "nf-test": "0.8.4", - "nextflow": "24.04.2" + "nf-test": "0.9.0", + "nextflow": "24.04.4" }, - "timestamp": "2024-07-23T11:18:18.964586894" + "timestamp": "2024-10-10T20:47:04.155509" }, "sarscov2 - fastq, fasta, true, [], false, false": { "content": [ [ "@HD\tVN:1.6\tSO:coordinate", "@SQ\tSN:MT192765.1\tLN:29829", - "@PG\tID:minimap2\tPN:minimap2\tVN:2.28-r1209\tCL:minimap2 -t 2 -a genome.fasta test_1.fastq.gz", - "@PG\tID:samtools\tPN:samtools\tPP:minimap2\tVN:1.20\tCL:samtools sort -@ 1 -o test.bam" + "@PG\tID:minimap2\tPN:minimap2\tVN:2.28-r1209\tCL:minimap2 -t 4 -a genome.fasta test_1.fastq.gz", + "@PG\tID:samtools\tPN:samtools\tPP:minimap2\tVN:1.20\tCL:samtools sort -@ 3 -o test.bam" ], "f194745c0ccfcb2a9c0aee094a08750", [ @@ -264,18 +264,18 @@ ] ], "meta": { - "nf-test": "0.8.4", - "nextflow": "24.04.2" + "nf-test": "0.9.0", + "nextflow": "24.04.4" }, - "timestamp": "2024-07-23T11:17:48.667488325" + "timestamp": "2024-10-10T20:46:42.073683" }, "sarscov2 - fastq, fasta, true, 'bai', false, false": { "content": [ [ "@HD\tVN:1.6\tSO:coordinate", "@SQ\tSN:MT192765.1\tLN:29829", - "@PG\tID:minimap2\tPN:minimap2\tVN:2.28-r1209\tCL:minimap2 -t 2 -a genome.fasta test_1.fastq.gz", - "@PG\tID:samtools\tPN:samtools\tPP:minimap2\tVN:1.20\tCL:samtools sort -@ 1 -o test.bam##idx##test.bam.bai --write-index" + "@PG\tID:minimap2\tPN:minimap2\tVN:2.28-r1209\tCL:minimap2 -t 4 -a genome.fasta test_1.fastq.gz", + "@PG\tID:samtools\tPN:samtools\tPP:minimap2\tVN:1.20\tCL:samtools sort -@ 3 -o test.bam##idx##test.bam.bai --write-index" ], "f194745c0ccfcb2a9c0aee094a08750", "test.bam.bai", @@ -284,18 +284,18 @@ ] ], "meta": { - "nf-test": "0.8.4", - "nextflow": "24.04.2" + "nf-test": "0.9.0", + "nextflow": "24.04.4" }, - "timestamp": "2024-07-23T11:18:02.517416733" + "timestamp": "2024-10-10T20:46:53.814566" }, "sarscov2 - bam, fasta, true, [], false, false": { "content": [ [ "@HD\tVN:1.6\tSO:coordinate", "@SQ\tSN:MT192765.1\tLN:29829", - "@PG\tID:minimap2\tPN:minimap2\tVN:2.28-r1209\tCL:minimap2 -t 2 -a genome.fasta -", - "@PG\tID:samtools\tPN:samtools\tPP:minimap2\tVN:1.20\tCL:samtools sort -@ 1 -o test.bam" + "@PG\tID:minimap2\tPN:minimap2\tVN:2.28-r1209\tCL:minimap2 -t 4 -a genome.fasta -", + "@PG\tID:samtools\tPN:samtools\tPP:minimap2\tVN:1.20\tCL:samtools sort -@ 3 -o test.bam" ], "5d426b9a5f5b2c54f1d7f1e4c238ae94", [ @@ -303,10 +303,10 @@ ] ], "meta": { - "nf-test": "0.8.4", - "nextflow": "24.04.2" + "nf-test": "0.9.0", + "nextflow": "24.04.4" }, - "timestamp": "2024-07-25T09:02:49.64829488" + "timestamp": "2024-10-10T20:47:26.993111" }, "sarscov2 - bam, fasta, true, [], false, false - stub": { "content": [ @@ -459,8 +459,8 @@ "@SQ\tSN:ERR5069949.3258358\tLN:151", "@SQ\tSN:ERR5069949.1476386\tLN:151", "@SQ\tSN:ERR5069949.2415814\tLN:150", - "@PG\tID:minimap2\tPN:minimap2\tVN:2.28-r1209\tCL:minimap2 -t 2 -a test_1.fastq.gz test_1.fastq.gz", - "@PG\tID:samtools\tPN:samtools\tPP:minimap2\tVN:1.20\tCL:samtools sort -@ 1 -o test.bam" + "@PG\tID:minimap2\tPN:minimap2\tVN:2.28-r1209\tCL:minimap2 -t 4 -a test_1.fastq.gz test_1.fastq.gz", + "@PG\tID:samtools\tPN:samtools\tPP:minimap2\tVN:1.20\tCL:samtools sort -@ 3 -o test.bam" ], "16c1c651f8ec67383bcdee3c55aed94f", [ @@ -468,9 +468,9 @@ ] ], "meta": { - "nf-test": "0.8.4", - 
"nextflow": "24.04.2" + "nf-test": "0.9.0", + "nextflow": "24.04.4" }, - "timestamp": "2024-07-23T11:18:34.246998277" + "timestamp": "2024-10-10T20:47:14.585958" } } \ No newline at end of file diff --git a/modules/nf-core/orthofinder/environment.yml b/modules/nf-core/orthofinder/environment.yml new file mode 100644 index 00000000..68c475f8 --- /dev/null +++ b/modules/nf-core/orthofinder/environment.yml @@ -0,0 +1,6 @@ +channels: + - conda-forge + - bioconda +dependencies: + - bioconda::diamond=2.1.9 + - bioconda::orthofinder=2.5.5 diff --git a/modules/nf-core/orthofinder/main.nf b/modules/nf-core/orthofinder/main.nf new file mode 100644 index 00000000..a47c4dea --- /dev/null +++ b/modules/nf-core/orthofinder/main.nf @@ -0,0 +1,80 @@ +process ORTHOFINDER { + tag "$meta.id" + label 'process_high' + + conda "${moduleDir}/environment.yml" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/orthofinder:2.5.5--hdfd78af_2': + 'biocontainers/orthofinder:2.5.5--hdfd78af_2' }" + + input: + tuple val(meta), path(fastas, stageAs: 'input/') + tuple val(meta2), path(prior_run) + + output: + tuple val(meta), path("$prefix") , emit: orthofinder + tuple val(meta), path("$prefix/WorkingDirectory") , emit: working + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + prefix = task.ext.prefix ?: "${meta.id}" + def include_command = prior_run ? "-b $prior_run" : '' + + """ + mkdir temp_pickle + + orthofinder \\ + -t $task.cpus \\ + -a $task.cpus \\ + -p temp_pickle \\ + -f input \\ + -n $prefix \\ + $include_command \\ + $args + + if [ -e input/OrthoFinder/Results_$prefix ]; then + mv input/OrthoFinder/Results_$prefix $prefix + fi + + if [ -e ${prior_run}/OrthoFinder/Results_$prefix ]; then + mv ${prior_run}/OrthoFinder/Results_$prefix $prefix + fi + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + orthofinder: \$(orthofinder -h | sed -n 's/.*version \\(.*\\) Copy.*/\\1/p') + END_VERSIONS + """ + + stub: + def args = task.ext.args ?: '' + prefix = task.ext.prefix ?: "${meta.id}" + def include_command = prior_run ? "-b $prior_run" : '' + + """ + mkdir -p $prefix/Comparative_Genomics_Statistics + mkdir $prefix/Gene_Duplication_Events + mkdir $prefix/Gene_Trees + mkdir $prefix/Orthogroup_Sequences + mkdir $prefix/Orthogroups + mkdir $prefix/Orthologues + mkdir $prefix/Phylogenetic_Hierarchical_Orthogroups + mkdir $prefix/Phylogenetically_Misplaced_Genes + mkdir $prefix/Putative_Xenologs + mkdir $prefix/Resolved_Gene_Trees + mkdir $prefix/Single_Copy_Orthologue_Sequences + mkdir $prefix/Species_Tree + mkdir $prefix/WorkingDirectory + + touch $prefix/Log.txt + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + orthofinder: \$(orthofinder -h | sed -n 's/.*version \\(.*\\) Copy.*/\\1/p') + END_VERSIONS + """ +} diff --git a/modules/nf-core/orthofinder/meta.yml b/modules/nf-core/orthofinder/meta.yml new file mode 100644 index 00000000..4aeb46b3 --- /dev/null +++ b/modules/nf-core/orthofinder/meta.yml @@ -0,0 +1,71 @@ +# yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/modules/meta-schema.json +name: "orthofinder" +description: OrthoFinder is a fast, accurate and comprehensive platform for comparative + genomics. 
+keywords: + - genomics + - orthogroup + - orthologs + - gene + - duplication + - tree + - phylogeny +tools: + - "orthofinder": + description: "Accurate inference of orthogroups, orthologues, gene trees and rooted + species tree made easy!" + homepage: "https://github.com/davidemms/OrthoFinder" + documentation: "https://github.com/davidemms/OrthoFinder" + tool_dev_url: "https://github.com/davidemms/OrthoFinder" + doi: "10.1186/s13059-019-1832-y" + licence: ["GPL v3"] + identifier: biotools:OrthoFinder + +input: + - - meta: + type: map + description: | + Groovy Map containing sample information + e.g. `[ id:'sample1' ]` + - fastas: + type: list + description: Input fasta files + pattern: "*.{fa,faa,fasta,fas,pep}" + - - meta2: + type: map + description: | + Groovy Map containing a name + e.g. `[ id:'folder1' ]` + - prior_run: + type: directory + description: | + A folder container containing a previous WorkingDirectory from Orthofinder. +output: + - orthofinder: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. `[ id:'sample1' ]` + - $prefix: + type: directory + description: Orthofinder output directory + - working: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. `[ id:'sample1' ]` + - $prefix/WorkingDirectory: + type: directory + description: Orthofinder output WorkingDirectory (used for the orthofinder resume + function) + - versions: + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" +authors: + - "@GallVp" +maintainers: + - "@GallVp" diff --git a/modules/nf-core/orthofinder/tests/main.nf.test b/modules/nf-core/orthofinder/tests/main.nf.test new file mode 100644 index 00000000..aa68d1d2 --- /dev/null +++ b/modules/nf-core/orthofinder/tests/main.nf.test @@ -0,0 +1,161 @@ +import groovy.io.FileType + +nextflow_process { + + name "Test Process ORTHOFINDER" + script "../main.nf" + process "ORTHOFINDER" + + tag "modules" + tag "modules_nfcore" + tag "orthofinder" + tag "untar" + + test("sarscov2 - candidatus_portiera_aleyrodidarum - proteome") { + + when { + process { + """ + file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/proteome.fasta', checkIfExists: true) + .copyTo("${workDir}/sarscov2.fasta") + + file(params.modules_testdata_base_path + 'genomics/prokaryotes/candidatus_portiera_aleyrodidarum/genome/proteome.fasta', checkIfExists: true) + .copyTo("${workDir}/candidatus_portiera_aleyrodidarum.fasta") + + def file_a = file("${workDir}/sarscov2.fasta", checkIfExists:true) + def file_b = file("${workDir}/candidatus_portiera_aleyrodidarum.fasta", checkIfExists:true) + + input[0] = [ + [ id:'test', single_end:false ], + [ file_a, file_b ] + ] + input[1] = [ + [], + [] + ] + """ + } + } + + then { + assert process.success + + def all_files = [] + + file(process.out.orthofinder[0][1]).eachFileRecurse (FileType.FILES) { file -> + all_files << file + } + + def stable_file_names = [ + 'Statistics_PerSpecies.tsv', + 'SpeciesTree_Gene_Duplications_0.5_Support.txt', + 'SpeciesTree_rooted.txt' + ] + + def stable_files = all_files.findAll { it.name in stable_file_names } + + assert snapshot( + stable_files.toSorted(), + process.out.versions[0] + ).match() + } + + } + + + test("sarscov2 - candidatus_portiera_aleyrodidarum - proteome - resume") { + + + setup { + run("UNTAR") { + script "../../untar/main.nf" + process { + """ + input[0] = [ [ id:'test1' ], // meta map + file(params.modules_testdata_base_path + 
'delete_me/orthofinder/WorkingDirectory.tar.gz', checkIfExists: true) + ] + """ + } + } + } + + when { + process { + """ + file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/proteome.fasta', checkIfExists: true) + .copyTo("${workDir}/sarscov2.fasta") + + def file_a = file("https://raw.githubusercontent.com/nf-core/test-datasets/proteinfold/testdata/sequences/H1065.fasta") + def file_c = UNTAR.out.untar + input[0] = [ + [ id:'test_2', single_end:false ], + [ file_a ] + ] + input[1] = UNTAR.out.untar + """ + } + } + + then { + assert process.success + + def all_files = [] + + file(process.out.orthofinder[0][1]).eachFileRecurse (FileType.FILES) { file -> + all_files << file + } + + def stable_file_names = [ + 'Statistics_PerSpecies.tsv', + 'OrthologuesStats_Totals.tsv', + 'Duplications_per_Species_Tree_Node.tsv' + ] + + def stable_files = all_files.findAll { it.name in stable_file_names } + + assert snapshot( + stable_files.toSorted(), + process.out.versions[0] + ).match() + } + + } + + test("sarscov2 - candidatus_portiera_aleyrodidarum - proteome - stub") { + + options '-stub' + + when { + process { + """ + file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/proteome.fasta', checkIfExists: true) + .copyTo("${workDir}/sarscov2.fasta") + + file(params.modules_testdata_base_path + 'genomics/prokaryotes/candidatus_portiera_aleyrodidarum/genome/proteome.fasta', checkIfExists: true) + .copyTo("${workDir}/candidatus_portiera_aleyrodidarum.fasta") + + def file_a = file("${workDir}/sarscov2.fasta", checkIfExists:true) + def file_b = file("${workDir}/candidatus_portiera_aleyrodidarum.fasta", checkIfExists:true) + + input[0] = [ + [ id:'test', single_end:false ], + [ file_a, file_b ] + ] + input[1] = [ + [], + [] + ] + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot(process.out).match() } + ) + } + + } + +} diff --git a/modules/nf-core/orthofinder/tests/main.nf.test.snap b/modules/nf-core/orthofinder/tests/main.nf.test.snap new file mode 100644 index 00000000..f2c7b916 --- /dev/null +++ b/modules/nf-core/orthofinder/tests/main.nf.test.snap @@ -0,0 +1,171 @@ +{ + "sarscov2 - candidatus_portiera_aleyrodidarum - proteome": { + "content": [ + [ + "Statistics_PerSpecies.tsv:md5,984b5011a34d54527fe17896bfa36a2d", + "SpeciesTree_Gene_Duplications_0.5_Support.txt:md5,8b7a673e2e8b6d1aeb697f2bb88afa18", + "SpeciesTree_rooted.txt:md5,4d5ea525feebe479fca0c0768271ba81" + ], + "versions.yml:md5,86b472c85626aac1840eec0769016f5c" + ], + "meta": { + "nf-test": "0.9.0", + "nextflow": "24.04.4" + }, + "timestamp": "2024-09-03T10:59:02.895708598" + }, + "sarscov2 - candidatus_portiera_aleyrodidarum - proteome - stub": { + "content": [ + { + "0": [ + [ + { + "id": "test", + "single_end": false + }, + [ + [ + + ], + [ + + ], + [ + + ], + "Log.txt:md5,d41d8cd98f00b204e9800998ecf8427e", + [ + + ], + [ + + ], + [ + + ], + [ + + ], + [ + + ], + [ + + ], + [ + + ], + [ + + ], + [ + + ], + [ + + ] + ] + ] + ], + "1": [ + [ + { + "id": "test", + "single_end": false + }, + [ + + ] + ] + ], + "2": [ + "versions.yml:md5,86b472c85626aac1840eec0769016f5c" + ], + "orthofinder": [ + [ + { + "id": "test", + "single_end": false + }, + [ + [ + + ], + [ + + ], + [ + + ], + "Log.txt:md5,d41d8cd98f00b204e9800998ecf8427e", + [ + + ], + [ + + ], + [ + + ], + [ + + ], + [ + + ], + [ + + ], + [ + + ], + [ + + ], + [ + + ], + [ + + ] + ] + ] + ], + "versions": [ + "versions.yml:md5,86b472c85626aac1840eec0769016f5c" + ], + "working": [ + [ + { + "id": "test", + "single_end": false 
+ }, + [ + + ] + ] + ] + } + ], + "meta": { + "nf-test": "0.9.0", + "nextflow": "24.04.4" + }, + "timestamp": "2024-09-03T11:07:31.319665056" + }, + "sarscov2 - candidatus_portiera_aleyrodidarum - proteome - resume": { + "content": [ + [ + "Duplications_per_Species_Tree_Node.tsv:md5,addc6f5ceec40bd82b00038d1872a27c", + "OrthologuesStats_Totals.tsv:md5,20d243abef226051a43cb37e922fc3eb", + "Statistics_PerSpecies.tsv:md5,83174c383b6c6828d1cc9b3be1679890" + ], + "versions.yml:md5,86b472c85626aac1840eec0769016f5c" + ], + "meta": { + "nf-test": "0.9.0", + "nextflow": "24.04.4" + }, + "timestamp": "2024-09-03T11:04:10.916947006" + } +} \ No newline at end of file diff --git a/modules/nf-core/orthofinder/tests/tags.yml b/modules/nf-core/orthofinder/tests/tags.yml new file mode 100644 index 00000000..f386e259 --- /dev/null +++ b/modules/nf-core/orthofinder/tests/tags.yml @@ -0,0 +1,2 @@ +orthofinder: + - "modules/nf-core/orthofinder/**" diff --git a/modules/nf-core/seqkit/rmdup/meta.yml b/modules/nf-core/seqkit/rmdup/meta.yml index d0addd45..22e29c11 100644 --- a/modules/nf-core/seqkit/rmdup/meta.yml +++ b/modules/nf-core/seqkit/rmdup/meta.yml @@ -1,7 +1,7 @@ ---- # yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/modules/meta-schema.json name: "seqkit_rmdup" -description: Transforms sequences (extract ID, filter by length, remove gaps, reverse complement...) +description: Transforms sequences (extract ID, filter by length, remove gaps, reverse + complement...) keywords: - genomics - fasta @@ -16,34 +16,43 @@ tools: tool_dev_url: "https://github.com/shenwei356/seqkit" doi: "10.1371/journal.pone.0163962" licence: ["MIT"] + identifier: biotools:seqkit input: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. `[ id:'sample1' ]` - - fastx: - type: file - description: Input fasta/fastq file - pattern: "*.{fsa,fas,fa,fasta,fastq,fq,fsa.gz,fas.gz,fa.gz,fasta.gz,fastq.gz,fq.gz}" + - - meta: + type: map + description: | + Groovy Map containing sample information + e.g. `[ id:'sample1' ]` + - fastx: + type: file + description: Input fasta/fastq file + pattern: "*.{fsa,fas,fa,fasta,fastq,fq,fsa.gz,fas.gz,fa.gz,fasta.gz,fastq.gz,fq.gz}" output: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. `[ id:'sample1' ]` - fastx: - type: file - description: Output fasta/fastq file - pattern: "*.{fasta,fasta.gz,fastq,fastq.gz}" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. `[ id:'sample1' ]` + - ${prefix}.${extension}: + type: file + description: Output fasta/fastq file + pattern: "*.{fasta,fasta.gz,fastq,fastq.gz}" - log: - type: file - description: Log containing information regarding removed duplicates - pattern: "*.log" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. 
`[ id:'sample1' ]` + - "*.log": + type: file + description: Log containing information regarding removed duplicates + pattern: "*.log" - versions: - type: file - description: File containing software versions - pattern: "versions.yml" + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" authors: - "@GallVp" maintainers: diff --git a/modules/nf-core/seqkit/seq/meta.yml b/modules/nf-core/seqkit/seq/meta.yml index 8d4e2b16..7d32aba5 100644 --- a/modules/nf-core/seqkit/seq/meta.yml +++ b/modules/nf-core/seqkit/seq/meta.yml @@ -1,7 +1,7 @@ ---- # yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/modules/meta-schema.json name: "seqkit_seq" -description: Transforms sequences (extract ID, filter by length, remove gaps, reverse complement...) +description: Transforms sequences (extract ID, filter by length, remove gaps, reverse + complement...) keywords: - genomics - fasta @@ -18,30 +18,33 @@ tools: tool_dev_url: "https://github.com/shenwei356/seqkit" doi: "10.1371/journal.pone.0163962" licence: ["MIT"] + identifier: biotools:seqkit input: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. `[ id:'sample1' ]` - - fastx: - type: file - description: Input fasta/fastq file - pattern: "*.{fsa,fas,fa,fasta,fastq,fq,fsa.gz,fas.gz,fa.gz,fasta.gz,fastq.gz,fq.gz}" + - - meta: + type: map + description: | + Groovy Map containing sample information + e.g. `[ id:'sample1' ]` + - fastx: + type: file + description: Input fasta/fastq file + pattern: "*.{fsa,fas,fa,fasta,fastq,fq,fsa.gz,fas.gz,fa.gz,fasta.gz,fastq.gz,fq.gz}" output: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. `[ id:'sample1' ]` - fastx: - type: file - description: Output fasta/fastq file - pattern: "*.{fasta,fasta.gz,fastq,fastq.gz}" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. `[ id:'sample1' ]` + - ${prefix}.*: + type: file + description: Output fasta/fastq file + pattern: "*.{fasta,fasta.gz,fastq,fastq.gz}" - versions: - type: file - description: File containing software versions - pattern: "versions.yml" + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" authors: - "@GallVp" maintainers: diff --git a/modules/nf-core/seqkit/sort/meta.yml b/modules/nf-core/seqkit/sort/meta.yml index 2e61ce15..157ff85c 100644 --- a/modules/nf-core/seqkit/sort/meta.yml +++ b/modules/nf-core/seqkit/sort/meta.yml @@ -1,4 +1,3 @@ ---- # yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/modules/meta-schema.json name: "seqkit_sort" description: Sorts sequences by id/name/sequence/length @@ -15,30 +14,33 @@ tools: tool_dev_url: "https://github.com/shenwei356/seqkit" doi: "10.1371/journal.pone.0163962" licence: ["MIT"] + identifier: biotools:seqkit input: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. `[ id:'sample1' ]` - - fastx: - type: file - description: Input fasta/fastq file - pattern: "*.{fsa,fas,fa,fasta,fastq,fq,fsa.gz,fas.gz,fa.gz,fasta.gz,fastq.gz,fq.gz}" + - - meta: + type: map + description: | + Groovy Map containing sample information + e.g. `[ id:'sample1' ]` + - fastx: + type: file + description: Input fasta/fastq file + pattern: "*.{fsa,fas,fa,fasta,fastq,fq,fsa.gz,fas.gz,fa.gz,fasta.gz,fastq.gz,fq.gz}" output: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. 
`[ id:'sample1' ]` - fastx: - type: file - description: Output fasta/fastq file - pattern: "*.{fasta.gz,fastq.gz}" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. `[ id:'sample1' ]` + - ${prefix}.*: + type: file + description: Output fasta/fastq file + pattern: "*.{fasta.gz,fastq.gz}" - versions: - type: file - description: File containing software versions - pattern: "versions.yml" + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" authors: - "@GallVp" maintainers: diff --git a/modules/nf-core/sratools/fasterqdump/meta.yml b/modules/nf-core/sratools/fasterqdump/meta.yml index 6a2151a8..42e2c07c 100644 --- a/modules/nf-core/sratools/fasterqdump/meta.yml +++ b/modules/nf-core/sratools/fasterqdump/meta.yml @@ -1,5 +1,6 @@ name: sratools_fasterqdump -description: Extract sequencing reads in FASTQ format from a given NCBI Sequence Read Archive (SRA). +description: Extract sequencing reads in FASTQ format from a given NCBI Sequence Read + Archive (SRA). keywords: - sequencing - FASTQ @@ -11,42 +12,44 @@ tools: documentation: https://github.com/ncbi/sra-tools/wiki tool_dev_url: https://github.com/ncbi/sra-tools licence: ["Public Domain"] + identifier: "" input: - - meta: - type: map - description: > - Groovy Map containing sample information e.g. [ id:'test', single_end:false ] - - - sra: - type: directory - description: Directory containing ETL data for the given SRA. - pattern: "*/*.sra" - - ncbi_settings: - type: file - description: > - An NCBI user settings file. - - pattern: "*.mkfg" - - certificate: - type: file - description: > - Path to a JWT cart file used to access protected dbGAP data on SRA using the sra-toolkit - - pattern: "*.cart" + - - meta: + type: map + description: > + Groovy Map containing sample information e.g. [ id:'test', single_end:false + ] + - sra: + type: directory + description: Directory containing ETL data for the given SRA. + pattern: "*/*.sra" + - - ncbi_settings: + type: file + description: > + An NCBI user settings file. + pattern: "*.mkfg" + - - certificate: + type: file + description: > + Path to a JWT cart file used to access protected dbGAP data on SRA using the + sra-toolkit + pattern: "*.cart" output: - - meta: - type: map - description: > - Groovy Map containing sample information e.g. [ id:'test', single_end:false ] - - - versions: - type: file - description: File containing software versions - pattern: "versions.yml" - reads: - type: file - description: Extracted FASTQ file or files if the sequencing reads are paired-end. - pattern: "*.fastq.gz" + - meta: + type: map + description: > + Groovy Map containing sample information e.g. [ id:'test', single_end:false + ] + - "*.fastq.gz": + type: file + description: Extracted FASTQ file or files if the sequencing reads are paired-end. + pattern: "*.fastq.gz" + - versions: + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" authors: - "@Midnighter" maintainers: diff --git a/modules/nf-core/sratools/prefetch/meta.yml b/modules/nf-core/sratools/prefetch/meta.yml index 7ed42d49..3a537bfe 100644 --- a/modules/nf-core/sratools/prefetch/meta.yml +++ b/modules/nf-core/sratools/prefetch/meta.yml @@ -11,45 +11,47 @@ tools: documentation: https://github.com/ncbi/sra-tools/wiki tool_dev_url: https://github.com/ncbi/sra-tools licence: ["Public Domain"] + identifier: "" input: - - meta: - type: map - description: > - Groovy Map containing sample information e.g. 
[ id:'test', single_end:false ] - - - id: - type: string - description: > - A string denoting an SRA id. - - - ncbi_settings: - type: file - description: > - An NCBI user settings file. - - pattern: "*.mkfg" - - certificate: - type: file - description: > - Path to a JWT cart file used to access protected dbGAP data on SRA using the sra-toolkit - - pattern: "*.cart" + - - meta: + type: map + description: > + Groovy Map containing sample information e.g. [ id:'test', single_end:false + ] + - id: + type: string + description: > + A string denoting an SRA id. + - - ncbi_settings: + type: file + description: > + An NCBI user settings file. + pattern: "*.mkfg" + - - certificate: + type: file + description: > + Path to a JWT cart file used to access protected dbGAP data on SRA using the + sra-toolkit + pattern: "*.cart" output: - - meta: - type: map - description: > - Groovy Map containing sample information e.g. [ id:'test', single_end:false ] - - sra: - type: directory - description: > - Directory containing the ETL data for the given SRA id. - - pattern: "*/*.sra" + - meta: + type: map + description: > + Groovy Map containing sample information e.g. [ id:'test', single_end:false + ] + pattern: "*/*.sra" + - "id, type: 'dir": + type: map + description: > + Groovy Map containing sample information e.g. [ id:'test', single_end:false + ] + pattern: "*/*.sra" - versions: - type: file - description: File containing software versions - pattern: "versions.yml" + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" authors: - "@Midnighter" maintainers: diff --git a/modules/nf-core/tidk/explore/meta.yml b/modules/nf-core/tidk/explore/meta.yml index 582aaf56..72d15954 100644 --- a/modules/nf-core/tidk/explore/meta.yml +++ b/modules/nf-core/tidk/explore/meta.yml @@ -1,4 +1,3 @@ ---- # yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/modules/meta-schema.json name: "tidk_explore" description: | @@ -10,42 +9,52 @@ keywords: - search tools: - "tidk": - description: tidk is a toolkit to identify and visualise telomeric repeats in genomes + description: tidk is a toolkit to identify and visualise telomeric repeats in + genomes homepage: "https://github.com/tolkit/telomeric-identifier" documentation: "https://github.com/tolkit/telomeric-identifier" tool_dev_url: "https://github.com/tolkit/telomeric-identifier" doi: "10.5281/zenodo.10091385" licence: ["MIT"] + identifier: "" input: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. `[ id:'sample1' ]` - - fasta: - type: file - description: The input fasta file - pattern: "*.{fsa,fa,fasta}" + - - meta: + type: map + description: | + Groovy Map containing sample information + e.g. `[ id:'sample1' ]` + - fasta: + type: file + description: The input fasta file + pattern: "*.{fsa,fa,fasta}" output: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. `[ id:'sample1' ]` - explore_tsv: - type: file - description: Telomeres and their frequencies in TSV format - pattern: "*.tidk.explore.tsv" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. 
`[ id:'sample1' ]` + - "*.tidk.explore.tsv": + type: file + description: Telomeres and their frequencies in TSV format + pattern: "*.tidk.explore.tsv" - top_sequence: - type: file - description: | - The most frequent telomere sequence if one or more - sequences are identified by the toolkit - pattern: "*.top.sequence.txt" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. `[ id:'sample1' ]` + - "*.top.sequence.txt": + type: file + description: | + The most frequent telomere sequence if one or more + sequences are identified by the toolkit + pattern: "*.top.sequence.txt" - versions: - type: file - description: File containing software versions - pattern: "versions.yml" + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" authors: - "@GallVp" maintainers: diff --git a/modules/nf-core/tidk/plot/meta.yml b/modules/nf-core/tidk/plot/meta.yml index 451195c8..75289b28 100644 --- a/modules/nf-core/tidk/plot/meta.yml +++ b/modules/nf-core/tidk/plot/meta.yml @@ -1,4 +1,3 @@ ---- # yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/modules/meta-schema.json name: "tidk_plot" description: | @@ -11,36 +10,40 @@ keywords: - plot tools: - "tidk": - description: tidk is a toolkit to identify and visualise telomeric repeats in genomes + description: tidk is a toolkit to identify and visualise telomeric repeats in + genomes homepage: "https://github.com/tolkit/telomeric-identifier" documentation: "https://github.com/tolkit/telomeric-identifier" tool_dev_url: "https://github.com/tolkit/telomeric-identifier" doi: "10.5281/zenodo.10091385" licence: ["MIT"] + identifier: "" input: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. `[ id:'sample1' ]` - - tsv: - type: file - description: Search results in TSV format from `tidk search` - pattern: "*.tsv" + - - meta: + type: map + description: | + Groovy Map containing sample information + e.g. `[ id:'sample1' ]` + - tsv: + type: file + description: Search results in TSV format from `tidk search` + pattern: "*.tsv" output: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. `[ id:'sample1' ]` - svg: - type: file - description: Telomere search plot - pattern: "*.svg" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. 
`[ id:'sample1' ]` + - "*.svg": + type: file + description: Telomere search plot + pattern: "*.svg" - versions: - type: file - description: File containing software versions - pattern: "versions.yml" + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" authors: - "@GallVp" maintainers: diff --git a/modules/nf-core/tidk/search/meta.yml b/modules/nf-core/tidk/search/meta.yml index 8ba07350..9a30ff15 100644 --- a/modules/nf-core/tidk/search/meta.yml +++ b/modules/nf-core/tidk/search/meta.yml @@ -1,4 +1,3 @@ ---- # yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/modules/meta-schema.json name: "tidk_search" description: Searches a genome for a telomere string such as TTAGGG @@ -8,43 +7,53 @@ keywords: - search tools: - "tidk": - description: tidk is a toolkit to identify and visualise telomeric repeats in genomes + description: tidk is a toolkit to identify and visualise telomeric repeats in + genomes homepage: "https://github.com/tolkit/telomeric-identifier" documentation: "https://github.com/tolkit/telomeric-identifier" tool_dev_url: "https://github.com/tolkit/telomeric-identifier" doi: "10.5281/zenodo.10091385" licence: ["MIT"] + identifier: "" input: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. `[ id:'sample1' ]` - - fasta: - type: file - description: The input fasta file - pattern: "*.{fsa,fa,fasta}" - - string: - type: string - description: Search string such as TTAGGG + - - meta: + type: map + description: | + Groovy Map containing sample information + e.g. `[ id:'sample1' ]` + - fasta: + type: file + description: The input fasta file + pattern: "*.{fsa,fa,fasta}" + - - string: + type: string + description: Search string such as TTAGGG output: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. `[ id:'sample1' ]` - tsv: - type: file - description: Search results in TSV format - pattern: "*.tsv" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. `[ id:'sample1' ]` + - "*.tsv": + type: file + description: Search results in TSV format + pattern: "*.tsv" - bedgraph: - type: file - description: Search results in BEDGRAPH format - pattern: "*.bedgraph" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. 
`[ id:'sample1' ]` + - "*.bedgraph": + type: file + description: Search results in BEDGRAPH format + pattern: "*.bedgraph" - versions: - type: file - description: File containing software versions - pattern: "versions.yml" + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" authors: - "@GallVp" maintainers: diff --git a/modules/nf-core/umitools/extract/meta.yml b/modules/nf-core/umitools/extract/meta.yml index 7695b271..648ffbd2 100644 --- a/modules/nf-core/umitools/extract/meta.yml +++ b/modules/nf-core/umitools/extract/meta.yml @@ -1,5 +1,6 @@ name: umitools_extract -description: Extracts UMI barcode from a read and add it to the read name, leaving any sample barcode in place +description: Extracts UMI barcode from a read and add it to the read name, leaving + any sample barcode in place keywords: - UMI - barcode @@ -8,38 +9,49 @@ keywords: tools: - umi_tools: description: > - UMI-tools contains tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs) and single cell RNA-Seq cell barcodes + UMI-tools contains tools for dealing with Unique Molecular Identifiers (UMIs)/Random + Molecular Tags (RMTs) and single cell RNA-Seq cell barcodes documentation: https://umi-tools.readthedocs.io/en/latest/ license: "MIT" + identifier: "" input: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. [ id:'test', single_end:false ] - - reads: - type: list - description: | - List of input FASTQ files whose UMIs will be extracted. + - - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - reads: + type: list + description: | + List of input FASTQ files whose UMIs will be extracted. output: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. [ id:'test', single_end:false ] - reads: - type: file - description: > - Extracted FASTQ files. | For single-end reads, pattern is \${prefix}.umi_extract.fastq.gz. | For paired-end reads, pattern is \${prefix}.umi_extract_{1,2}.fastq.gz. - pattern: "*.{fastq.gz}" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*.fastq.gz": + type: file + description: > + Extracted FASTQ files. | For single-end reads, pattern is \${prefix}.umi_extract.fastq.gz. + | For paired-end reads, pattern is \${prefix}.umi_extract_{1,2}.fastq.gz. + pattern: "*.{fastq.gz}" - log: - type: file - description: Logfile for umi_tools - pattern: "*.{log}" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*.log": + type: file + description: Logfile for umi_tools + pattern: "*.{log}" - versions: - type: file - description: File containing software versions - pattern: "versions.yml" + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" authors: - "@drpatelh" - "@grst" diff --git a/modules/nf-core/untar/meta.yml b/modules/nf-core/untar/meta.yml index a9a2110f..290346b3 100644 --- a/modules/nf-core/untar/meta.yml +++ b/modules/nf-core/untar/meta.yml @@ -10,30 +10,33 @@ tools: Extract tar.gz files. documentation: https://www.gnu.org/software/tar/manual/ licence: ["GPL-3.0-or-later"] + identifier: "" input: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. 
[ id:'test', single_end:false ] - - archive: - type: file - description: File to be untar - pattern: "*.{tar}.{gz}" + - - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - archive: + type: file + description: File to be untar + pattern: "*.{tar}.{gz}" output: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. [ id:'test', single_end:false ] - untar: - type: directory - description: Directory containing contents of archive - pattern: "*/" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - $prefix: + type: directory + description: Directory containing contents of archive + pattern: "*/" - versions: - type: file - description: File containing software versions - pattern: "versions.yml" + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" authors: - "@joseespinosa" - "@drpatelh" diff --git a/nextflow.config b/nextflow.config index b161bdd7..0677fb0d 100644 --- a/nextflow.config +++ b/nextflow.config @@ -9,14 +9,17 @@ // Global default params, used in configs params { - // Input options + // Input/output options input = null + outdir = null + email = null // Validation options check_sequence_duplicates = true - // Assemblathon stats options + // General stats options assemblathon_stats_n_limit = 100 + gfastats_skip = true // NCBI FCS options ncbi_fcs_adaptor_skip = true @@ -28,22 +31,22 @@ params { contamination_stops_pipeline = true + // tidk options + tidk_skip = true + tidk_repeat_seq = null + tidk_filter_by_size = false + tidk_filter_size_bp = 1000000 + // BUSCO options busco_skip = true busco_mode = null busco_lineage_datasets = null busco_download_path = null - // TIDK options - tidk_skip = true - tidk_repeat_seq = null - tidk_filter_by_size = false - tidk_filter_size_bp = 1000000 - // LAI options lai_skip = true - // kraken2 options + // kraken 2 options kraken2_skip = true kraken2_db_path = null @@ -52,6 +55,11 @@ params { hic_skip_fastp = false hic_skip_fastqc = false hic_fastp_ext_args = '--qualified_quality_phred 20 --length_required 50' + hic_samtools_ext_args = '-F 3852' + + // Merqury options + merqury_skip = true + merqury_kmer_length = 21 // Synteny options synteny_skip = true @@ -62,33 +70,21 @@ params { synteny_mummer_plot_type = 'both' synteny_mummer_m2m_align = false synteny_mummer_max_gap = 1000000 - synteny_mummer_min_bundle_size = 1000 + synteny_mummer_min_bundle_size = 1000000 synteny_plot_1_vs_all = false synteny_color_by_contig = true synteny_plotsr_seq_label = 'Chr' synteny_plotsr_assembly_order = null - // Merqury options - merqury_skip = true - merqury_kmer_length = 21 - - // Output options - outdir = './results' - email = null - - // Max resource options - max_memory = '512.GB' - max_cpus = 16 - max_time = '7.day' + // OrthoFinder options + orthofinder_skip = true // Boilerplate options publish_dir_mode = 'copy' email_on_fail = null plaintext_email = false monochrome_logs = false - monochromeLogs = false hook_url = null - help = false version = false // Config options @@ -96,34 +92,20 @@ params { config_profile_description = null custom_config_version = 'master' custom_config_base = "https://raw.githubusercontent.com/nf-core/configs/${params.custom_config_version}" - config_profile_contact = null - config_profile_url = null - - // Schema validation default options - validationFailUnrecognisedParams = false - validationLenientMode = 
false - validationSchemaIgnoreParams = '' - validationShowHiddenParams = false - validate_params = true +} +// Max resources +process { + resourceLimits = [ + cpus: 16, + memory: '512.GB', + time: '7.day' + ] } // Load base.config by default for all pipelines includeConfig 'conf/base.config' -// Load nf-core custom profiles from different Institutions -try { - includeConfig "${params.custom_config_base}/nfcore_custom.config" -} catch (Exception e) { - System.err.println("WARNING: Could not load nf-core/config profiles: ${params.custom_config_base}/nfcore_custom.config") -} - -// Load plant-food-research-open/assemblyqc custom profiles from different institutions. -// try { -// includeConfig "${params.custom_config_base}/pipeline/assemblyqc.config" -// } catch (Exception e) { -// System.err.println("WARNING: Could not load nf-core/config/assemblyqc profiles: ${params.custom_config_base}/pipeline/assemblyqc.config") -// } profiles { debug { dumpHashes = true @@ -138,7 +120,7 @@ profiles { podman.enabled = false shifter.enabled = false charliecloud.enabled = false - conda.channels = ['conda-forge', 'bioconda', 'defaults'] + conda.channels = ['conda-forge', 'bioconda'] apptainer.enabled = false } mamba { @@ -228,18 +210,20 @@ profiles { test_full { includeConfig 'conf/test_full.config' } } -// Set default registry for Apptainer, Docker, Podman and Singularity independent of -profile -// Will not be used unless Apptainer / Docker / Podman / Singularity are enabled -// Set to your registry if you have a mirror of containers -apptainer.registry = 'quay.io' -docker.registry = 'quay.io' -podman.registry = 'quay.io' -singularity.registry = 'quay.io' +// Load nf-core custom profiles from different Institutions +includeConfig !System.getenv('NXF_OFFLINE') && params.custom_config_base ? "${params.custom_config_base}/nfcore_custom.config" : "/dev/null" -// Nextflow plugins -plugins { - id 'nf-validation@1.1.3' // Validation of pipeline parameters and creation of an input channel from a sample sheet -} +// Load plant-food-research-open/assemblyqc custom profiles from different institutions. +// includeConfig !System.getenv('NXF_OFFLINE') && params.custom_config_base ? "${params.custom_config_base}/pipeline/assemblyqc.config" : "/dev/null" + +// Set default registry for Apptainer, Docker, Podman, Charliecloud and Singularity independent of -profile +// Will not be used unless Apptainer / Docker / Podman / Charliecloud / Singularity are enabled +// Set to your registry if you have a mirror of containers +apptainer.registry = 'quay.io' +docker.registry = 'quay.io' +podman.registry = 'quay.io' +singularity.registry = 'quay.io' +charliecloud.registry = 'quay.io' // Export these variables to prevent local Python/R libraries from conflicting with those in the container // The JULIA depot path has been adjusted to a fixed path `/usr/local/share/julia` that needs to be used for packages in the container. @@ -252,8 +236,15 @@ env { JULIA_DEPOT_PATH = "/usr/local/share/julia" } -// Capture exit codes from upstream processes when piping -process.shell = ['/bin/bash', '-euo', 'pipefail'] +// Set bash options +process.shell = """\ +bash + +set -e # Exit if a tool returns a non-zero status/exit code +set -u # Treat unset variables and parameters as an error +set -o pipefail # Returns the status of the last command to exit with a non-zero status or zero if all successfully execute +set -C # No clobber - prevent output redirection from overwriting files. +""" // Disable process selector warnings by default. 
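With the check_max() helper and the --max_cpus / --max_memory / --max_time parameters removed, per-site resource caps are now expressed through the resourceLimits process directive shown above. A minimal sketch of a user-side override, assuming a hypothetical custom.config supplied to the run with -c custom.config; the values are illustrative only.

// custom.config: hypothetical site-level caps; syntax mirrors the resourceLimits block in nextflow.config
process {
    resourceLimits = [
        cpus: 8,
        memory: '64.GB',
        time: '2.day'
    ]
}

The removed max_* parameters no longer have any effect in this pipeline, so site configs that still set them should migrate to this directive.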
Use debug profile to enable warnings. nextflow.enable.configProcessNamesValidation = false @@ -282,43 +273,26 @@ manifest { homePage = 'https://github.com/plant-food-research-open/assemblyqc' description = """A Nextflow pipeline which evaluates assembly quality with multiple QC tools and presents the results in a unified html report.""" mainScript = 'main.nf' - nextflowVersion = '!>=23.04.0' - version = '2.1.1' + nextflowVersion = '!>=24.04.2' + version = '2.2.0' doi = 'https://doi.org/10.1093/bioinformatics/btae477' } -// Load modules.config for DSL2 module specific options -includeConfig 'conf/modules.config' +// Nextflow plugins +plugins { + id 'nf-schema@2.1.1' // Validation of pipeline parameters and creation of an input channel from a sample sheet +} -// Function to ensure that resource requirements don't go beyond -// a maximum limit -def check_max(obj, type) { - if (type == 'memory') { - try { - if (obj.compareTo(params.max_memory as nextflow.util.MemoryUnit) == 1) - return params.max_memory as nextflow.util.MemoryUnit - else - return obj - } catch (all) { - println " ### ERROR ### Max memory '${params.max_memory}' is not valid! Using default value: $obj" - return obj - } - } else if (type == 'time') { - try { - if (obj.compareTo(params.max_time as nextflow.util.Duration) == 1) - return params.max_time as nextflow.util.Duration - else - return obj - } catch (all) { - println " ### ERROR ### Max time '${params.max_time}' is not valid! Using default value: $obj" - return obj - } - } else if (type == 'cpus') { - try { - return Math.min( obj, params.max_cpus as int ) - } catch (all) { - println " ### ERROR ### Max cpus '${params.max_cpus}' is not valid! Using default value: $obj" - return obj - } +validation { + defaultIgnoreParams = ["genomes"] + help { + enabled = true + command = "nextflow run $manifest.name -profile --input assemblysheet.csv --outdir " + fullParameter = "help_full" + showHiddenParameter = "show_hidden" } + monochromeLogs = params.monochrome_logs } + +// Load modules.config for DSL2 module specific options +includeConfig 'conf/modules.config' diff --git a/nextflow_schema.json b/nextflow_schema.json index 7c8f50bd..a457f73f 100644 --- a/nextflow_schema.json +++ b/nextflow_schema.json @@ -1,10 +1,10 @@ { - "$schema": "http://json-schema.org/draft-07/schema", + "$schema": "https://json-schema.org/draft/2020-12/schema", "$id": "https://raw.githubusercontent.com/plant-food-research-open/assemblyqc/main/nextflow_schema.json", "title": "plant-food-research-open/assemblyqc pipeline parameters", "description": "A Nextflow pipeline which evaluates assembly quality with multiple QC tools and presents the results in a unified html report.", "type": "object", - "definitions": { + "$defs": { "input_output_options": { "title": "Input/output options", "type": "object", @@ -25,8 +25,7 @@ "type": "string", "format": "directory-path", "description": "The output directory where the results will be saved. You have to use absolute paths to storage on Cloud infrastructure.", - "fa_icon": "fas fa-folder-open", - "default": "./results" + "fa_icon": "fas fa-folder-open" }, "email": { "type": "string", @@ -62,6 +61,12 @@ "default": 100, "description": "The number of 'N's for the unknown gap size. 
NCBI recommendation is 100", "fa_icon": "fas fa-ruler-horizontal" + }, + "gfastats_skip": { + "type": "boolean", + "description": "Skip Gfastats", + "default": true, + "fa_icon": "fas fa-forward" } } }, @@ -108,6 +113,37 @@ } } }, + "tidk_options": { + "title": "tidk options", + "type": "object", + "description": "", + "default": "", + "properties": { + "tidk_skip": { + "type": "boolean", + "description": "Skip telomere identification", + "default": true, + "fa_icon": "fas fa-forward" + }, + "tidk_repeat_seq": { + "type": "string", + "description": "Telomere repeat sequence. Typical values for plant: TTTAGGG, fungus, vertebrates: TTAGGG and Insect: TTAGG", + "pattern": "^[ACGT]+$", + "fa_icon": "fas fa-dna" + }, + "tidk_filter_by_size": { + "type": "boolean", + "description": "Filter assembly sequences smaller than the specified length", + "fa_icon": "fas fa-question-circle" + }, + "tidk_filter_size_bp": { + "type": "integer", + "default": 1000000, + "description": "Filter size in base-pairs", + "fa_icon": "fas fa-ruler-horizontal" + } + } + }, "busco_options": { "title": "BUSCO options", "type": "object", @@ -140,37 +176,6 @@ } } }, - "tidk_options": { - "title": "TIDK options", - "type": "object", - "description": "", - "default": "", - "properties": { - "tidk_skip": { - "type": "boolean", - "description": "Skip telomere identification", - "default": true, - "fa_icon": "fas fa-forward" - }, - "tidk_repeat_seq": { - "type": "string", - "description": "Telomere repeat sequence. Typical values for plant: TTTAGGG, fungus, vertebrates: TTAGGG and Insect: TTAGG", - "pattern": "^[ACGT]+$", - "fa_icon": "fas fa-dna" - }, - "tidk_filter_by_size": { - "type": "boolean", - "description": "Filter assembly sequences smaller than the specified length", - "fa_icon": "fas fa-question-circle" - }, - "tidk_filter_size_bp": { - "type": "integer", - "default": 1000000, - "description": "Filter size in base-pairs", - "fa_icon": "fas fa-ruler-horizontal" - } - } - }, "lai_options": { "title": "LAI options", "type": "object", @@ -186,7 +191,7 @@ } }, "kraken2_options": { - "title": "Kraken2 options", + "title": "Kraken 2 options", "type": "object", "description": "", "default": "", @@ -214,7 +219,8 @@ "hic": { "type": "string", "description": "HiC reads path provided as a SRA ID or as paired reads such as 'hic_reads{1,2}.fastq.gz'", - "pattern": "^SR\\w+$|^\\S+\\{1,2\\}[\\w\\.]*\\.f(ast)?q\\.gz$" + "pattern": "^SR\\w+$|^\\S+\\{1,2\\}[\\w\\.]*\\.f(ast)?q\\.gz$", + "fa_icon": "fas fa-copy" }, "hic_skip_fastp": { "type": "boolean", @@ -231,6 +237,33 @@ "default": "--qualified_quality_phred 20 --length_required 50", "description": "Additional parameters for fastp trimming", "fa_icon": "fas fa-terminal" + }, + "hic_samtools_ext_args": { + "type": "string", + "default": "-F 3852", + "fa_icon": "fas fa-terminal", + "description": "Additional parameters for samtools view command run after samblaster" + } + } + }, + "merqury_options": { + "title": "Merqury options", + "type": "object", + "description": "", + "default": "", + "properties": { + "merqury_skip": { + "type": "boolean", + "default": true, + "description": "Skip merqury analysis", + "fa_icon": "fas fa-forward" + }, + "merqury_kmer_length": { + "type": "integer", + "default": 21, + "description": "kmer length for merqury analysis", + "minimum": 3, + "fa_icon": "fas fa-ruler-horizontal" } } }, @@ -292,7 +325,7 @@ }, "synteny_mummer_min_bundle_size": { "type": "integer", - "default": 1000, + "default": 1000000, "description": "After bundling, any Mummer 
alignment bundle smaller than this size is filtered out", "fa_icon": "fas fa-ruler-horizontal" }, @@ -321,56 +354,17 @@ } } }, - "merqury_options": { - "title": "Merqury options", + "orthofinder_options": { + "title": "OrthoFinder options", "type": "object", "description": "", "default": "", "properties": { - "merqury_skip": { + "orthofinder_skip": { "type": "boolean", "default": true, - "description": "Skip merqury analysis", - "fa_icon": "fas fa-forward" - }, - "merqury_kmer_length": { - "type": "integer", - "default": 21, - "description": "kmer length for merqury analysis", - "minimum": 3, - "fa_icon": "fas fa-ruler-horizontal" - } - } - }, - "max_job_request_options": { - "title": "Max job request options", - "type": "object", - "fa_icon": "fab fa-acquisitions-incorporated", - "description": "Set the top limit for requested resources for any single job.", - "help_text": "If you are running on a smaller system, a pipeline step requesting more resources than are available may cause the Nextflow to stop the run with an error. These options allow you to cap the maximum resources requested by any single job so that the pipeline will run on your system.\n\nNote that you can not _increase_ the resources requested by any job using these options. For that you will need your own configuration file. See [the nf-core website](https://nf-co.re/usage/configuration) for details.", - "properties": { - "max_cpus": { - "type": "integer", - "description": "Maximum number of CPUs that can be requested for any single job.", - "default": 16, - "fa_icon": "fas fa-microchip", - "hidden": true - }, - "max_memory": { - "type": "string", - "description": "Maximum amount of memory that can be requested for any single job. Example: '8.GB'", - "default": "512.GB", - "fa_icon": "fas fa-memory", - "pattern": "^\\d+(\\.\\d+)?\\.?\\s*(K|M|G|T)?B$", - "hidden": true - }, - "max_time": { - "type": "string", - "description": "Maximum amount of time that can be requested for any single job. 
Example: '1.day'", - "default": "7.day", - "fa_icon": "far fa-clock", - "pattern": "^(\\d+\\.?\\s*(s|m|h|d|day)\\s*)+$", - "hidden": true + "fa_icon": "fas fa-forward", + "description": "Skip orthofinder" } } }, @@ -406,18 +400,6 @@ "description": "Institutional config description.", "hidden": true, "fa_icon": "fas fa-users-cog" - }, - "config_profile_contact": { - "type": "string", - "description": "Institutional config contact information.", - "hidden": true, - "fa_icon": "fas fa-users-cog" - }, - "config_profile_url": { - "type": "string", - "description": "Institutional config URL link.", - "hidden": true, - "fa_icon": "fas fa-users-cog" } } }, @@ -428,12 +410,6 @@ "description": "Less common options for the pipeline, typically set in a config file.", "help_text": "These options are common to all nf-core pipelines and allow you to customise some of the core preferences for how the pipeline runs.\n\nTypically these options would be set in a Nextflow config file loaded for all pipeline runs, such as `~/.nextflow/config`.", "properties": { - "help": { - "type": "boolean", - "description": "Display help text.", - "fa_icon": "fas fa-question-circle", - "hidden": true - }, "version": { "type": "boolean", "description": "Display version and exit.", @@ -467,88 +443,57 @@ "fa_icon": "fas fa-palette", "hidden": true }, - "monochromeLogs": { - "type": "boolean", - "fa_icon": "fas fa-palette", - "description": "Do not use coloured log outputs.", - "hidden": true - }, "hook_url": { "type": "string", "description": "Incoming hook URL for messaging service", "fa_icon": "fas fa-people-group", "hidden": true - }, - "validate_params": { - "type": "boolean", - "description": "Boolean whether to validate parameters against the schema at runtime", - "default": true, - "fa_icon": "fas fa-check-square", - "hidden": true - }, - "validationShowHiddenParams": { - "type": "boolean", - "fa_icon": "far fa-eye-slash", - "description": "Show all params when using `--help`", - "hidden": true - }, - "validationFailUnrecognisedParams": { - "type": "boolean", - "fa_icon": "far fa-check-circle", - "description": "Validation of parameters fails when an unrecognised parameter is found.", - "hidden": true - }, - "validationLenientMode": { - "type": "boolean", - "fa_icon": "far fa-check-circle", - "description": "Validation of parameters in lenient more.", - "hidden": true } } } }, "allOf": [ { - "$ref": "#/definitions/input_output_options" + "$ref": "#/$defs/input_output_options" }, { - "$ref": "#/definitions/validation_options" + "$ref": "#/$defs/validation_options" }, { - "$ref": "#/definitions/general_stats_options" + "$ref": "#/$defs/general_stats_options" }, { - "$ref": "#/definitions/ncbi_fcs_options" + "$ref": "#/$defs/ncbi_fcs_options" }, { - "$ref": "#/definitions/busco_options" + "$ref": "#/$defs/tidk_options" }, { - "$ref": "#/definitions/tidk_options" + "$ref": "#/$defs/busco_options" }, { - "$ref": "#/definitions/lai_options" + "$ref": "#/$defs/lai_options" }, { - "$ref": "#/definitions/kraken2_options" + "$ref": "#/$defs/kraken2_options" }, { - "$ref": "#/definitions/hic_options" + "$ref": "#/$defs/hic_options" }, { - "$ref": "#/definitions/synteny_options" + "$ref": "#/$defs/merqury_options" }, { - "$ref": "#/definitions/merqury_options" + "$ref": "#/$defs/synteny_options" }, { - "$ref": "#/definitions/max_job_request_options" + "$ref": "#/$defs/orthofinder_options" }, { - "$ref": "#/definitions/institutional_config_options" + "$ref": "#/$defs/institutional_config_options" }, { - "$ref": 
"#/definitions/generic_options" + "$ref": "#/$defs/generic_options" } ] } diff --git a/nf-test.config b/nf-test.config index 9d4198ff..042da10d 100644 --- a/nf-test.config +++ b/nf-test.config @@ -1,8 +1,10 @@ config { + testsDir "." + workDir System.getenv("NFT_WORKDIR") ?: ".nf-test" + configFile "tests/nextflow.config" - testsDir "tests" - workDir ".nf-test" - configFile "nextflow.config" - profile "" - + plugins { + load "nft-bam@0.4.0" + load "nft-utils@0.0.3" + } } diff --git a/pfr/params.json b/pfr/params.json index 8453604b..a4062925 100644 --- a/pfr/params.json +++ b/pfr/params.json @@ -1,28 +1,34 @@ { "input": "/workspace/assemblyqc/testdata/v2/assemblysheet.csv", + "outdir": "./results", + "email": null, "check_sequence_duplicates": true, "assemblathon_stats_n_limit": 100, + "gfastats_skip": false, "ncbi_fcs_adaptor_skip": false, "ncbi_fcs_adaptor_empire": "euk", "ncbi_fcs_gx_skip": false, "ncbi_fcs_gx_tax_id": 3750, "ncbi_fcs_gx_db_path": "/workspace/ComparativeDataSources/NCBI/FCS/GX/r2023-01-24", "contamination_stops_pipeline": false, - "busco_skip": false, - "busco_mode": "genome", - "busco_lineage_datasets": "embryophyta_odb10 eudicots_odb10", - "busco_download_path": "/workspace/ComparativeDataSources/BUSCO/assemblyqc", "tidk_skip": false, "tidk_repeat_seq": "TTTAGGG", "tidk_filter_by_size": true, "tidk_filter_size_bp": 1000000, + "busco_skip": false, + "busco_mode": "genome", + "busco_lineage_datasets": "embryophyta_odb10 eudicots_odb10", + "busco_download_path": "/workspace/ComparativeDataSources/BUSCO/assemblyqc", "lai_skip": false, "kraken2_skip": false, - "kraken2_db_path": "/workspace/ComparativeDataSources/kraken2db/k2_pluspfp_20230314", + "kraken2_db_path": "/workspace/ComparativeDataSources/kraken2db/k2_pluspfp_20240904", "hic": null, "hic_skip_fastp": false, "hic_skip_fastqc": false, "hic_fastp_ext_args": "--qualified_quality_phred 20 --length_required 50", + "hic_samtools_ext_args": "-F 3852", + "merqury_skip": false, + "merqury_kmer_length": 21, "synteny_skip": false, "synteny_mummer_skip": false, "synteny_plotsr_skip": false, @@ -31,12 +37,10 @@ "synteny_mummer_plot_type": "both", "synteny_mummer_m2m_align": false, "synteny_mummer_max_gap": 1000000, - "synteny_mummer_min_bundle_size": 1000, + "synteny_mummer_min_bundle_size": 1000000, "synteny_plot_1_vs_all": false, "synteny_color_by_contig": true, "synteny_plotsr_seq_label": "Chr", "synteny_plotsr_assembly_order": "gddh13_v1p1 m9_v1 m9_v1_h1 m9_v1_h2", - "merqury_skip": false, - "outdir": "./results", - "email": null + "orthofinder_skip": false } diff --git a/pfr_assemblyqc b/pfr_assemblyqc index 8f40d31d..b860ae42 100644 --- a/pfr_assemblyqc +++ b/pfr_assemblyqc @@ -27,7 +27,7 @@ shift $((OPTIND -1)) ml unload perl ml apptainer/1.1 -ml nextflow/23.04.4 +ml nextflow/24.04.3 export TMPDIR="/workspace/$USER/tmp" export APPTAINER_BINDPATH="$APPTAINER_BINDPATH,$TMPDIR:$TMPDIR,$TMPDIR:/tmp" @@ -40,8 +40,9 @@ if [ $full_test_flag -eq 1 ]; then --ncbi_fcs_gx_skip false \ --ncbi_fcs_gx_db_path "/workspace/ComparativeDataSources/NCBI/FCS/GX/r2023-01-24" \ --kraken2_skip false \ - --kraken2_db_path "/workspace/ComparativeDataSources/kraken2db/k2_pluspfp_20230314" \ - -resume + --kraken2_db_path "/workspace/ComparativeDataSources/kraken2db/k2_pluspfp_20240904" \ + -resume \ + --outdir results else nextflow \ main.nf \ diff --git a/subworkflows/gallvp/fastq_bwa_mem_samblaster/tests/main.nf.test.snap b/subworkflows/gallvp/fastq_bwa_mem_samblaster/tests/main.nf.test.snap index a2b24322..270a96d7 100644 --- 
a/subworkflows/gallvp/fastq_bwa_mem_samblaster/tests/main.nf.test.snap +++ b/subworkflows/gallvp/fastq_bwa_mem_samblaster/tests/main.nf.test.snap @@ -8,7 +8,7 @@ "id": "test", "ref_id": "genome" }, - "test.on.genome.samblaster.bam:md5,38ade4c8ae74d49a197286e09a6cb3ad" + "test.on.genome.samblaster.bam:md5,f1552dd0431bc22320567699a3a70893" ] ], "1": [ @@ -22,7 +22,7 @@ "id": "test", "ref_id": "genome" }, - "test.on.genome.samblaster.bam:md5,38ade4c8ae74d49a197286e09a6cb3ad" + "test.on.genome.samblaster.bam:md5,f1552dd0431bc22320567699a3a70893" ] ], "versions": [ @@ -33,10 +33,10 @@ } ], "meta": { - "nf-test": "0.8.4", - "nextflow": "23.10.1" + "nf-test": "0.9.0", + "nextflow": "24.04.4" }, - "timestamp": "2024-05-22T15:55:44.323082" + "timestamp": "2024-10-10T20:56:44.071963" }, "sarscov2-fq-gz-stub": { "content": [ diff --git a/subworkflows/gallvp/gff3_gt_gff3_gff3validator_stat/tests/main.nf.test.snap b/subworkflows/gallvp/gff3_gt_gff3_gff3validator_stat/tests/main.nf.test.snap index 660f7e0b..1ec2fb46 100644 --- a/subworkflows/gallvp/gff3_gt_gff3_gff3validator_stat/tests/main.nf.test.snap +++ b/subworkflows/gallvp/gff3_gt_gff3_gff3validator_stat/tests/main.nf.test.snap @@ -18,7 +18,7 @@ ], "3": [ "versions.yml:md5,0cb9519e626e5128d8495cf29b7d59ff", - "versions.yml:md5,8a418ac34d045b0cdac812eb2dc9c106" + "versions.yml:md5,2f7fe3865b17dd2edde1881a95930174" ], "gff3_stats": [ @@ -36,15 +36,15 @@ ], "versions": [ "versions.yml:md5,0cb9519e626e5128d8495cf29b7d59ff", - "versions.yml:md5,8a418ac34d045b0cdac812eb2dc9c106" + "versions.yml:md5,2f7fe3865b17dd2edde1881a95930174" ] } ], "meta": { - "nf-test": "0.8.4", - "nextflow": "24.04.3" + "nf-test": "0.9.0", + "nextflow": "24.04.4" }, - "timestamp": "2024-07-29T16:21:54.482267" + "timestamp": "2024-09-25T11:11:44.756433" }, "sarscov2-genome_gff3-test_fasta-correspondence_fail-out_of_bounds": { "content": [ @@ -53,15 +53,15 @@ ], [ "versions.yml:md5,0cb9519e626e5128d8495cf29b7d59ff", - "versions.yml:md5,8a418ac34d045b0cdac812eb2dc9c106", + "versions.yml:md5,2f7fe3865b17dd2edde1881a95930174", "versions.yml:md5,c89b081a13c68acc5326e43ca9104344" ] ], "meta": { - "nf-test": "0.8.4", - "nextflow": "24.04.3" + "nf-test": "0.9.0", + "nextflow": "24.04.4" }, - "timestamp": "2024-07-29T16:22:06.684959" + "timestamp": "2024-09-25T11:11:57.27286" }, "sarscov2 - fasta - circular_region - pass": { "content": [ @@ -87,8 +87,8 @@ ], "3": [ "versions.yml:md5,0cb9519e626e5128d8495cf29b7d59ff", + "versions.yml:md5,2f7fe3865b17dd2edde1881a95930174", "versions.yml:md5,80555fe6e28e9564cb534f5478842286", - "versions.yml:md5,8a418ac34d045b0cdac812eb2dc9c106", "versions.yml:md5,c89b081a13c68acc5326e43ca9104344" ], "gff3_stats": [ @@ -112,8 +112,8 @@ ], "versions": [ "versions.yml:md5,0cb9519e626e5128d8495cf29b7d59ff", + "versions.yml:md5,2f7fe3865b17dd2edde1881a95930174", "versions.yml:md5,80555fe6e28e9564cb534f5478842286", - "versions.yml:md5,8a418ac34d045b0cdac812eb2dc9c106", "versions.yml:md5,c89b081a13c68acc5326e43ca9104344" ] } @@ -122,7 +122,7 @@ "nf-test": "0.9.0", "nextflow": "24.04.4" }, - "timestamp": "2024-09-19T13:53:32.901064" + "timestamp": "2024-09-25T11:12:04.099852" }, "sarscov2-genome_gff3-homo_sapiens-genome_fasta-correspondence_fail": { "content": [ @@ -131,15 +131,15 @@ ], [ "versions.yml:md5,0cb9519e626e5128d8495cf29b7d59ff", - "versions.yml:md5,8a418ac34d045b0cdac812eb2dc9c106", + "versions.yml:md5,2f7fe3865b17dd2edde1881a95930174", "versions.yml:md5,c89b081a13c68acc5326e43ca9104344" ] ], "meta": { - "nf-test": "0.8.4", - "nextflow": "24.04.3" + 
"nf-test": "0.9.0", + "nextflow": "24.04.4" }, - "timestamp": "2024-07-29T16:22:00.684573" + "timestamp": "2024-09-25T11:11:50.869233" }, "sarscov2-genome_gff3-all_pass": { "content": [ @@ -165,8 +165,8 @@ ], "3": [ "versions.yml:md5,0cb9519e626e5128d8495cf29b7d59ff", + "versions.yml:md5,2f7fe3865b17dd2edde1881a95930174", "versions.yml:md5,80555fe6e28e9564cb534f5478842286", - "versions.yml:md5,8a418ac34d045b0cdac812eb2dc9c106", "versions.yml:md5,c89b081a13c68acc5326e43ca9104344" ], "gff3_stats": [ @@ -190,16 +190,16 @@ ], "versions": [ "versions.yml:md5,0cb9519e626e5128d8495cf29b7d59ff", + "versions.yml:md5,2f7fe3865b17dd2edde1881a95930174", "versions.yml:md5,80555fe6e28e9564cb534f5478842286", - "versions.yml:md5,8a418ac34d045b0cdac812eb2dc9c106", "versions.yml:md5,c89b081a13c68acc5326e43ca9104344" ] } ], "meta": { - "nf-test": "0.8.4", - "nextflow": "24.04.3" + "nf-test": "0.9.0", + "nextflow": "24.04.4" }, - "timestamp": "2024-07-29T16:21:49.138904" + "timestamp": "2024-09-25T11:11:39.62609" } } \ No newline at end of file diff --git a/subworkflows/local/fq2hic.nf b/subworkflows/local/fq2hic.nf index 34a7fa4e..22eed2ef 100644 --- a/subworkflows/local/fq2hic.nf +++ b/subworkflows/local/fq2hic.nf @@ -34,6 +34,7 @@ workflow FQ2HIC { 1 // min_trimmed_reads ) + ch_fastp_log = FASTQ_FASTQC_UMITOOLS_FASTP.out.trim_log ch_trim_reads = FASTQ_FASTQC_UMITOOLS_FASTP.out.reads ch_versions = ch_versions.mix(FASTQ_FASTQC_UMITOOLS_FASTP.out.versions) @@ -64,6 +65,7 @@ workflow FQ2HIC { HICQC ( ch_bam_and_ref.map { meta3, bam, fa -> [ meta3, bam ] } ) + ch_hicqc_pdf = HICQC.out.pdf ch_versions = ch_versions.mix(HICQC.out.versions) // MODULE: MAKEAGPFROMFASTA | AGP2ASSEMBLY | ASSEMBLY2BEDPE @@ -95,7 +97,10 @@ workflow FQ2HIC { ch_versions = ch_versions.mix(HIC2HTML.out.versions.first()) emit: + fastp_log = ch_fastp_log + hicqc_pdf = ch_hicqc_pdf hic = ch_hic html = HIC2HTML.out.html + assembly = AGP2ASSEMBLY.out.assembly versions = ch_versions } diff --git a/subworkflows/local/utils_nfcore_assemblyqc_pipeline/main.nf b/subworkflows/local/utils_nfcore_assemblyqc_pipeline/main.nf index a8b381e2..d9fbc33a 100644 --- a/subworkflows/local/utils_nfcore_assemblyqc_pipeline/main.nf +++ b/subworkflows/local/utils_nfcore_assemblyqc_pipeline/main.nf @@ -10,30 +10,25 @@ import groovy.json.JsonOutput ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ -include { UTILS_NFVALIDATION_PLUGIN } from '../../nf-core/utils_nfvalidation_plugin' -include { paramsSummaryMap } from 'plugin/nf-validation' -include { fromSamplesheet } from 'plugin/nf-validation' -include { UTILS_NEXTFLOW_PIPELINE } from '../../nf-core/utils_nextflow_pipeline' +include { UTILS_NFSCHEMA_PLUGIN } from '../../nf-core/utils_nfschema_plugin' +include { paramsSummaryMap } from 'plugin/nf-schema' +include { samplesheetToList } from 'plugin/nf-schema' include { completionEmail } from '../../nf-core/utils_nfcore_pipeline' include { completionSummary } from '../../nf-core/utils_nfcore_pipeline' -include { dashedLine } from '../../nf-core/utils_nfcore_pipeline' -include { nfCoreLogo } from '../../nf-core/utils_nfcore_pipeline' include { imNotification } from '../../nf-core/utils_nfcore_pipeline' include { UTILS_NFCORE_PIPELINE } from '../../nf-core/utils_nfcore_pipeline' -include { workflowCitation } from '../../nf-core/utils_nfcore_pipeline' +include { UTILS_NEXTFLOW_PIPELINE } from '../../nf-core/utils_nextflow_pipeline' /* -======================================================================================== 
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ SUBWORKFLOW TO INITIALISE PIPELINE -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ workflow PIPELINE_INITIALISATION { take: version // boolean: Display version and exit - help // boolean: Display help text - validate_params // boolean: Boolean whether to validate parameters against the schema at runtime monochrome_logs // boolean: Do not use coloured log outputs nextflow_cli_args // array: List of positional nextflow CLI args outdir // string: The output directory where the results will be saved @@ -57,16 +52,10 @@ workflow PIPELINE_INITIALISATION { // // Validate parameters and generate parameter summary to stdout // - pre_help_text = nfCoreLogo(monochrome_logs) - post_help_text = '\n' + workflowCitation() + '\n' + dashedLine(monochrome_logs) - def String workflow_command = "nextflow run ${workflow.manifest.name} -profile --input assemblysheet.csv --outdir " - UTILS_NFVALIDATION_PLUGIN ( - help, - workflow_command, - pre_help_text, - post_help_text, - validate_params, - "nextflow_schema.json" + UTILS_NFSCHEMA_PLUGIN ( + workflow, + true, // validate params + null // schema path: nextflow_schema ) // @@ -84,7 +73,7 @@ workflow PIPELINE_INITIALISATION { // Initialise input channels // - ch_input = Channel.fromSamplesheet('input') + ch_input = Channel.fromList (samplesheetToList(input, "assets/schema_input.json")) // Function: validateInputTags ch_input_validated = ch_input @@ -109,7 +98,7 @@ workflow PIPELINE_INITIALISATION { ch_xref_assembly = params.synteny_skip || ! params.synteny_xref_assemblies ? Channel.empty() - : Channel.fromSamplesheet('synteny_xref_assemblies') + : Channel.fromList(samplesheetToList(params.synteny_xref_assemblies, "assets/schema_xref_assemblies.json")) ch_xref_assembly_validated = ch_xref_assembly | map { row -> row[0] } @@ -189,9 +178,9 @@ workflow PIPELINE_INITIALISATION { } /* -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ SUBWORKFLOW FOR PIPELINE COMPLETION -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ workflow PIPELINE_COMPLETION { @@ -199,13 +188,13 @@ workflow PIPELINE_COMPLETION { take: email // string: email address email_on_fail // string: email address sent on pipeline failure - plaintext_email // boolean: Send plain-text email instead of HTML - outdir // path: Path to output directory where results will be published - monochrome_logs // boolean: Disable ANSI colour codes in log output + plaintext_email // boolean: Send plain-text email instead of HTML + outdir // path: Path to output directory where results will be published + monochrome_logs // boolean: Disable ANSI colour codes in log output hook_url // string: hook URL for notifications - main: + main: summary_params = paramsSummaryMap(workflow, parameters_schema: "nextflow_schema.json") // @@ -213,11 +202,18 @@ workflow PIPELINE_COMPLETION { // workflow.onComplete { if (email || email_on_fail) { - completionEmail(summary_params, email, email_on_fail, plaintext_email, outdir, monochrome_logs) + completionEmail( + summary_params, + email, + email_on_fail, + plaintext_email, + outdir, + monochrome_logs, + [] + ) } 
completionSummary(monochrome_logs) - if (hook_url) { imNotification(summary_params, hook_url) } @@ -229,9 +225,9 @@ workflow PIPELINE_COMPLETION { } /* -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ FUNCTIONS -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ // // Check and validate pipeline parameters diff --git a/subworkflows/nf-core/fastq_fastqc_umitools_fastp/tests/main.nf.test.snap b/subworkflows/nf-core/fastq_fastqc_umitools_fastp/tests/main.nf.test.snap index e7d1f51e..1ea1ff6e 100644 --- a/subworkflows/nf-core/fastq_fastqc_umitools_fastp/tests/main.nf.test.snap +++ b/subworkflows/nf-core/fastq_fastqc_umitools_fastp/tests/main.nf.test.snap @@ -28,7 +28,7 @@ "id": "test", "single_end": false }, - "test.fastp.json:md5,1e0f8e27e71728e2b63fc64086be95cd" + "test.fastp.json:md5,63273f642b5a4495ce7ccba2d4419edf" ] ], [ @@ -55,9 +55,9 @@ ], "meta": { "nf-test": "0.9.0", - "nextflow": "24.04.3" + "nextflow": "24.04.4" }, - "timestamp": "2024-07-22T16:56:01.933832" + "timestamp": "2024-10-10T20:51:01.289498" }, "save_trimmed_fail": { "content": [ @@ -88,7 +88,7 @@ "id": "test", "single_end": false }, - "test.fastp.json:md5,4c3268ddb50ea5b33125984776aa3519" + "test.fastp.json:md5,5b7664268c0537423ffaa162701dae70" ] ], [ @@ -127,9 +127,9 @@ ], "meta": { "nf-test": "0.9.0", - "nextflow": "24.04.3" + "nextflow": "24.04.4" }, - "timestamp": "2024-07-22T16:57:38.736" + "timestamp": "2024-10-10T20:53:09.142373" }, "skip_umi_extract": { "content": [ @@ -160,7 +160,7 @@ "id": "test", "single_end": false }, - "test.fastp.json:md5,1e0f8e27e71728e2b63fc64086be95cd" + "test.fastp.json:md5,63273f642b5a4495ce7ccba2d4419edf" ] ], [ @@ -189,9 +189,9 @@ ], "meta": { "nf-test": "0.9.0", - "nextflow": "24.04.3" + "nextflow": "24.04.4" }, - "timestamp": "2024-07-22T16:56:47.905105" + "timestamp": "2024-10-10T20:52:17.865965" }, "umi_discard_read = 2": { "content": [ @@ -222,7 +222,7 @@ "id": "test", "single_end": false }, - "test.fastp.json:md5,1e0f8e27e71728e2b63fc64086be95cd" + "test.fastp.json:md5,63273f642b5a4495ce7ccba2d4419edf" ] ], [ @@ -251,9 +251,9 @@ ], "meta": { "nf-test": "0.9.0", - "nextflow": "24.04.3" + "nextflow": "24.04.4" }, - "timestamp": "2024-07-22T16:57:05.436744" + "timestamp": "2024-10-10T20:52:36.15093" }, "umi_discard_read = 2 - stub": { "content": [ @@ -569,7 +569,7 @@ "id": "test", "single_end": false }, - "test.fastp.json:md5,b712fd68ed0322f4bec49ff2a5237fcc" + "test.fastp.json:md5,defba10ab9bb3e4235a86d90a51a2e79" ] ], [ @@ -604,9 +604,9 @@ ], "meta": { "nf-test": "0.9.0", - "nextflow": "24.04.3" + "nextflow": "24.04.4" }, - "timestamp": "2024-07-22T16:57:57.472342" + "timestamp": "2024-10-10T20:53:27.17416" }, "skip_trimming": { "content": [ @@ -668,7 +668,7 @@ "id": "test", "single_end": true }, - "test.fastp.json:md5,d39c5c6d9a2e35fb60d26ced46569af6" + "test.fastp.json:md5,b8dca1b3f56429748ab1ee6b84d9880a" ] ], [ @@ -695,9 +695,9 @@ ], "meta": { "nf-test": "0.9.0", - "nextflow": "24.04.3" + "nextflow": "24.04.4" }, - "timestamp": "2024-07-22T16:56:26.778625" + "timestamp": "2024-10-10T20:51:59.583782" }, "min_trimmed_reads = 26": { "content": [ @@ -728,7 +728,7 @@ "id": "test", "single_end": false }, - "test.fastp.json:md5,b712fd68ed0322f4bec49ff2a5237fcc" + "test.fastp.json:md5,defba10ab9bb3e4235a86d90a51a2e79" ] ], [ 
@@ -763,9 +763,9 @@ ], "meta": { "nf-test": "0.9.0", - "nextflow": "24.04.3" + "nextflow": "24.04.4" }, - "timestamp": "2024-07-22T16:58:16.36697" + "timestamp": "2024-10-10T20:53:43.614487" }, "min_trimmed_reads = 26 - stub": { "content": [ @@ -1676,7 +1676,7 @@ "id": "test", "single_end": false }, - "test.fastp.json:md5,1e0f8e27e71728e2b63fc64086be95cd" + "test.fastp.json:md5,63273f642b5a4495ce7ccba2d4419edf" ] ], [ @@ -1705,9 +1705,9 @@ ], "meta": { "nf-test": "0.9.0", - "nextflow": "24.04.3" + "nextflow": "24.04.4" }, - "timestamp": "2024-07-22T16:55:50.614571" + "timestamp": "2024-10-10T20:50:49.176138" }, "sarscov2 paired-end [fastq] - stub": { "content": [ diff --git a/subworkflows/nf-core/utils_nextflow_pipeline/main.nf b/subworkflows/nf-core/utils_nextflow_pipeline/main.nf index 28e32b20..0fcbf7b3 100644 --- a/subworkflows/nf-core/utils_nextflow_pipeline/main.nf +++ b/subworkflows/nf-core/utils_nextflow_pipeline/main.nf @@ -3,13 +3,12 @@ // /* -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ SUBWORKFLOW DEFINITION -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ workflow UTILS_NEXTFLOW_PIPELINE { - take: print_version // boolean: print version dump_parameters // boolean: dump parameters @@ -22,7 +21,7 @@ workflow UTILS_NEXTFLOW_PIPELINE { // Print workflow version and exit on --version // if (print_version) { - log.info "${workflow.manifest.name} ${getWorkflowVersion()}" + log.info("${workflow.manifest.name} ${getWorkflowVersion()}") System.exit(0) } @@ -45,9 +44,9 @@ workflow UTILS_NEXTFLOW_PIPELINE { } /* -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ FUNCTIONS -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ // @@ -72,11 +71,11 @@ def getWorkflowVersion() { // Dump pipeline parameters to a JSON file // def dumpParametersToJSON(outdir) { - def timestamp = new java.util.Date().format( 'yyyy-MM-dd_HH-mm-ss') - def filename = "params_${timestamp}.json" - def temp_pf = new File(workflow.launchDir.toString(), ".${filename}") - def jsonStr = groovy.json.JsonOutput.toJson(params) - temp_pf.text = groovy.json.JsonOutput.prettyPrint(jsonStr) + def timestamp = new java.util.Date().format('yyyy-MM-dd_HH-mm-ss') + def filename = "params_${timestamp}.json" + def temp_pf = new File(workflow.launchDir.toString(), ".${filename}") + def jsonStr = groovy.json.JsonOutput.toJson(params) + temp_pf.text = groovy.json.JsonOutput.prettyPrint(jsonStr) nextflow.extension.FilesEx.copyTo(temp_pf.toPath(), "${outdir}/pipeline_info/params_${timestamp}.json") temp_pf.delete() @@ -91,9 +90,14 @@ def checkCondaChannels() { try { def config = parser.load("conda config --show channels".execute().text) channels = config.channels - } catch(NullPointerException | IOException e) { - log.warn "Could not verify conda channel configuration." 
- return + } + catch (NullPointerException e) { + log.warn("Could not verify conda channel configuration.") + return null + } + catch (IOException e) { + log.warn("Could not verify conda channel configuration.") + return null } // Check that all channels are present @@ -102,23 +106,19 @@ def checkCondaChannels() { def channels_missing = ((required_channels_in_order as Set) - (channels as Set)) as Boolean // Check that they are in the right order - def channel_priority_violation = false - - required_channels_in_order.eachWithIndex { channel, index -> - if (index < required_channels_in_order.size() - 1) { - channel_priority_violation |= !(channels.indexOf(channel) < channels.indexOf(required_channels_in_order[index+1])) - } - } + def channel_priority_violation = required_channels_in_order != channels.findAll { ch -> ch in required_channels_in_order } if (channels_missing | channel_priority_violation) { - log.warn "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n" + - " There is a problem with your Conda configuration!\n\n" + - " You will need to set-up the conda-forge and bioconda channels correctly.\n" + - " Please refer to https://bioconda.github.io/\n" + - " The observed channel order is \n" + - " ${channels}\n" + - " but the following channel order is required:\n" + - " ${required_channels_in_order}\n" + - "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~" + log.warn """\ + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + There is a problem with your Conda configuration! + You will need to set-up the conda-forge and bioconda channels correctly. + Please refer to https://bioconda.github.io/ + The observed channel order is + ${channels} + but the following channel order is required: + ${required_channels_in_order} + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~" + """.stripIndent(true) } } diff --git a/subworkflows/nf-core/utils_nfcore_pipeline/main.nf b/subworkflows/nf-core/utils_nfcore_pipeline/main.nf index cbd8495b..5cb7bafe 100644 --- a/subworkflows/nf-core/utils_nfcore_pipeline/main.nf +++ b/subworkflows/nf-core/utils_nfcore_pipeline/main.nf @@ -3,13 +3,12 @@ // /* -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ SUBWORKFLOW DEFINITION -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ workflow UTILS_NFCORE_PIPELINE { - take: nextflow_cli_args @@ -22,9 +21,9 @@ workflow UTILS_NFCORE_PIPELINE { } /* -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ FUNCTIONS -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ // @@ -33,12 +32,9 @@ workflow UTILS_NFCORE_PIPELINE { def checkConfigProvided() { def valid_config = true as Boolean if (workflow.profile == 'standard' && workflow.configFiles.size() <= 1) { - log.warn "[$workflow.manifest.name] You are attempting to run the pipeline without any custom configuration!\n\n" + - "This will be dependent on your local compute environment but can be achieved via one or more of the following:\n" + - " 
(1) Using an existing pipeline profile e.g. `-profile docker` or `-profile singularity`\n" + - " (2) Using an existing nf-core/configs for your Institution e.g. `-profile crick` or `-profile uppmax`\n" + - " (3) Using your own local custom config e.g. `-c /path/to/your/custom.config`\n\n" + - "Please refer to the quick start section and usage docs for the pipeline.\n " + log.warn( + "[${workflow.manifest.name}] You are attempting to run the pipeline without any custom configuration!\n\n" + "This will be dependent on your local compute environment but can be achieved via one or more of the following:\n" + " (1) Using an existing pipeline profile e.g. `-profile docker` or `-profile singularity`\n" + " (2) Using an existing nf-core/configs for your Institution e.g. `-profile crick` or `-profile uppmax`\n" + " (3) Using your own local custom config e.g. `-c /path/to/your/custom.config`\n\n" + "Please refer to the quick start section and usage docs for the pipeline.\n " + ) valid_config = false } return valid_config @@ -49,12 +45,14 @@ def checkConfigProvided() { // def checkProfileProvided(nextflow_cli_args) { if (workflow.profile.endsWith(',')) { - error "The `-profile` option cannot end with a trailing comma, please remove it and re-run the pipeline!\n" + - "HINT: A common mistake is to provide multiple values separated by spaces e.g. `-profile test, docker`.\n" + error( + "The `-profile` option cannot end with a trailing comma, please remove it and re-run the pipeline!\n" + "HINT: A common mistake is to provide multiple values separated by spaces e.g. `-profile test, docker`.\n" + ) } if (nextflow_cli_args[0]) { - log.warn "nf-core pipelines do not accept positional arguments. The positional argument `${nextflow_cli_args[0]}` has been detected.\n" + - "HINT: A common mistake is to provide multiple values separated by spaces e.g. `-profile test, docker`.\n" + log.warn( + "nf-core pipelines do not accept positional arguments. The positional argument `${nextflow_cli_args[0]}` has been detected.\n" + "HINT: A common mistake is to provide multiple values separated by spaces e.g. 
`-profile test, docker`.\n" + ) } } @@ -64,19 +62,13 @@ def checkProfileProvided(nextflow_cli_args) { def workflowCitation() { def temp_doi_ref = "" def manifest_doi = workflow.manifest.doi.tokenize(",") - // Using a loop to handle multiple DOIs + // Handling multiple DOIs // Removing `https://doi.org/` to handle pipelines using DOIs vs DOI resolvers // Removing ` ` since the manifest.doi is a string and not a proper list manifest_doi.each { doi_ref -> temp_doi_ref += " https://doi.org/${doi_ref.replace('https://doi.org/', '').replace(' ', '')}\n" } - return "If you use ${workflow.manifest.name} for your analysis please cite:\n\n" + - "* The pipeline\n" + - temp_doi_ref + "\n" + - "* The nf-core framework\n" + - " https://doi.org/10.1038/s41587-020-0439-x\n\n" + - "* Software dependencies\n" + - " https://github.com/${workflow.manifest.name}/blob/master/CITATIONS.md" + return "If you use ${workflow.manifest.name} for your analysis please cite:\n\n" + "* The pipeline\n" + temp_doi_ref + "\n" + "* The nf-core framework\n" + " https://doi.org/10.1038/s41587-020-0439-x\n\n" + "* Software dependencies\n" + " https://github.com/${workflow.manifest.name}/blob/master/CITATIONS.md" } // @@ -102,7 +94,7 @@ def getWorkflowVersion() { // def processVersionsFromYAML(yaml_file) { def yaml = new org.yaml.snakeyaml.Yaml() - def versions = yaml.load(yaml_file).collectEntries { k, v -> [ k.tokenize(':')[-1], v ] } + def versions = yaml.load(yaml_file).collectEntries { k, v -> [k.tokenize(':')[-1], v] } return yaml.dumpAsMap(versions).trim() } @@ -112,8 +104,8 @@ def processVersionsFromYAML(yaml_file) { def workflowVersionToYAML() { return """ Workflow: - $workflow.manifest.name: ${getWorkflowVersion()} - Nextflow: $workflow.nextflow.version + ${workflow.manifest.name}: ${getWorkflowVersion()} + Nextflow: ${workflow.nextflow.version} """.stripIndent().trim() } @@ -121,11 +113,7 @@ def workflowVersionToYAML() { // Get channel of software versions used in pipeline in YAML format // def softwareVersionsToYAML(ch_versions) { - return ch_versions - .unique() - .map { version -> processVersionsFromYAML(version) } - .unique() - .mix(Channel.of(workflowVersionToYAML())) + return ch_versions.unique().map { version -> processVersionsFromYAML(version) }.unique().mix(Channel.of(workflowVersionToYAML())) } // @@ -133,25 +121,31 @@ def softwareVersionsToYAML(ch_versions) { // def paramsSummaryMultiqc(summary_params) { def summary_section = '' - summary_params.keySet().each { group -> - def group_params = summary_params.get(group) // This gets the parameters of that particular group - if (group_params) { - summary_section += "

    <p style=\"font-size:110%\"><b>$group</b></p>\n" - summary_section += "    <dl class=\"dl-horizontal\">\n" - group_params.keySet().sort().each { param -> - summary_section += "        <dt>$param</dt><dd><samp>${group_params.get(param) ?: '<span style=\"color:#999999;\">N/A</a>'}</samp></dd>\n" + summary_params + .keySet() + .each { group -> + def group_params = summary_params.get(group) + // This gets the parameters of that particular group + if (group_params) { + summary_section += "    <p style=\"font-size:110%\"><b>${group}</b></p>\n" + summary_section += "    <dl class=\"dl-horizontal\">\n" + group_params + .keySet() + .sort() + .each { param -> + summary_section += "        <dt>${param}</dt><dd><samp>${group_params.get(param) ?: '<span style=\"color:#999999;\">N/A</a>'}</samp></dd>\n" + } + summary_section += "    </dl>\n" } - summary_section += "    </dl>
\n" } - } - def yaml_file_text = "id: '${workflow.manifest.name.replace('/','-')}-summary'\n" as String - yaml_file_text += "description: ' - this information is collected when the pipeline is started.'\n" - yaml_file_text += "section_name: '${workflow.manifest.name} Workflow Summary'\n" - yaml_file_text += "section_href: 'https://github.com/${workflow.manifest.name}'\n" - yaml_file_text += "plot_type: 'html'\n" - yaml_file_text += "data: |\n" - yaml_file_text += "${summary_section}" + def yaml_file_text = "id: '${workflow.manifest.name.replace('/', '-')}-summary'\n" as String + yaml_file_text += "description: ' - this information is collected when the pipeline is started.'\n" + yaml_file_text += "section_name: '${workflow.manifest.name} Workflow Summary'\n" + yaml_file_text += "section_href: 'https://github.com/${workflow.manifest.name}'\n" + yaml_file_text += "plot_type: 'html'\n" + yaml_file_text += "data: |\n" + yaml_file_text += "${summary_section}" return yaml_file_text } @@ -199,54 +193,54 @@ def logColours(monochrome_logs=true) { colorcodes['hidden'] = monochrome_logs ? '' : "\033[8m" // Regular Colors - colorcodes['black'] = monochrome_logs ? '' : "\033[0;30m" - colorcodes['red'] = monochrome_logs ? '' : "\033[0;31m" - colorcodes['green'] = monochrome_logs ? '' : "\033[0;32m" - colorcodes['yellow'] = monochrome_logs ? '' : "\033[0;33m" - colorcodes['blue'] = monochrome_logs ? '' : "\033[0;34m" - colorcodes['purple'] = monochrome_logs ? '' : "\033[0;35m" - colorcodes['cyan'] = monochrome_logs ? '' : "\033[0;36m" - colorcodes['white'] = monochrome_logs ? '' : "\033[0;37m" + colorcodes['black'] = monochrome_logs ? '' : "\033[0;30m" + colorcodes['red'] = monochrome_logs ? '' : "\033[0;31m" + colorcodes['green'] = monochrome_logs ? '' : "\033[0;32m" + colorcodes['yellow'] = monochrome_logs ? '' : "\033[0;33m" + colorcodes['blue'] = monochrome_logs ? '' : "\033[0;34m" + colorcodes['purple'] = monochrome_logs ? '' : "\033[0;35m" + colorcodes['cyan'] = monochrome_logs ? '' : "\033[0;36m" + colorcodes['white'] = monochrome_logs ? '' : "\033[0;37m" // Bold - colorcodes['bblack'] = monochrome_logs ? '' : "\033[1;30m" - colorcodes['bred'] = monochrome_logs ? '' : "\033[1;31m" - colorcodes['bgreen'] = monochrome_logs ? '' : "\033[1;32m" - colorcodes['byellow'] = monochrome_logs ? '' : "\033[1;33m" - colorcodes['bblue'] = monochrome_logs ? '' : "\033[1;34m" - colorcodes['bpurple'] = monochrome_logs ? '' : "\033[1;35m" - colorcodes['bcyan'] = monochrome_logs ? '' : "\033[1;36m" - colorcodes['bwhite'] = monochrome_logs ? '' : "\033[1;37m" + colorcodes['bblack'] = monochrome_logs ? '' : "\033[1;30m" + colorcodes['bred'] = monochrome_logs ? '' : "\033[1;31m" + colorcodes['bgreen'] = monochrome_logs ? '' : "\033[1;32m" + colorcodes['byellow'] = monochrome_logs ? '' : "\033[1;33m" + colorcodes['bblue'] = monochrome_logs ? '' : "\033[1;34m" + colorcodes['bpurple'] = monochrome_logs ? '' : "\033[1;35m" + colorcodes['bcyan'] = monochrome_logs ? '' : "\033[1;36m" + colorcodes['bwhite'] = monochrome_logs ? '' : "\033[1;37m" // Underline - colorcodes['ublack'] = monochrome_logs ? '' : "\033[4;30m" - colorcodes['ured'] = monochrome_logs ? '' : "\033[4;31m" - colorcodes['ugreen'] = monochrome_logs ? '' : "\033[4;32m" - colorcodes['uyellow'] = monochrome_logs ? '' : "\033[4;33m" - colorcodes['ublue'] = monochrome_logs ? '' : "\033[4;34m" - colorcodes['upurple'] = monochrome_logs ? '' : "\033[4;35m" - colorcodes['ucyan'] = monochrome_logs ? '' : "\033[4;36m" - colorcodes['uwhite'] = monochrome_logs ? 
'' : "\033[4;37m" + colorcodes['ublack'] = monochrome_logs ? '' : "\033[4;30m" + colorcodes['ured'] = monochrome_logs ? '' : "\033[4;31m" + colorcodes['ugreen'] = monochrome_logs ? '' : "\033[4;32m" + colorcodes['uyellow'] = monochrome_logs ? '' : "\033[4;33m" + colorcodes['ublue'] = monochrome_logs ? '' : "\033[4;34m" + colorcodes['upurple'] = monochrome_logs ? '' : "\033[4;35m" + colorcodes['ucyan'] = monochrome_logs ? '' : "\033[4;36m" + colorcodes['uwhite'] = monochrome_logs ? '' : "\033[4;37m" // High Intensity - colorcodes['iblack'] = monochrome_logs ? '' : "\033[0;90m" - colorcodes['ired'] = monochrome_logs ? '' : "\033[0;91m" - colorcodes['igreen'] = monochrome_logs ? '' : "\033[0;92m" - colorcodes['iyellow'] = monochrome_logs ? '' : "\033[0;93m" - colorcodes['iblue'] = monochrome_logs ? '' : "\033[0;94m" - colorcodes['ipurple'] = monochrome_logs ? '' : "\033[0;95m" - colorcodes['icyan'] = monochrome_logs ? '' : "\033[0;96m" - colorcodes['iwhite'] = monochrome_logs ? '' : "\033[0;97m" + colorcodes['iblack'] = monochrome_logs ? '' : "\033[0;90m" + colorcodes['ired'] = monochrome_logs ? '' : "\033[0;91m" + colorcodes['igreen'] = monochrome_logs ? '' : "\033[0;92m" + colorcodes['iyellow'] = monochrome_logs ? '' : "\033[0;93m" + colorcodes['iblue'] = monochrome_logs ? '' : "\033[0;94m" + colorcodes['ipurple'] = monochrome_logs ? '' : "\033[0;95m" + colorcodes['icyan'] = monochrome_logs ? '' : "\033[0;96m" + colorcodes['iwhite'] = monochrome_logs ? '' : "\033[0;97m" // Bold High Intensity - colorcodes['biblack'] = monochrome_logs ? '' : "\033[1;90m" - colorcodes['bired'] = monochrome_logs ? '' : "\033[1;91m" - colorcodes['bigreen'] = monochrome_logs ? '' : "\033[1;92m" - colorcodes['biyellow'] = monochrome_logs ? '' : "\033[1;93m" - colorcodes['biblue'] = monochrome_logs ? '' : "\033[1;94m" - colorcodes['bipurple'] = monochrome_logs ? '' : "\033[1;95m" - colorcodes['bicyan'] = monochrome_logs ? '' : "\033[1;96m" - colorcodes['biwhite'] = monochrome_logs ? '' : "\033[1;97m" + colorcodes['biblack'] = monochrome_logs ? '' : "\033[1;90m" + colorcodes['bired'] = monochrome_logs ? '' : "\033[1;91m" + colorcodes['bigreen'] = monochrome_logs ? '' : "\033[1;92m" + colorcodes['biyellow'] = monochrome_logs ? '' : "\033[1;93m" + colorcodes['biblue'] = monochrome_logs ? '' : "\033[1;94m" + colorcodes['bipurple'] = monochrome_logs ? '' : "\033[1;95m" + colorcodes['bicyan'] = monochrome_logs ? '' : "\033[1;96m" + colorcodes['biwhite'] = monochrome_logs ? 
'' : "\033[1;97m" return colorcodes } @@ -261,14 +255,15 @@ def attachMultiqcReport(multiqc_report) { mqc_report = multiqc_report.getVal() if (mqc_report.getClass() == ArrayList && mqc_report.size() >= 1) { if (mqc_report.size() > 1) { - log.warn "[$workflow.manifest.name] Found multiple reports from process 'MULTIQC', will use only one" + log.warn("[${workflow.manifest.name}] Found multiple reports from process 'MULTIQC', will use only one") } mqc_report = mqc_report[0] } } - } catch (all) { + } + catch (Exception all) { if (multiqc_report) { - log.warn "[$workflow.manifest.name] Could not attach MultiQC report to summary email" + log.warn("[${workflow.manifest.name}] Could not attach MultiQC report to summary email") } } return mqc_report @@ -280,26 +275,35 @@ def attachMultiqcReport(multiqc_report) { def completionEmail(summary_params, email, email_on_fail, plaintext_email, outdir, monochrome_logs=true, multiqc_report=null) { // Set up the e-mail variables - def subject = "[$workflow.manifest.name] Successful: $workflow.runName" + def subject = "[${workflow.manifest.name}] Successful: ${workflow.runName}" if (!workflow.success) { - subject = "[$workflow.manifest.name] FAILED: $workflow.runName" + subject = "[${workflow.manifest.name}] FAILED: ${workflow.runName}" } def summary = [:] - summary_params.keySet().sort().each { group -> - summary << summary_params[group] - } + summary_params + .keySet() + .sort() + .each { group -> + summary << summary_params[group] + } def misc_fields = [:] misc_fields['Date Started'] = workflow.start misc_fields['Date Completed'] = workflow.complete misc_fields['Pipeline script file path'] = workflow.scriptFile misc_fields['Pipeline script hash ID'] = workflow.scriptId - if (workflow.repository) misc_fields['Pipeline repository Git URL'] = workflow.repository - if (workflow.commitId) misc_fields['Pipeline repository Git Commit'] = workflow.commitId - if (workflow.revision) misc_fields['Pipeline Git branch/tag'] = workflow.revision - misc_fields['Nextflow Version'] = workflow.nextflow.version - misc_fields['Nextflow Build'] = workflow.nextflow.build + if (workflow.repository) { + misc_fields['Pipeline repository Git URL'] = workflow.repository + } + if (workflow.commitId) { + misc_fields['Pipeline repository Git Commit'] = workflow.commitId + } + if (workflow.revision) { + misc_fields['Pipeline Git branch/tag'] = workflow.revision + } + misc_fields['Nextflow Version'] = workflow.nextflow.version + misc_fields['Nextflow Build'] = workflow.nextflow.build misc_fields['Nextflow Compile Timestamp'] = workflow.nextflow.timestamp def email_fields = [:] @@ -337,7 +341,7 @@ def completionEmail(summary_params, email, email_on_fail, plaintext_email, outdi // Render the sendmail template def max_multiqc_email_size = (params.containsKey('max_multiqc_email_size') ? 
params.max_multiqc_email_size : 0) as nextflow.util.MemoryUnit - def smail_fields = [ email: email_address, subject: subject, email_txt: email_txt, email_html: email_html, projectDir: "${workflow.projectDir}", mqcFile: mqc_report, mqcMaxSize: max_multiqc_email_size.toBytes() ] + def smail_fields = [email: email_address, subject: subject, email_txt: email_txt, email_html: email_html, projectDir: "${workflow.projectDir}", mqcFile: mqc_report, mqcMaxSize: max_multiqc_email_size.toBytes()] def sf = new File("${workflow.projectDir}/assets/sendmail_template.txt") def sendmail_template = engine.createTemplate(sf).make(smail_fields) def sendmail_html = sendmail_template.toString() @@ -346,30 +350,32 @@ def completionEmail(summary_params, email, email_on_fail, plaintext_email, outdi def colors = logColours(monochrome_logs) as Map if (email_address) { try { - if (plaintext_email) { throw new org.codehaus.groovy.GroovyException('Send plaintext e-mail, not HTML') } + if (plaintext_email) { +new org.codehaus.groovy.GroovyException('Send plaintext e-mail, not HTML') } // Try to send HTML e-mail using sendmail def sendmail_tf = new File(workflow.launchDir.toString(), ".sendmail_tmp.html") sendmail_tf.withWriter { w -> w << sendmail_html } - [ 'sendmail', '-t' ].execute() << sendmail_html - log.info "-${colors.purple}[$workflow.manifest.name]${colors.green} Sent summary e-mail to $email_address (sendmail)-" - } catch (all) { + ['sendmail', '-t'].execute() << sendmail_html + log.info("-${colors.purple}[${workflow.manifest.name}]${colors.green} Sent summary e-mail to ${email_address} (sendmail)-") + } + catch (Exception all) { // Catch failures and try with plaintext - def mail_cmd = [ 'mail', '-s', subject, '--content-type=text/html', email_address ] + def mail_cmd = ['mail', '-s', subject, '--content-type=text/html', email_address] mail_cmd.execute() << email_html - log.info "-${colors.purple}[$workflow.manifest.name]${colors.green} Sent summary e-mail to $email_address (mail)-" + log.info("-${colors.purple}[${workflow.manifest.name}]${colors.green} Sent summary e-mail to ${email_address} (mail)-") } } // Write summary e-mail HTML to a file def output_hf = new File(workflow.launchDir.toString(), ".pipeline_report.html") output_hf.withWriter { w -> w << email_html } - nextflow.extension.FilesEx.copyTo(output_hf.toPath(), "${outdir}/pipeline_info/pipeline_report.html"); + nextflow.extension.FilesEx.copyTo(output_hf.toPath(), "${outdir}/pipeline_info/pipeline_report.html") output_hf.delete() // Write summary e-mail TXT to a file def output_tf = new File(workflow.launchDir.toString(), ".pipeline_report.txt") output_tf.withWriter { w -> w << email_txt } - nextflow.extension.FilesEx.copyTo(output_tf.toPath(), "${outdir}/pipeline_info/pipeline_report.txt"); + nextflow.extension.FilesEx.copyTo(output_tf.toPath(), "${outdir}/pipeline_info/pipeline_report.txt") output_tf.delete() } @@ -380,12 +386,14 @@ def completionSummary(monochrome_logs=true) { def colors = logColours(monochrome_logs) as Map if (workflow.success) { if (workflow.stats.ignoredCount == 0) { - log.info "-${colors.purple}[$workflow.manifest.name]${colors.green} Pipeline completed successfully${colors.reset}-" - } else { - log.info "-${colors.purple}[$workflow.manifest.name]${colors.yellow} Pipeline completed successfully, but with errored process(es) ${colors.reset}-" + log.info("-${colors.purple}[${workflow.manifest.name}]${colors.green} Pipeline completed successfully${colors.reset}-") + } + else { + 
log.info("-${colors.purple}[${workflow.manifest.name}]${colors.yellow} Pipeline completed successfully, but with errored process(es) ${colors.reset}-") } - } else { - log.info "-${colors.purple}[$workflow.manifest.name]${colors.red} Pipeline completed with errors${colors.reset}-" + } + else { + log.info("-${colors.purple}[${workflow.manifest.name}]${colors.red} Pipeline completed with errors${colors.reset}-") } } @@ -394,21 +402,30 @@ def completionSummary(monochrome_logs=true) { // def imNotification(summary_params, hook_url) { def summary = [:] - summary_params.keySet().sort().each { group -> - summary << summary_params[group] - } + summary_params + .keySet() + .sort() + .each { group -> + summary << summary_params[group] + } def misc_fields = [:] - misc_fields['start'] = workflow.start - misc_fields['complete'] = workflow.complete - misc_fields['scriptfile'] = workflow.scriptFile - misc_fields['scriptid'] = workflow.scriptId - if (workflow.repository) misc_fields['repository'] = workflow.repository - if (workflow.commitId) misc_fields['commitid'] = workflow.commitId - if (workflow.revision) misc_fields['revision'] = workflow.revision - misc_fields['nxf_version'] = workflow.nextflow.version - misc_fields['nxf_build'] = workflow.nextflow.build - misc_fields['nxf_timestamp'] = workflow.nextflow.timestamp + misc_fields['start'] = workflow.start + misc_fields['complete'] = workflow.complete + misc_fields['scriptfile'] = workflow.scriptFile + misc_fields['scriptid'] = workflow.scriptId + if (workflow.repository) { + misc_fields['repository'] = workflow.repository + } + if (workflow.commitId) { + misc_fields['commitid'] = workflow.commitId + } + if (workflow.revision) { + misc_fields['revision'] = workflow.revision + } + misc_fields['nxf_version'] = workflow.nextflow.version + misc_fields['nxf_build'] = workflow.nextflow.build + misc_fields['nxf_timestamp'] = workflow.nextflow.timestamp def msg_fields = [:] msg_fields['version'] = getWorkflowVersion() @@ -433,13 +450,13 @@ def imNotification(summary_params, hook_url) { def json_message = json_template.toString() // POST - def post = new URL(hook_url).openConnection(); + def post = new URL(hook_url).openConnection() post.setRequestMethod("POST") post.setDoOutput(true) post.setRequestProperty("Content-Type", "application/json") - post.getOutputStream().write(json_message.getBytes("UTF-8")); - def postRC = post.getResponseCode(); - if (! postRC.equals(200)) { - log.warn(post.getErrorStream().getText()); + post.getOutputStream().write(json_message.getBytes("UTF-8")) + def postRC = post.getResponseCode() + if (!postRC.equals(200)) { + log.warn(post.getErrorStream().getText()) } } diff --git a/subworkflows/nf-core/utils_nfschema_plugin/main.nf b/subworkflows/nf-core/utils_nfschema_plugin/main.nf new file mode 100644 index 00000000..4994303e --- /dev/null +++ b/subworkflows/nf-core/utils_nfschema_plugin/main.nf @@ -0,0 +1,46 @@ +// +// Subworkflow that uses the nf-schema plugin to validate parameters and render the parameter summary +// + +include { paramsSummaryLog } from 'plugin/nf-schema' +include { validateParameters } from 'plugin/nf-schema' + +workflow UTILS_NFSCHEMA_PLUGIN { + + take: + input_workflow // workflow: the workflow object used by nf-schema to get metadata from the workflow + validate_params // boolean: validate the parameters + parameters_schema // string: path to the parameters JSON schema. 
+ // this has to be the same as the schema given to `validation.parametersSchema` + // when this input is empty it will automatically use the configured schema or + // "${projectDir}/nextflow_schema.json" as default. This input should not be empty + // for meta pipelines + + main: + + // + // Print parameter summary to stdout. This will display the parameters + // that differ from the default given in the JSON schema + // + if(parameters_schema) { + log.info paramsSummaryLog(input_workflow, parameters_schema:parameters_schema) + } else { + log.info paramsSummaryLog(input_workflow) + } + + // + // Validate the parameters using nextflow_schema.json or the schema + // given via the validation.parametersSchema configuration option + // + if(validate_params) { + if(parameters_schema) { + validateParameters(parameters_schema:parameters_schema) + } else { + validateParameters() + } + } + + emit: + dummy_emit = true +} + diff --git a/subworkflows/nf-core/utils_nfschema_plugin/meta.yml b/subworkflows/nf-core/utils_nfschema_plugin/meta.yml new file mode 100644 index 00000000..f7d9f028 --- /dev/null +++ b/subworkflows/nf-core/utils_nfschema_plugin/meta.yml @@ -0,0 +1,35 @@ +# yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/subworkflows/yaml-schema.json +name: "utils_nfschema_plugin" +description: Run nf-schema to validate parameters and create a summary of changed parameters +keywords: + - validation + - JSON schema + - plugin + - parameters + - summary +components: [] +input: + - input_workflow: + type: object + description: | + The workflow object of the used pipeline. + This object contains meta data used to create the params summary log + - validate_params: + type: boolean + description: Validate the parameters and error if invalid. + - parameters_schema: + type: string + description: | + Path to the parameters JSON schema. + This has to be the same as the schema given to the `validation.parametersSchema` config + option. When this input is empty it will automatically use the configured schema or + "${projectDir}/nextflow_schema.json" as default. The schema should not be given in this way + for meta pipelines. 
+output: + - dummy_emit: + type: boolean + description: Dummy emit to make nf-core subworkflows lint happy +authors: + - "@nvnieuwk" +maintainers: + - "@nvnieuwk" diff --git a/subworkflows/nf-core/utils_nfschema_plugin/tests/main.nf.test b/subworkflows/nf-core/utils_nfschema_plugin/tests/main.nf.test new file mode 100644 index 00000000..842dc432 --- /dev/null +++ b/subworkflows/nf-core/utils_nfschema_plugin/tests/main.nf.test @@ -0,0 +1,117 @@ +nextflow_workflow { + + name "Test Subworkflow UTILS_NFSCHEMA_PLUGIN" + script "../main.nf" + workflow "UTILS_NFSCHEMA_PLUGIN" + + tag "subworkflows" + tag "subworkflows_nfcore" + tag "subworkflows/utils_nfschema_plugin" + tag "plugin/nf-schema" + + config "./nextflow.config" + + test("Should run nothing") { + + when { + + params { + test_data = '' + } + + workflow { + """ + validate_params = false + input[0] = workflow + input[1] = validate_params + input[2] = "" + """ + } + } + + then { + assertAll( + { assert workflow.success } + ) + } + } + + test("Should validate params") { + + when { + + params { + test_data = '' + outdir = 1 + } + + workflow { + """ + validate_params = true + input[0] = workflow + input[1] = validate_params + input[2] = "" + """ + } + } + + then { + assertAll( + { assert workflow.failed }, + { assert workflow.stdout.any { it.contains('ERROR ~ Validation of pipeline parameters failed!') } } + ) + } + } + + test("Should run nothing - custom schema") { + + when { + + params { + test_data = '' + } + + workflow { + """ + validate_params = false + input[0] = workflow + input[1] = validate_params + input[2] = "${projectDir}/subworkflows/nf-core/utils_nfschema_plugin/tests/nextflow_schema.json" + """ + } + } + + then { + assertAll( + { assert workflow.success } + ) + } + } + + test("Should validate params - custom schema") { + + when { + + params { + test_data = '' + outdir = 1 + } + + workflow { + """ + validate_params = true + input[0] = workflow + input[1] = validate_params + input[2] = "${projectDir}/subworkflows/nf-core/utils_nfschema_plugin/tests/nextflow_schema.json" + """ + } + } + + then { + assertAll( + { assert workflow.failed }, + { assert workflow.stdout.any { it.contains('ERROR ~ Validation of pipeline parameters failed!') } } + ) + } + } +} diff --git a/subworkflows/nf-core/utils_nfschema_plugin/tests/nextflow.config b/subworkflows/nf-core/utils_nfschema_plugin/tests/nextflow.config new file mode 100644 index 00000000..0907ac58 --- /dev/null +++ b/subworkflows/nf-core/utils_nfschema_plugin/tests/nextflow.config @@ -0,0 +1,8 @@ +plugins { + id "nf-schema@2.1.0" +} + +validation { + parametersSchema = "${projectDir}/subworkflows/nf-core/utils_nfschema_plugin/tests/nextflow_schema.json" + monochromeLogs = true +} \ No newline at end of file diff --git a/subworkflows/nf-core/utils_nfvalidation_plugin/tests/nextflow_schema.json b/subworkflows/nf-core/utils_nfschema_plugin/tests/nextflow_schema.json similarity index 95% rename from subworkflows/nf-core/utils_nfvalidation_plugin/tests/nextflow_schema.json rename to subworkflows/nf-core/utils_nfschema_plugin/tests/nextflow_schema.json index 7626c1c9..331e0d2f 100644 --- a/subworkflows/nf-core/utils_nfvalidation_plugin/tests/nextflow_schema.json +++ b/subworkflows/nf-core/utils_nfschema_plugin/tests/nextflow_schema.json @@ -1,10 +1,10 @@ { - "$schema": "http://json-schema.org/draft-07/schema", + "$schema": "https://json-schema.org/draft/2020-12/schema", "$id": "https://raw.githubusercontent.com/./master/nextflow_schema.json", "title": ". 
pipeline parameters", "description": "", "type": "object", - "definitions": { + "$defs": { "input_output_options": { "title": "Input/output options", "type": "object", @@ -87,10 +87,10 @@ }, "allOf": [ { - "$ref": "#/definitions/input_output_options" + "$ref": "#/$defs/input_output_options" }, { - "$ref": "#/definitions/generic_options" + "$ref": "#/$defs/generic_options" } ] } diff --git a/subworkflows/nf-core/utils_nfvalidation_plugin/main.nf b/subworkflows/nf-core/utils_nfvalidation_plugin/main.nf deleted file mode 100644 index 2585b65d..00000000 --- a/subworkflows/nf-core/utils_nfvalidation_plugin/main.nf +++ /dev/null @@ -1,62 +0,0 @@ -// -// Subworkflow that uses the nf-validation plugin to render help text and parameter summary -// - -/* -======================================================================================== - IMPORT NF-VALIDATION PLUGIN -======================================================================================== -*/ - -include { paramsHelp } from 'plugin/nf-validation' -include { paramsSummaryLog } from 'plugin/nf-validation' -include { validateParameters } from 'plugin/nf-validation' - -/* -======================================================================================== - SUBWORKFLOW DEFINITION -======================================================================================== -*/ - -workflow UTILS_NFVALIDATION_PLUGIN { - - take: - print_help // boolean: print help - workflow_command // string: default commmand used to run pipeline - pre_help_text // string: string to be printed before help text and summary log - post_help_text // string: string to be printed after help text and summary log - validate_params // boolean: validate parameters - schema_filename // path: JSON schema file, null to use default value - - main: - - log.debug "Using schema file: ${schema_filename}" - - // Default values for strings - pre_help_text = pre_help_text ?: '' - post_help_text = post_help_text ?: '' - workflow_command = workflow_command ?: '' - - // - // Print help message if needed - // - if (print_help) { - log.info pre_help_text + paramsHelp(workflow_command, parameters_schema: schema_filename) + post_help_text - System.exit(0) - } - - // - // Print parameter summary to stdout - // - log.info pre_help_text + paramsSummaryLog(workflow, parameters_schema: schema_filename) + post_help_text - - // - // Validate parameters relative to the parameter JSON schema - // - if (validate_params){ - validateParameters(parameters_schema: schema_filename) - } - - emit: - dummy_emit = true -} diff --git a/subworkflows/nf-core/utils_nfvalidation_plugin/meta.yml b/subworkflows/nf-core/utils_nfvalidation_plugin/meta.yml deleted file mode 100644 index 3d4a6b04..00000000 --- a/subworkflows/nf-core/utils_nfvalidation_plugin/meta.yml +++ /dev/null @@ -1,44 +0,0 @@ -# yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/subworkflows/yaml-schema.json -name: "UTILS_NFVALIDATION_PLUGIN" -description: Use nf-validation to initiate and validate a pipeline -keywords: - - utility - - pipeline - - initialise - - validation -components: [] -input: - - print_help: - type: boolean - description: | - Print help message and exit - - workflow_command: - type: string - description: | - The command to run the workflow e.g. 
"nextflow run main.nf" - - pre_help_text: - type: string - description: | - Text to print before the help message - - post_help_text: - type: string - description: | - Text to print after the help message - - validate_params: - type: boolean - description: | - Validate the parameters and error if invalid. - - schema_filename: - type: string - description: | - The filename of the schema to validate against. -output: - - dummy_emit: - type: boolean - description: | - Dummy emit to make nf-core subworkflows lint happy -authors: - - "@adamrtalbot" -maintainers: - - "@adamrtalbot" - - "@maxulysse" diff --git a/subworkflows/nf-core/utils_nfvalidation_plugin/tests/main.nf.test b/subworkflows/nf-core/utils_nfvalidation_plugin/tests/main.nf.test deleted file mode 100644 index 5784a33f..00000000 --- a/subworkflows/nf-core/utils_nfvalidation_plugin/tests/main.nf.test +++ /dev/null @@ -1,200 +0,0 @@ -nextflow_workflow { - - name "Test Workflow UTILS_NFVALIDATION_PLUGIN" - script "../main.nf" - workflow "UTILS_NFVALIDATION_PLUGIN" - tag "subworkflows" - tag "subworkflows_nfcore" - tag "plugin/nf-validation" - tag "'plugin/nf-validation'" - tag "utils_nfvalidation_plugin" - tag "subworkflows/utils_nfvalidation_plugin" - - test("Should run nothing") { - - when { - - params { - monochrome_logs = true - test_data = '' - } - - workflow { - """ - help = false - workflow_command = null - pre_help_text = null - post_help_text = null - validate_params = false - schema_filename = "$moduleTestDir/nextflow_schema.json" - - input[0] = help - input[1] = workflow_command - input[2] = pre_help_text - input[3] = post_help_text - input[4] = validate_params - input[5] = schema_filename - """ - } - } - - then { - assertAll( - { assert workflow.success } - ) - } - } - - test("Should run help") { - - - when { - - params { - monochrome_logs = true - test_data = '' - } - workflow { - """ - help = true - workflow_command = null - pre_help_text = null - post_help_text = null - validate_params = false - schema_filename = "$moduleTestDir/nextflow_schema.json" - - input[0] = help - input[1] = workflow_command - input[2] = pre_help_text - input[3] = post_help_text - input[4] = validate_params - input[5] = schema_filename - """ - } - } - - then { - assertAll( - { assert workflow.success }, - { assert workflow.exitStatus == 0 }, - { assert workflow.stdout.any { it.contains('Input/output options') } }, - { assert workflow.stdout.any { it.contains('--outdir') } } - ) - } - } - - test("Should run help with command") { - - when { - - params { - monochrome_logs = true - test_data = '' - } - workflow { - """ - help = true - workflow_command = "nextflow run noorg/doesntexist" - pre_help_text = null - post_help_text = null - validate_params = false - schema_filename = "$moduleTestDir/nextflow_schema.json" - - input[0] = help - input[1] = workflow_command - input[2] = pre_help_text - input[3] = post_help_text - input[4] = validate_params - input[5] = schema_filename - """ - } - } - - then { - assertAll( - { assert workflow.success }, - { assert workflow.exitStatus == 0 }, - { assert workflow.stdout.any { it.contains('nextflow run noorg/doesntexist') } }, - { assert workflow.stdout.any { it.contains('Input/output options') } }, - { assert workflow.stdout.any { it.contains('--outdir') } } - ) - } - } - - test("Should run help with extra text") { - - - when { - - params { - monochrome_logs = true - test_data = '' - } - workflow { - """ - help = true - workflow_command = "nextflow run noorg/doesntexist" - pre_help_text = "pre-help-text" - 
post_help_text = "post-help-text" - validate_params = false - schema_filename = "$moduleTestDir/nextflow_schema.json" - - input[0] = help - input[1] = workflow_command - input[2] = pre_help_text - input[3] = post_help_text - input[4] = validate_params - input[5] = schema_filename - """ - } - } - - then { - assertAll( - { assert workflow.success }, - { assert workflow.exitStatus == 0 }, - { assert workflow.stdout.any { it.contains('pre-help-text') } }, - { assert workflow.stdout.any { it.contains('nextflow run noorg/doesntexist') } }, - { assert workflow.stdout.any { it.contains('Input/output options') } }, - { assert workflow.stdout.any { it.contains('--outdir') } }, - { assert workflow.stdout.any { it.contains('post-help-text') } } - ) - } - } - - test("Should validate params") { - - when { - - params { - monochrome_logs = true - test_data = '' - outdir = 1 - } - workflow { - """ - help = false - workflow_command = null - pre_help_text = null - post_help_text = null - validate_params = true - schema_filename = "$moduleTestDir/nextflow_schema.json" - - input[0] = help - input[1] = workflow_command - input[2] = pre_help_text - input[3] = post_help_text - input[4] = validate_params - input[5] = schema_filename - """ - } - } - - then { - assertAll( - { assert workflow.failed }, - { assert workflow.stdout.any { it.contains('ERROR ~ ERROR: Validation of pipeline parameters failed!') } } - ) - } - } -} diff --git a/subworkflows/nf-core/utils_nfvalidation_plugin/tests/tags.yml b/subworkflows/nf-core/utils_nfvalidation_plugin/tests/tags.yml deleted file mode 100644 index 60b1cfff..00000000 --- a/subworkflows/nf-core/utils_nfvalidation_plugin/tests/tags.yml +++ /dev/null @@ -1,2 +0,0 @@ -subworkflows/utils_nfvalidation_plugin: - - subworkflows/nf-core/utils_nfvalidation_plugin/** diff --git a/tests/README.md b/tests/README.md index 257defe3..a9af91a8 100644 --- a/tests/README.md +++ b/tests/README.md @@ -14,14 +14,21 @@ Or using [singularity](https://docs.sylabs.io/guides/3.0/user-guide/installation nextflow run plant-food-research-open/assemblyqc -r main -profile singularity,test --outdir results ``` -## Local Testing +## nf-test and Continuous Integration (CI) -The test sets included in this directory can be executed by first downloading the pipeline from GitHub and then executing the following command: +The GitHub [CI action](../.github/workflows/ci.yml) included with the pipeline continuously tests the pipeline using [nf-test](https://www.nf-test.com). Many components included with the pipeline such as [minimap2/align](../modules/nf-core/minimap2/align) include their own [tests](../modules/nf-core/minimap2/align/tests/main.nf.test) with test data from nf-core. -```bash -./main.nf -profile docker -params-file tests/minimal/params.json --outdir results -``` +## Testing with a Large Dataset at Plant&Food + +Before each release, the functionality of the entire pipeline is tested with a large dataset on the on-prem SLURM-based HPC at The New Zealand Institute for Plant and Food Research. -## Continuous Integration (CI) +## Testing Merqury Datasets -The GitHub [CI action](../.github/workflows/ci.yml) included with the pipeline continuously tests the pipeline with the various test sets listed in this directory. +Three Merqury datasets are included here which can be tested by pointing to one of the parameters file. 
+ +```bash +./main.nf \ + -profile \ + -params-file tests/merqury//params.json \ + --outdir results +``` diff --git a/tests/hicparam/assemblysheet.csv b/tests/hicparam/assemblysheet.csv index 19e256c0..222ecb0b 100644 --- a/tests/hicparam/assemblysheet.csv +++ b/tests/hicparam/assemblysheet.csv @@ -1,2 +1,2 @@ tag,fasta -test,tests/hicparam/test_genome.fa.gz +test,https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/sarscov2/genome/genome.fasta diff --git a/tests/hicparam/hicparam.nf b/tests/hicparam/hicparam.nf index f3080271..b15416b4 100644 --- a/tests/hicparam/hicparam.nf +++ b/tests/hicparam/hicparam.nf @@ -3,7 +3,7 @@ import groovy.json.JsonSlurper def checkHiCParam(paramValue, schema) { def jsonSlurper = new JsonSlurper() def jsonContent = jsonSlurper.parse ( file ( schema, checkIfExists: true ) ) - def pattern = jsonContent.definitions.hic_options.properties.hic.pattern + def pattern = jsonContent['$defs'].hic_options.properties.hic.pattern def match = paramValue ==~ pattern return match diff --git a/tests/hicparam/main.nf.test b/tests/hicparam/main.nf.test new file mode 100644 index 00000000..064dc0c6 --- /dev/null +++ b/tests/hicparam/main.nf.test @@ -0,0 +1,35 @@ +nextflow_pipeline { + + name "Test with hic param" + script "main.nf" + + test("hic param - stub") { + + options '-stub' + + when { + params { + input = "$baseDir/tests/hicparam/assemblysheet.csv" + hic = "$baseDir/tests/hicparam/hic/Dummy_hic_{1,2}.merged.fq.gz" + outdir = "$outputDir" + } + } + + then { + def stable_path = getAllFilesFromDir(params.outdir, false, ['pipeline_info/*.{html,json,txt,yml}', 'report.{html,json}'], null, ['**']) + + assertAll( + { assert workflow.success}, + { assert snapshot( + [ + 'successful tasks': workflow.trace.succeeded().size(), + 'versions': removeNextflowVersion("$outputDir/pipeline_info/software_versions.yml"), + 'stable paths': stable_path + ] + ).match() } + ) + } + + } + +} diff --git a/tests/hicparam/main.nf.test.snap b/tests/hicparam/main.nf.test.snap new file mode 100644 index 00000000..db89beee --- /dev/null +++ b/tests/hicparam/main.nf.test.snap @@ -0,0 +1,102 @@ +{ + "hic param - stub": { + "content": [ + { + "successful tasks": 21, + "versions": { + "AGP2ASSEMBLY": { + "juicebox_scripts": "0.1.0" + }, + "ASSEMBLATHON_STATS": { + "assemblathon_stats": "github/PlantandFoodResearch/assemblathon2-analysis/a93cba2" + }, + "ASSEMBLY2BEDPE": { + "python": "3.11.3", + "pandas": "2.1.1" + }, + "BWA_INDEX": { + "bwa": "0.7.18-r1243-dirty" + }, + "BWA_MEM": { + "bwa": "0.7.18-r1243-dirty", + "samtools": 1.2 + }, + "FASTAVALIDATOR": { + "py_fasta_validator": 0.6 + }, + "FASTP": { + "fastp": "0.23.4" + }, + "FASTQC_RAW": { + "fastqc": "0.12.1" + }, + "FASTQC_TRIM": { + "fastqc": "0.12.1" + }, + "HIC2HTML": { + "python": "3.11.3" + }, + "HICQC": { + "hic_qc.py": "0+untagged.261.g6881c33" + }, + "JUICER_SORT": { + "sort": 8.3 + }, + "MAKEAGPFROMFASTA": { + "juicebox_scripts": "0.1.0" + }, + "MATLOCK_BAM2_JUICER": { + "matlock": 20181227 + }, + "RUNASSEMBLYVISUALIZER": { + "run-assembly-visualizer.sh": "18 July 2016" + }, + "SAMBLASTER": { + "samblaster": "0.1.26", + "samtools": "1.19.2" + }, + "SAMTOOLS_FAIDX": { + "samtools": 1.21 + }, + "SEQKIT_RMDUP": { + "seqkit": "v2.8.0" + }, + "SEQKIT_SORT": { + "seqkit": "v2.8.0" + }, + "TAG_ASSEMBLY": { + "pigz": "2.3.4" + }, + "Workflow": { + "plant-food-research-open/assemblyqc": "v2.2.0" + } + }, + "stable paths": [ + "test_stats.csv:md5,d41d8cd98f00b204e9800998ecf8427e", + 
"test.agp.assembly:md5,d41d8cd98f00b204e9800998ecf8427e", + "test.assembly.bedpe:md5,d41d8cd98f00b204e9800998ecf8427e", + "Dummy_hic.fastp.html:md5,d41d8cd98f00b204e9800998ecf8427e", + "Dummy_hic.fastp.json:md5,d41d8cd98f00b204e9800998ecf8427e", + "Dummy_hic.fastp.log:md5,d41d8cd98f00b204e9800998ecf8427e", + "Dummy_hic.paired.fail.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940", + "Dummy_hic_1.fail.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940", + "Dummy_hic_1.fastp.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940", + "Dummy_hic_2.fail.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940", + "Dummy_hic_2.fastp.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940", + "Dummy_hic.html:md5,d41d8cd98f00b204e9800998ecf8427e", + "Dummy_hic.zip:md5,d41d8cd98f00b204e9800998ecf8427e", + "Dummy_hic.html:md5,d41d8cd98f00b204e9800998ecf8427e", + "Dummy_hic.zip:md5,d41d8cd98f00b204e9800998ecf8427e", + "Dummy_hic.on.test.pdf:md5,d41d8cd98f00b204e9800998ecf8427e", + "test.hic:md5,d41d8cd98f00b204e9800998ecf8427e", + "test.html:md5,bbd8f07f11522eb75bb9429e86e95713" + ] + } + ], + "meta": { + "nf-test": "0.9.0", + "nextflow": "24.04.4" + }, + "timestamp": "2024-10-10T16:11:00.660108" + } +} diff --git a/tests/hicparam/params.json b/tests/hicparam/params.json deleted file mode 100644 index 2f0c2d51..00000000 --- a/tests/hicparam/params.json +++ /dev/null @@ -1,9 +0,0 @@ -{ - "config_profile_name": "Test to verify hic param validation", - "config_profile_description": "Test to verify hic param validation", - "input": "tests/hicparam/assemblysheet.csv", - "hic": "tests/hicparam/hic/Dummy_hic_{1,2}.merged.fq.gz", - "max_cpus": 2, - "max_memory": "6.GB", - "max_time": "6.h" -} diff --git a/tests/hicparam/test_genome.fa.gz b/tests/hicparam/test_genome.fa.gz deleted file mode 100644 index 1a720c1f..00000000 Binary files a/tests/hicparam/test_genome.fa.gz and /dev/null differ diff --git a/tests/invalid/assemblysheet.csv b/tests/invalid/assemblysheet.csv index 93e81420..bfee9e8d 100644 --- a/tests/invalid/assemblysheet.csv +++ b/tests/invalid/assemblysheet.csv @@ -1,5 +1,5 @@ tag,fasta,gff3 FI1,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/003/814/445/GCA_003814445.1_ASM381444v1/GCA_003814445.1_ASM381444v1_genomic.fna.gz,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/003/814/445/GCA_003814445.1_ASM381444v1/GCA_003814445.1_ASM381444v1_genomic.gff.gz TT_2021a,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/021/950/295/GCA_021950295.1_ASM2195029v1/GCA_021950295.1_ASM2195029v1_genomic.fna.gz,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/003/814/445/GCA_003814445.1_ASM381444v1/GCA_003814445.1_ASM381444v1_genomic.gff.gz -MISC,tests/invalid/invalid.fsa.gz -DUPSEQ,tests/invalid/dupseq.fsa.gz +MISC,https://raw.githubusercontent.com/plant-food-research-open/assemblyqc/dev/tests/invalid/invalid.fsa.gz +DUPSEQ,https://raw.githubusercontent.com/plant-food-research-open/assemblyqc/dev/tests/invalid/dupseq.fsa.gz diff --git a/tests/invalid/main.nf.test b/tests/invalid/main.nf.test new file mode 100644 index 00000000..93cb3310 --- /dev/null +++ b/tests/invalid/main.nf.test @@ -0,0 +1,35 @@ +nextflow_pipeline { + + name "Test with invalid input files" + script "main.nf" + + test("invalid") { + + when { + params { + input = "$baseDir/tests/invalid/assemblysheet.csv" + outdir = "$outputDir" + } + } + + then { + def stable_path = getAllFilesFromDir(params.outdir, false, ['pipeline_info/*.{html,json,txt,yml}', 'report.{html,json}'], null, ['**']) + + assertAll( + { assert workflow.success}, + { assert 'WARN: GFF3 validation failed for TT_2021a' in 
workflow.stdout }, + { assert 'WARN: FASTA validation failed for MISC' in workflow.stdout }, + { assert 'WARN: FASTA validation failed for DUPSEQ due to presence of duplicate sequences' in workflow.stdout }, + { assert snapshot( + [ + 'successful tasks': workflow.trace.succeeded().size(), + 'versions': removeNextflowVersion("$outputDir/pipeline_info/software_versions.yml"), + 'stable paths': stable_path + ] + ).match() } + ) + } + + } + +} diff --git a/tests/invalid/main.nf.test.snap b/tests/invalid/main.nf.test.snap new file mode 100644 index 00000000..c29b15ad --- /dev/null +++ b/tests/invalid/main.nf.test.snap @@ -0,0 +1,54 @@ +{ + "invalid": { + "content": [ + { + "successful tasks": 25, + "versions": { + "ASSEMBLATHON_STATS": { + "assemblathon_stats": "github/PlantandFoodResearch/assemblathon2-analysis/a93cba2" + }, + "FASTAVALIDATOR": { + "py_fasta_validator": 0.6 + }, + "GT_GFF3": { + "genometools": "1.6.5" + }, + "GT_GFF3VALIDATOR": { + "genometools": "1.6.5" + }, + "GT_STAT": { + "genometools": "1.6.5" + }, + "GUNZIP_FASTA": { + "gunzip": 1.1 + }, + "GUNZIP_GFF3": { + "gunzip": 1.1 + }, + "SAMTOOLS_FAIDX": { + "samtools": 1.21 + }, + "SEQKIT_RMDUP": { + "seqkit": "v2.8.0" + }, + "TAG_ASSEMBLY": { + "pigz": "2.3.4" + }, + "Workflow": { + "plant-food-research-open/assemblyqc": "v2.2.0" + } + }, + "stable paths": [ + "FI1_stats.csv:md5,8d1274e52117e39b413ff71a76c40331", + "TT_2021a_stats.csv:md5,55fa0923a6d47fdd19201486848b48fe", + "FI1.gt.stat.yml:md5,2fc1b9c84af0c2323d78a9ac623a3022" + ] + } + ], + "meta": { + "nf-test": "0.9.0", + "nextflow": "24.04.4" + }, + "timestamp": "2024-10-10T15:59:12.150838" + } +} diff --git a/tests/invalid/params.json b/tests/invalid/params.json deleted file mode 100644 index dd017c16..00000000 --- a/tests/invalid/params.json +++ /dev/null @@ -1,8 +0,0 @@ -{ - "config_profile_name": "Invalid profile", - "config_profile_description": "Profile to test invalid files", - "input": "tests/invalid/assemblysheet.csv", - "max_cpus": 2, - "max_memory": "6.GB", - "max_time": "6.h" -} diff --git a/tests/merqury/mixed2x/params.json b/tests/merqury/mixed2x/params.json index 6ce89d4b..aa96496d 100644 --- a/tests/merqury/mixed2x/params.json +++ b/tests/merqury/mixed2x/params.json @@ -3,8 +3,5 @@ "config_profile_description": "Merqury test for a mixed diploid assembly contained in a single fasta", "input": "tests/merqury/mixed2x/assemblysheet.csv", "merqury_skip": false, - "merqury_kmer_length": 21, - "max_cpus": 8, - "max_memory": "32.GB", - "max_time": "6.h" + "merqury_kmer_length": 21 } diff --git a/tests/merqury/phased2x.mp/assemblysheet.csv.local b/tests/merqury/phased2x.mp/assemblysheet.csv.local deleted file mode 100644 index f90aec6f..00000000 --- a/tests/merqury/phased2x.mp/assemblysheet.csv.local +++ /dev/null @@ -1,4 +0,0 @@ -tag,fasta,reads_1,reads_2,maternal_reads_1,paternal_reads_1 -COL,https://gembox.cbcb.umd.edu/triobinning/athal_COL.fasta,/Users/hrauxr/Projects/test-data/assemblyqc/merqury.fk/phased2x.mp/SRR3703081_1.fastq.gz,/Users/hrauxr/Projects/test-data/assemblyqc/merqury.fk/phased2x.mp/SRR3703081_2.fastq.gz,/Users/hrauxr/Projects/test-data/assemblyqc/merqury.fk/phased2x.mp/athal_CVI.fastq.gz,/Users/hrauxr/Projects/test-data/assemblyqc/merqury.fk/phased2x.mp/athal_COL.fastq.gz 
-CVI,https://gembox.cbcb.umd.edu/triobinning/athal_CVI.fasta,/Users/hrauxr/Projects/test-data/assemblyqc/merqury.fk/phased2x.mp/SRR3703081_1.fastq.gz,/Users/hrauxr/Projects/test-data/assemblyqc/merqury.fk/phased2x.mp/SRR3703081_2.fastq.gz,/Users/hrauxr/Projects/test-data/assemblyqc/merqury.fk/phased2x.mp/athal_CVI.fastq.gz,/Users/hrauxr/Projects/test-data/assemblyqc/merqury.fk/phased2x.mp/athal_COL.fastq.gz -FI1,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/003/814/445/GCA_003814445.1_ASM381444v1/GCA_003814445.1_ASM381444v1_genomic.fna.gz,/Users/hrauxr/Projects/test-data/assemblyqc/merqury.fk/phased2x.mp/SRR8238189_1.fastq.gz,/Users/hrauxr/Projects/test-data/assemblyqc/merqury.fk/phased2x.mp/SRR8238189_2.fastq.gz diff --git a/tests/merqury/phased2x.mp/params.json b/tests/merqury/phased2x.mp/params.json index 60f157ca..0b4479f0 100644 --- a/tests/merqury/phased2x.mp/params.json +++ b/tests/merqury/phased2x.mp/params.json @@ -3,8 +3,5 @@ "config_profile_description": "Merqury test for a phased diploid assembly contained in separate fasta with parental reads", "input": "tests/merqury/phased2x.mp/assemblysheet.csv", "merqury_skip": false, - "merqury_kmer_length": 21, - "max_cpus": 8, - "max_memory": "32.GB", - "max_time": "6.h" + "merqury_kmer_length": 21 } diff --git a/tests/merqury/phased2x/params.json b/tests/merqury/phased2x/params.json index 0b363fab..e37db9ab 100644 --- a/tests/merqury/phased2x/params.json +++ b/tests/merqury/phased2x/params.json @@ -3,8 +3,5 @@ "config_profile_description": "Merqury test for a phased diploid assembly contained in separate fasta", "input": "tests/merqury/phased2x/assemblysheet.csv", "merqury_skip": false, - "merqury_kmer_length": 21, - "max_cpus": 8, - "max_memory": "32.GB", - "max_time": "6.h" + "merqury_kmer_length": 21 } diff --git a/tests/minimal/main.nf.test b/tests/minimal/main.nf.test new file mode 100644 index 00000000..b4dae9be --- /dev/null +++ b/tests/minimal/main.nf.test @@ -0,0 +1,32 @@ +nextflow_pipeline { + + name "Test with minimal input" + script "main.nf" + + test("minimal") { + + when { + params { + input = "$baseDir/assets/assemblysheetv2.csv" + outdir = "$outputDir" + } + } + + then { + def stable_path = getAllFilesFromDir(params.outdir, false, ['pipeline_info/*.{html,json,txt,yml}', 'report.{html,json}'], null, ['**']) + + assertAll( + { assert workflow.success}, + { assert snapshot( + [ + 'successful tasks': workflow.trace.succeeded().size(), + 'versions': removeNextflowVersion("$outputDir/pipeline_info/software_versions.yml"), + 'stable paths': stable_path + ] + ).match() } + ) + } + + } + +} diff --git a/tests/minimal/main.nf.test.snap b/tests/minimal/main.nf.test.snap new file mode 100644 index 00000000..8e139a7e --- /dev/null +++ b/tests/minimal/main.nf.test.snap @@ -0,0 +1,53 @@ +{ + "minimal": { + "content": [ + { + "successful tasks": 11, + "versions": { + "ASSEMBLATHON_STATS": { + "assemblathon_stats": "github/PlantandFoodResearch/assemblathon2-analysis/a93cba2" + }, + "FASTAVALIDATOR": { + "py_fasta_validator": 0.6 + }, + "GT_GFF3": { + "genometools": "1.6.5" + }, + "GT_GFF3VALIDATOR": { + "genometools": "1.6.5" + }, + "GT_STAT": { + "genometools": "1.6.5" + }, + "GUNZIP_FASTA": { + "gunzip": 1.1 + }, + "GUNZIP_GFF3": { + "gunzip": 1.1 + }, + "SAMTOOLS_FAIDX": { + "samtools": 1.21 + }, + "SEQKIT_RMDUP": { + "seqkit": "v2.8.0" + }, + "TAG_ASSEMBLY": { + "pigz": "2.3.4" + }, + "Workflow": { + "plant-food-research-open/assemblyqc": "v2.2.0" + } + }, + "stable paths": [ + "FI1_stats.csv:md5,8d1274e52117e39b413ff71a76c40331", + 
"FI1.gt.stat.yml:md5,2fc1b9c84af0c2323d78a9ac623a3022" + ] + } + ], + "meta": { + "nf-test": "0.9.0", + "nextflow": "24.04.4" + }, + "timestamp": "2024-10-10T14:37:54.336611" + } +} diff --git a/tests/minimal/params.json b/tests/minimal/params.json deleted file mode 100644 index c5683622..00000000 --- a/tests/minimal/params.json +++ /dev/null @@ -1,8 +0,0 @@ -{ - "config_profile_name": "Test profile", - "config_profile_description": "Minimal test dataset to check pipeline function", - "input": "https://raw.githubusercontent.com/plant-food-research-open/assemblyqc/dev/assets/assemblysheetv2.csv", - "max_cpus": 2, - "max_memory": "6.GB", - "max_time": "6.h" -} diff --git a/tests/nextflow.config b/tests/nextflow.config new file mode 100644 index 00000000..ed1a8053 --- /dev/null +++ b/tests/nextflow.config @@ -0,0 +1,22 @@ +/* +======================================================================================== + Nextflow config file for running tests +======================================================================================== +*/ + +params { + modules_testdata_base_path = 'https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/' +} + +timeline { enabled = false } +report { enabled = false } +trace { enabled = false } +dag { enabled = false } + +process { + resourceLimits = [ + cpus: 4, + memory: '15.GB', + time: '1.h' + ] +} diff --git a/tests/noltr/main.nf.test b/tests/noltr/main.nf.test new file mode 100644 index 00000000..5d20a1ff --- /dev/null +++ b/tests/noltr/main.nf.test @@ -0,0 +1,61 @@ +nextflow_pipeline { + + name "Test with a genome which does not have LTRs" + script "main.nf" + + test("noltr") { + + when { + params { + input = "$baseDir/tests/noltr/assemblysheet.csv" + lai_skip = false + outdir = "$outputDir" + } + } + + then { + def stable_path = getAllFilesFromDir( + params.outdir, + false, + [ + 'pipeline_info/*.{html,json,txt,yml}', + 'report.{html,json}', + 'lai/*.LAI.log', + 'lai/*.restored.ids.gff3', + 'lai/*.LAI.out', + 'lai/*.LTRlib.fa' + ], + null, + ['**'] + ) + + def stable_name = getAllFilesFromDir( + params.outdir, + true, + [ + 'pipeline_info/*.{html,json,txt,yml}' + ], + null, + ['**'] + ) + + assertAll( + { assert workflow.success}, + { + def lai = Float.parseFloat(path("$outputDir/lai/FI1.LAI.out").text.split("\n")[1].split("\t")[6]) + assert Math.abs(lai - 4.84) <= 1.0 + }, + { assert snapshot( + [ + 'successful tasks': workflow.trace.succeeded().size(), + 'versions': removeNextflowVersion("$outputDir/pipeline_info/software_versions.yml"), + 'stable paths': stable_path, + 'stable names': getRelativePath(stable_name, outputDir), + ] + ).match() } + ) + } + + } + +} diff --git a/tests/noltr/main.nf.test.snap b/tests/noltr/main.nf.test.snap new file mode 100644 index 00000000..d1580b9d --- /dev/null +++ b/tests/noltr/main.nf.test.snap @@ -0,0 +1,87 @@ +{ + "noltr": { + "content": [ + { + "successful tasks": 26, + "versions": { + "ASSEMBLATHON_STATS": { + "assemblathon_stats": "github/PlantandFoodResearch/assemblathon2-analysis/a93cba2" + }, + "CAT_CAT": { + "pigz": "2.3.4" + }, + "CUSTOM_RESTOREGFFIDS": { + "python": "3.10.2" + }, + "CUSTOM_SHORTENFASTAIDS": { + "python": "3.8.13", + "biopython": 1.75 + }, + "FASTAVALIDATOR": { + "py_fasta_validator": 0.6 + }, + "GUNZIP_FASTA": { + "gunzip": 1.1 + }, + "LTRFINDER": { + "LTR_FINDER_parallel": "v1.1", + "ltr_finder": "v1.07" + }, + "LTRHARVEST": { + "LTR_HARVEST_parallel": "v1.1", + "genometools": "1.6.5" + }, + "LTRRETRIEVER_LAI": { + "lai": "beta3.2" + }, + 
"LTRRETRIEVER_LTRRETRIEVER": { + "LTR_retriever": "v2.9.9" + }, + "SAMTOOLS_FAIDX": { + "samtools": 1.21 + }, + "SEQKIT_RMDUP": { + "seqkit": "v2.8.0" + }, + "TAG_ASSEMBLY": { + "pigz": "2.3.4" + }, + "UNMASK_IF_ANY": { + "seqkit": "v2.8.0" + }, + "Workflow": { + "plant-food-research-open/assemblyqc": "v2.2.0" + } + }, + "stable paths": [ + "FI1_stats.csv:md5,8d1274e52117e39b413ff71a76c40331", + "sarscov2_stats.csv:md5,9ec4147b73cae550a38337520fc028dd", + "FI1.short.ids.tsv:md5,948c19c1ccd05463a1f5d23f0615b169", + "sarscov2.short.ids.tsv:md5,d7a2af88e8549586e5616bff6a88bd71" + ], + "stable names": [ + "assemblathon_stats", + "assemblathon_stats/FI1_stats.csv", + "assemblathon_stats/sarscov2_stats.csv", + "lai", + "lai/FI1.LAI.log", + "lai/FI1.LAI.out", + "lai/FI1.LTRlib.fa", + "lai/FI1.restored.ids.gff3", + "lai/FI1.short.ids.tsv", + "lai/sarscov2.short.ids.tsv", + "pipeline_info", + "report.html", + "report.json", + "synteny", + "synteny/plotsr" + ] + } + ], + "meta": { + "nf-test": "0.9.0", + "nextflow": "24.04.4" + }, + "timestamp": "2024-10-10T20:38:08.028244" + } +} diff --git a/tests/noltr/params.json b/tests/noltr/params.json deleted file mode 100644 index a46bd34d..00000000 --- a/tests/noltr/params.json +++ /dev/null @@ -1,9 +0,0 @@ -{ - "config_profile_name": "No LTRs assembly profile", - "config_profile_description": "Profile to test an assembly without LTRs", - "input": "tests/noltr/assemblysheet.csv", - "lai_skip": false, - "max_cpus": 2, - "max_memory": "6.GB", - "max_time": "6.h" -} diff --git a/tests/orthofinder/assemblysheet.csv b/tests/orthofinder/assemblysheet.csv new file mode 100644 index 00000000..64892681 --- /dev/null +++ b/tests/orthofinder/assemblysheet.csv @@ -0,0 +1,5 @@ +tag,fasta,gff3 +agalactiae,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/063/605/GCF_000063605.1_ASM6360v1/GCF_000063605.1_ASM6360v1_genomic.fna.gz,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/063/605/GCF_000063605.1_ASM6360v1/GCF_000063605.1_ASM6360v1_genomic.gff.gz +gallisepticum,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/900/476/085/GCF_900476085.1_50569_G01/GCF_900476085.1_50569_G01_genomic.fna.gz,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/900/476/085/GCF_900476085.1_50569_G01/GCF_900476085.1_50569_G01_genomic.gff.gz +genitalium,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/027/325/GCF_000027325.1_ASM2732v1/GCF_000027325.1_ASM2732v1_genomic.fna.gz,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/027/325/GCF_000027325.1_ASM2732v1/GCF_000027325.1_ASM2732v1_genomic.gff.gz +hyopneumoniae,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/008/205/GCF_000008205.1_ASM820v1/GCF_000008205.1_ASM820v1_genomic.fna.gz,https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/008/205/GCF_000008205.1_ASM820v1/GCF_000008205.1_ASM820v1_genomic.gff.gz diff --git a/tests/orthofinder/main.nf.test b/tests/orthofinder/main.nf.test new file mode 100644 index 00000000..8e6bfab7 --- /dev/null +++ b/tests/orthofinder/main.nf.test @@ -0,0 +1,36 @@ +nextflow_pipeline { + + name "Test orthofinder" + script "main.nf" + + test("invalid") { + + when { + params { + input = "$baseDir/tests/orthofinder/assemblysheet.csv" + orthofinder_skip = false + outdir = "$outputDir" + } + } + + then { + def stable_path = getAllFilesFromDir(params.outdir, false, ['pipeline_info/*.{html,json,txt,yml}', 'report.{html,json}', 'orthofinder/**'], null, ['**']) + def report_json = (Map) new groovy.json.JsonSlurper().parseText(file("$outputDir/report.json").text) + def orthofinder_stats = report_json['ORTHOFINDER']['num_species_orthogroup'] + 
+ assertAll( + { assert workflow.success}, + { assert snapshot( + [ + 'successful tasks': workflow.trace.succeeded().size(), + 'versions': removeNextflowVersion("$outputDir/pipeline_info/software_versions.yml"), + 'stable paths': stable_path, + 'orthofinder stats': orthofinder_stats + ] + ).match() } + ) + } + + } + +} diff --git a/tests/orthofinder/main.nf.test.snap b/tests/orthofinder/main.nf.test.snap new file mode 100644 index 00000000..4d34b485 --- /dev/null +++ b/tests/orthofinder/main.nf.test.snap @@ -0,0 +1,83 @@ +{ + "invalid": { + "content": [ + { + "successful tasks": 46, + "versions": { + "ASSEMBLATHON_STATS": { + "assemblathon_stats": "github/PlantandFoodResearch/assemblathon2-analysis/a93cba2" + }, + "FASTAVALIDATOR": { + "py_fasta_validator": 0.6 + }, + "GFFREAD": { + "gffread": "0.12.7" + }, + "GT_GFF3": { + "genometools": "1.6.5" + }, + "GT_GFF3VALIDATOR": { + "genometools": "1.6.5" + }, + "GT_STAT": { + "genometools": "1.6.5" + }, + "GUNZIP_FASTA": { + "gunzip": 1.1 + }, + "GUNZIP_GFF3": { + "gunzip": 1.1 + }, + "ORTHOFINDER": { + "orthofinder": "2.5.5" + }, + "SAMTOOLS_FAIDX": { + "samtools": 1.21 + }, + "SEQKIT_RMDUP": { + "seqkit": "v2.8.0" + }, + "TAG_ASSEMBLY": { + "pigz": "2.3.4" + }, + "Workflow": { + "plant-food-research-open/assemblyqc": "v2.2.0" + } + }, + "stable paths": [ + "agalactiae_stats.csv:md5,4f4ce28e8975f9ded73cf86ff5eaa507", + "gallisepticum_stats.csv:md5,cedd23a5778c76bf17053f5a0aa6eaf8", + "genitalium_stats.csv:md5,04ef67d681fba6ca05df4171f888424a", + "hyopneumoniae_stats.csv:md5,d59081ff1b5bb0e9f1be494b8463f3b8", + "agalactiae.gt.stat.yml:md5,74fe2e9753fdebaa31840016201eaf17", + "gallisepticum.gt.stat.yml:md5,0ec946739663876b25fca0efd47fd467", + "genitalium.gt.stat.yml:md5,ce9917e5b197e130c0d1be90d88e3177", + "hyopneumoniae.gt.stat.yml:md5,856c5eb7511f53a40056987922e649c7" + ], + "orthofinder stats": [ + { + "Number of species in orthogroup": 1, + "Number of orthogroups": 79 + }, + { + "Number of species in orthogroup": 2, + "Number of orthogroups": 166 + }, + { + "Number of species in orthogroup": 3, + "Number of orthogroups": 70 + }, + { + "Number of species in orthogroup": 4, + "Number of orthogroups": 277 + } + ] + } + ], + "meta": { + "nf-test": "0.9.0", + "nextflow": "24.04.4" + }, + "timestamp": "2024-11-01T14:11:21.865104" + } +} diff --git a/tests/stub/main.nf.test b/tests/stub/main.nf.test new file mode 100644 index 00000000..528ec8b5 --- /dev/null +++ b/tests/stub/main.nf.test @@ -0,0 +1,55 @@ +nextflow_pipeline { + + name "Test the entire pipeline in stub mode" + script "main.nf" + + test("full - stub") { + options '-stub' + + when { + params { + input = "$baseDir/assets/assemblysheetv2.csv" + gfastats_skip = false + ncbi_fcs_adaptor_skip = false + ncbi_fcs_adaptor_empire = "euk" + ncbi_fcs_gx_skip = false + ncbi_fcs_gx_tax_id = 12 + ncbi_fcs_gx_db_path = "$baseDir/tests/stub/gxdb/test" + busco_skip = false + busco_mode = "genome" + busco_lineage_datasets = "fungi_odb10 hypocreales_odb10" + tidk_skip = false + tidk_repeat_seq = "TTTGGG" + lai_skip = false + kraken2_skip = false + kraken2_db_path = "$baseDir/tests/stub/kraken2/k2_minusb_20231009.tar.gz" + hic = "$baseDir/tests/stub/hic/Dummy_hic.R{1,2}.fq.gz" + hic_skip_fastp = true + hic_skip_fastqc = false + synteny_skip = false + synteny_mummer_skip = false + synteny_plotsr_skip = false + synteny_xref_assemblies = "$baseDir/assets/xrefsheet.csv" + merqury_skip = false + outdir = "$outputDir" + } + } + + then { + def stable_path = getAllFilesFromDir(params.outdir, false, 
['pipeline_info/*.{html,json,txt,yml}', 'report.{html,json}'], null, ['**']) + + assertAll( + { assert workflow.success}, + { assert snapshot( + [ + 'successful tasks': workflow.trace.succeeded().size(), + 'versions': removeNextflowVersion("$outputDir/pipeline_info/software_versions.yml"), + 'stable paths': stable_path + ] + ).match() } + ) + } + + } + +} diff --git a/tests/stub/main.nf.test.snap b/tests/stub/main.nf.test.snap new file mode 100644 index 00000000..d2e05d4c --- /dev/null +++ b/tests/stub/main.nf.test.snap @@ -0,0 +1,307 @@ +{ + "full - stub": { + "content": [ + { + "successful tasks": 91, + "versions": { + "AGP2ASSEMBLY": { + "juicebox_scripts": "0.1.0" + }, + "ASSEMBLATHON_STATS": { + "assemblathon_stats": "github/PlantandFoodResearch/assemblathon2-analysis/a93cba2" + }, + "ASSEMBLY2BEDPE": { + "python": "3.11.3", + "pandas": "2.1.1" + }, + "BUNDLELINKS": { + "python": "3.11.3" + }, + "BUSCO_ANNOTATION": { + "busco": "5.7.1" + }, + "BUSCO_ASSEMBLY": { + "busco": "5.7.1" + }, + "BWA_INDEX": { + "bwa": "0.7.18-r1243-dirty" + }, + "BWA_MEM": { + "bwa": "0.7.18-r1243-dirty", + "samtools": 1.2 + }, + "CAT_CAT": { + "pigz": "2.3.4" + }, + "CIRCOS": { + "circos": "v0.69-8", + "perl": 5.032001 + }, + "COLOURBUNDLELINKS": { + "python": "3.11.3", + "perl": "5.32.1" + }, + "CUSTOM_RELABELFASTA": { + "python": "3.8.13", + "biopython": 1.75 + }, + "CUSTOM_SHORTENFASTAIDS": { + "python": "3.8.13", + "biopython": 1.75 + }, + "CUSTOM_SRATOOLSNCBISETTINGS": { + "sratools": "3.0.8" + }, + "DNADIFF": { + "dnadiff": 1.3 + }, + "EXTRACT_PROTEINS": { + "gffread": "0.12.7" + }, + "FASTAVALIDATOR": { + "py_fasta_validator": 0.6 + }, + "FASTQC_RAW": { + "fastqc": "0.12.1" + }, + "FCS_FCSADAPTOR": { + "FCS-adaptor": "0.5.0" + }, + "FILTERSORTFASTA": { + "samtools": "1.16.1" + }, + "FILTER_BY_LENGTH": { + "seqkit": "v2.8.0" + }, + "GENERATEKARYOTYPE": { + "awk": "1.3.4 20200120", + "grep": "(GNU grep) 3.4", + "sed": "(GNU sed) 4.7" + }, + "GETFASTALENGTH": { + "samtools": "1.16.1" + }, + "GFASTATS": { + "gfastats": "1.3.6" + }, + "GT_GFF3": { + "genometools": "1.6.5" + }, + "GT_GFF3VALIDATOR": { + "genometools": "1.6.5" + }, + "GT_STAT": { + "genometools": "1.6.5" + }, + "GUNZIP_FASTA": { + "gunzip": 1.1 + }, + "GUNZIP_GFF3": { + "gunzip": 1.1 + }, + "HIC2HTML": { + "python": "3.11.3" + }, + "HICQC": { + "hic_qc.py": "0+untagged.261.g6881c33" + }, + "JUICER_SORT": { + "sort": 8.3 + }, + "KRAKEN2": { + "kraken2": "2.1.2" + }, + "KRAKEN2_KRONA_PLOT": { + "KronaTools": "2.7.1" + }, + "LINEARSYNTENY": { + "python": "3.11.3", + "pandas": "2.1.1", + "plotly": "5.20.0" + }, + "LTRFINDER": { + "LTR_FINDER_parallel": "v1.1", + "ltr_finder": "v1.07" + }, + "LTRHARVEST": { + "LTR_HARVEST_parallel": "v1.1", + "genometools": "1.6.5" + }, + "LTRRETRIEVER_LAI": { + "lai": "beta3.2" + }, + "LTRRETRIEVER_LTRRETRIEVER": { + "LTR_retriever": "v2.9.9" + }, + "MAKEAGPFROMFASTA": { + "juicebox_scripts": "0.1.0" + }, + "MATLOCK_BAM2_JUICER": { + "matlock": 20181227 + }, + "MERQURY_MERQURY": { + "merqury": 1.3 + }, + "MERYL_COUNT": { + "meryl": "1.4.1" + }, + "MERYL_UNIONSUM": { + "meryl": "1.4.1" + }, + "MINIMAP2_ALIGN": { + "minimap2": "2.28-r1209" + }, + "MUMMER": { + "nucmer": "4.0.0rc1" + }, + "NCBI_FCS_GX_KRONA_PLOT": { + "KronaTools": "2.7.1" + }, + "NCBI_FCS_GX_SCREEN_SAMPLES": { + "fcs_gx": "0.5.4" + }, + "NCBI_FCS_GX_SETUP_SAMPLE": { + "ubuntu": "20.04.6l" + }, + "PLOTSR": { + "plotsr": "1.1.1" + }, + "RELABELBUNDLELINKS": { + "python": "3.11.3" + }, + "RELABELFASTALENGTH": { + "python": "3.11.3" + }, + 
"RUNASSEMBLYVISUALIZER": { + "run-assembly-visualizer.sh": "18 July 2016" + }, + "SAMBLASTER": { + "samblaster": "0.1.26", + "samtools": "1.19.2" + }, + "SAMTOOLS_FAIDX": { + "samtools": 1.21 + }, + "SEQKIT_RMDUP": { + "seqkit": "v2.8.0" + }, + "SEQKIT_SORT": { + "seqkit": "v2.8.0" + }, + "SORT_BY_LENGTH": { + "seqkit": "v2.8.0" + }, + "SPLITBUNDLEFILE": { + "awk": "1.3.4 20200120" + }, + "SRATOOLS_FASTERQDUMP": { + "sratools": "3.0.8", + "pigz": 2.6 + }, + "SRATOOLS_PREFETCH": { + "sratools": "3.1.0", + "curl": "8.5.0" + }, + "SYRI": { + "syri": "1.7.0" + }, + "TAG_ASSEMBLY": { + "pigz": "2.3.4" + }, + "TIDK_EXPLORE": { + "tidk": "0.2.41" + }, + "TIDK_PLOT_APOSTERIORI": { + "tidk": "0.2.41" + }, + "TIDK_PLOT_APRIORI": { + "tidk": "0.2.41" + }, + "TIDK_SEARCH_APOSTERIORI": { + "tidk": "0.2.41" + }, + "TIDK_SEARCH_APRIORI": { + "tidk": "0.2.41" + }, + "UNMASK_IF_ANY": { + "seqkit": "v2.8.0" + }, + "UNTAR": { + "untar": 1.34 + }, + "Workflow": { + "plant-food-research-open/assemblyqc": "v2.2.0" + } + }, + "stable paths": [ + "FI1_stats.csv:md5,d41d8cd98f00b204e9800998ecf8427e", + "FI1.gt.stat.yml:md5,d41d8cd98f00b204e9800998ecf8427e", + "FI1.assembly_summary:md5,d41d8cd98f00b204e9800998ecf8427e", + "FI1.hic:md5,d41d8cd98f00b204e9800998ecf8427e", + "FI1.html:md5,6bd2e97a3a2199609c2f2a5d4d314b1e", + "FI1.agp.assembly:md5,d41d8cd98f00b204e9800998ecf8427e", + "FI1.assembly.bedpe:md5,d41d8cd98f00b204e9800998ecf8427e", + "Dummy_hic.R.html:md5,d41d8cd98f00b204e9800998ecf8427e", + "Dummy_hic.R.zip:md5,d41d8cd98f00b204e9800998ecf8427e", + "Dummy_hic.R.on.FI1.pdf:md5,d41d8cd98f00b204e9800998ecf8427e", + "FI1.kraken2.cut:md5,d41d8cd98f00b204e9800998ecf8427e", + "FI1.kraken2.krona.cut:md5,d41d8cd98f00b204e9800998ecf8427e", + "FI1.kraken2.krona.html:md5,a629f1aab2e8dce3e921119498d669fd", + "FI1.kraken2.report:md5,d41d8cd98f00b204e9800998ecf8427e", + "FI1.LAI.log:md5,d41d8cd98f00b204e9800998ecf8427e", + "FI1.LAI.out:md5,d41d8cd98f00b204e9800998ecf8427e", + "FI1.LTRlib.fa:md5,d41d8cd98f00b204e9800998ecf8427e", + "FI1.short.ids.tsv:md5,fcf920d9a7b57a1e3c29a9e88673330f", + "FI1.FI1.qv:md5,d41d8cd98f00b204e9800998ecf8427e", + "FI1.completeness.stats:md5,d41d8cd98f00b204e9800998ecf8427e", + "FI1.dist_only.hist:md5,d41d8cd98f00b204e9800998ecf8427e", + "FI1.hist.ploidy:md5,d41d8cd98f00b204e9800998ecf8427e", + "FI1.qv:md5,d41d8cd98f00b204e9800998ecf8427e", + "FI1.spectra-asm.fl.png:md5,d41d8cd98f00b204e9800998ecf8427e", + "FI1.spectra-asm.hist:md5,d41d8cd98f00b204e9800998ecf8427e", + "FI1.spectra-asm.ln.png:md5,d41d8cd98f00b204e9800998ecf8427e", + "FI1.spectra-asm.st.png:md5,d41d8cd98f00b204e9800998ecf8427e", + "FI1.spectra-cn.fl.png:md5,d41d8cd98f00b204e9800998ecf8427e", + "FI1.spectra-cn.hist:md5,d41d8cd98f00b204e9800998ecf8427e", + "FI1.spectra-cn.ln.png:md5,d41d8cd98f00b204e9800998ecf8427e", + "FI1.spectra-cn.st.png:md5,d41d8cd98f00b204e9800998ecf8427e", + "FI1_only.bed:md5,d41d8cd98f00b204e9800998ecf8427e", + "FI1_only.wig:md5,d41d8cd98f00b204e9800998ecf8427e", + "FI1.fcs_adaptor_report.txt:md5,d41d8cd98f00b204e9800998ecf8427e", + "FI1.fcs.gx.krona.cut:md5,d41d8cd98f00b204e9800998ecf8427e", + "FI1.fcs.gx.krona.html:md5,4e24c1bdb5d3e2666e647390a9ececce", + "FI1.fcs_gx_report.txt:md5,d41d8cd98f00b204e9800998ecf8427e", + "FI1.inter.tax.rpt.tsv:md5,d41d8cd98f00b204e9800998ecf8427e", + "FI1.taxonomy.rpt:md5,d41d8cd98f00b204e9800998ecf8427e", + "FI1.on.JAD.all.html:md5,d41d8cd98f00b204e9800998ecf8427e", + "FI1.on.JAD.all.png:md5,d41d8cd98f00b204e9800998ecf8427e", + 
"FI1.on.JAD.all.svg:md5,d41d8cd98f00b204e9800998ecf8427e", + "bundled.links.tsv:md5,d41d8cd98f00b204e9800998ecf8427e", + "circos.conf:md5,d41d8cd98f00b204e9800998ecf8427e", + "karyotype.tsv:md5,d41d8cd98f00b204e9800998ecf8427e", + "FI1.on.TT_2021a.all.html:md5,d41d8cd98f00b204e9800998ecf8427e", + "FI1.on.TT_2021a.all.png:md5,d41d8cd98f00b204e9800998ecf8427e", + "FI1.on.TT_2021a.all.svg:md5,d41d8cd98f00b204e9800998ecf8427e", + "bundled.links.tsv:md5,d41d8cd98f00b204e9800998ecf8427e", + "circos.conf:md5,d41d8cd98f00b204e9800998ecf8427e", + "karyotype.tsv:md5,d41d8cd98f00b204e9800998ecf8427e", + "FI1.plotsr.csv:md5,d2e982f8f4fb8e32dc2f0ae74e688aef", + "JAD.plotsr.csv:md5,1c9f46f53f33361b80e126cc5d410c7d", + "TT_2021a.plotsr.csv:md5,dba0dd1f7c25e345052363030deace6b", + "plotsr.png:md5,d41d8cd98f00b204e9800998ecf8427e", + "FI1.aposteriori.svg:md5,d41d8cd98f00b204e9800998ecf8427e", + "FI1.aposteriori.tsv:md5,d41d8cd98f00b204e9800998ecf8427e", + "FI1.apriori.svg:md5,d41d8cd98f00b204e9800998ecf8427e", + "FI1.apriori.tsv:md5,d41d8cd98f00b204e9800998ecf8427e", + "FI1.tidk.explore.tsv:md5,d41d8cd98f00b204e9800998ecf8427e", + "FI1.top.sequence.txt:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + } + ], + "meta": { + "nf-test": "0.9.0", + "nextflow": "24.04.4" + }, + "timestamp": "2024-10-10T14:56:44.730644" + } +} diff --git a/tests/stub/params.json b/tests/stub/params.json deleted file mode 100644 index 6e303ac4..00000000 --- a/tests/stub/params.json +++ /dev/null @@ -1,29 +0,0 @@ -{ - "config_profile_name": "Full stub test", - "config_profile_description": "Full test of the pipeline in stub mode", - "input": "assets/assemblysheetv2.csv", - "ncbi_fcs_adaptor_skip": false, - "ncbi_fcs_adaptor_empire": "euk", - "ncbi_fcs_gx_skip": false, - "ncbi_fcs_gx_tax_id": 12, - "ncbi_fcs_gx_db_path": "tests/stub/gxdb/test", - "busco_skip": false, - "busco_mode": "genome", - "busco_lineage_datasets": "fungi_odb10 hypocreales_odb10", - "tidk_skip": false, - "tidk_repeat_seq": "TTTGGG", - "lai_skip": false, - "kraken2_skip": false, - "kraken2_db_path": "tests/stub/kraken2/k2_minusb_20231009.tar.gz", - "hic": "tests/stub/hic/Dummy_hic.R{1,2}.fq.gz", - "hic_skip_fastp": true, - "hic_skip_fastqc": false, - "synteny_skip": false, - "synteny_mummer_skip": false, - "synteny_plotsr_skip": false, - "synteny_xref_assemblies": "assets/xrefsheet.csv", - "merqury_skip": false, - "max_cpus": 2, - "max_memory": "6.GB", - "max_time": "6.h" -} diff --git a/tests/tiny/assemblysheet.csv b/tests/tiny/assemblysheet.csv new file mode 100644 index 00000000..196f70e4 --- /dev/null +++ b/tests/tiny/assemblysheet.csv @@ -0,0 +1,2 @@ +tag,fasta,gff3 +sarscov2,https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/sarscov2/genome/genome.fasta,https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/sarscov2/genome/genome.gff3 diff --git a/tests/tiny/main.nf.test b/tests/tiny/main.nf.test new file mode 100644 index 00000000..c3254e9f --- /dev/null +++ b/tests/tiny/main.nf.test @@ -0,0 +1,32 @@ +nextflow_pipeline { + + name "Test with a tiny input genome" + script "main.nf" + + test("tiny") { + + when { + params { + input = "$baseDir/tests/tiny/assemblysheet.csv" + outdir = "$outputDir" + } + } + + then { + def stable_path = getAllFilesFromDir(params.outdir, false, ['pipeline_info/*.{html,json,txt,yml}', 'report.{html,json}'], null, ['**']) + + assertAll( + { assert workflow.success}, + { assert snapshot( + [ + 'successful tasks': workflow.trace.succeeded().size(), + 'versions': 
removeNextflowVersion("$outputDir/pipeline_info/software_versions.yml"), + 'stable paths': stable_path + ] + ).match() } + ) + } + + } + +} diff --git a/tests/tiny/main.nf.test.snap b/tests/tiny/main.nf.test.snap new file mode 100644 index 00000000..b937e1d4 --- /dev/null +++ b/tests/tiny/main.nf.test.snap @@ -0,0 +1,47 @@ +{ + "tiny": { + "content": [ + { + "successful tasks": 9, + "versions": { + "ASSEMBLATHON_STATS": { + "assemblathon_stats": "github/PlantandFoodResearch/assemblathon2-analysis/a93cba2" + }, + "FASTAVALIDATOR": { + "py_fasta_validator": 0.6 + }, + "GT_GFF3": { + "genometools": "1.6.5" + }, + "GT_GFF3VALIDATOR": { + "genometools": "1.6.5" + }, + "GT_STAT": { + "genometools": "1.6.5" + }, + "SAMTOOLS_FAIDX": { + "samtools": 1.21 + }, + "SEQKIT_RMDUP": { + "seqkit": "v2.8.0" + }, + "TAG_ASSEMBLY": { + "pigz": "2.3.4" + }, + "Workflow": { + "plant-food-research-open/assemblyqc": "v2.2.0" + } + }, + "stable paths": [ + "sarscov2_stats.csv:md5,9ec4147b73cae550a38337520fc028dd", + "sarscov2.gt.stat.yml:md5,2504b472be03bdebaa033b0e507e4bfe" + ] + } + ], + "meta": { + "nf-test": "0.9.0", + "nextflow": "24.04.4" + }, + "timestamp": "2024-10-10T14:46:59.600343" + } +} diff --git a/workflows/assemblyqc.nf b/workflows/assemblyqc.nf index 8f615b3e..21496aaa 100644 --- a/workflows/assemblyqc.nf +++ b/workflows/assemblyqc.nf @@ -15,6 +15,7 @@ include { GFF3_GT_GFF3_GFF3VALIDATOR_STAT } from '../subworkflows/gallvp/gff3_ include { FCS_FCSADAPTOR } from '../modules/nf-core/fcs/fcsadaptor/main' include { NCBI_FCS_GX } from '../subworkflows/local/ncbi_fcs_gx' include { ASSEMBLATHON_STATS } from '../modules/local/assemblathon_stats' +include { GFASTATS } from '../modules/nf-core/gfastats/main' include { FASTA_GXF_BUSCO_PLOT } from '../subworkflows/gallvp/fasta_gxf_busco_plot/main' include { FASTA_LTRRETRIEVER_LAI } from '../subworkflows/gallvp/fasta_ltrretriever_lai/main' include { FASTA_KRAKEN2 } from '../subworkflows/local/fasta_kraken2' @@ -29,6 +30,8 @@ include { MERYL_COUNT as PAT_MERYL_COUNT } from '../modules/nf-core/meryl/cou include { MERYL_UNIONSUM as PAT_UNIONSUM } from '../modules/nf-core/meryl/unionsum/main' include { MERQURY_HAPMERS } from '../modules/nf-core/merqury/hapmers/main' include { MERQURY_MERQURY } from '../modules/nf-core/merqury/merqury/main' +include { GFFREAD } from '../modules/nf-core/gffread/main' +include { ORTHOFINDER } from '../modules/nf-core/orthofinder/main' include { CREATEREPORT } from '../modules/local/createreport' include { FASTQ_DOWNLOAD_PREFETCH_FASTERQDUMP_SRATOOLS as FETCHNGS } from '../subworkflows/nf-core/fastq_download_prefetch_fasterqdump_sratools/main' @@ -433,6 +436,27 @@ workflow ASSEMBLYQC { ch_assemblathon_stats = ASSEMBLATHON_STATS.out.stats ch_versions = ch_versions.mix(ASSEMBLATHON_STATS.out.versions.first()) + // MODULE: GFASTATS + ch_gfastats_assembly = params.gfastats_skip + ? Channel.empty() + : ch_clean_assembly + | map { tag, fasta -> [ [ id: tag ], fasta ] } + + GFASTATS( + ch_gfastats_assembly, + 'gfa', // output format + '', // estimated genome size + '', // target specific sequence by header + [], // agp file + [], // include bed + [], // exclude bed + [] // instructions + ) + + ch_gfastats_stats = GFASTATS.out.assembly_summary + | map { tag, stats -> stats } + ch_versions = ch_versions.mix(GFASTATS.out.versions.first()) + // SUBWORKFLOW: FASTA_GXF_BUSCO_PLOT ch_busco_input_assembly = params.busco_skip ? 
Channel.empty() @@ -568,7 +592,20 @@ workflow ASSEMBLYQC { params.hic_skip_fastqc ) + ch_hic_fastp_log = FQ2HIC.out.fastp_log + ch_hicqc_pdf = FQ2HIC.out.hicqc_pdf ch_hic_html = FQ2HIC.out.html + ch_hic_assembly = FQ2HIC.out.assembly + ch_hic_report_files = ch_hic_html + | mix( + ch_hic_assembly.map { tag, assembly -> assembly } + ) + | mix( + ch_hicqc_pdf.map { meta, pdf -> pdf } + ) + | mix( + ch_hic_fastp_log.map { meta, log -> log } + ) ch_versions = ch_versions.mix(FQ2HIC.out.versions) // SUBWORKFLOW: FASTA_SYNTENY @@ -770,6 +807,38 @@ workflow ASSEMBLYQC { | flatMap { meta, data -> data } ch_versions = ch_versions.mix(MERQURY_MERQURY.out.versions.first()) + // MODULE: GFFREAD + ch_gffread_inputs = params.orthofinder_skip + ? Channel.empty() + : ch_valid_gff3 + | join( + ch_clean_assembly + | map { tag, fasta -> [ [ id: tag ], fasta ] } + ) + | map { [ it ] } + | collect + | filter { it.size() > 1 } + | flatten + | buffer ( size: 3 ) + + GFFREAD( + ch_gffread_inputs.map { meta, gff, fasta -> [ meta, gff ] }, + ch_gffread_inputs.map { meta, gff, fasta -> fasta } + ) + + ch_proteins_fasta = GFFREAD.out.gffread_fasta + ch_versions = ch_versions.mix(GFFREAD.out.versions.first()) + + // ORTHOFINDER + ORTHOFINDER( + ch_proteins_fasta.map { meta, fasta -> fasta }.collect().map { fastas -> [ [ id: 'assemblyqc' ], fastas ] }, + [ [], [] ] + ) + + ch_orthofinder_outputs = ORTHOFINDER.out.orthofinder + | map { meta, dir -> dir } + ch_versions = ch_versions.mix(ORTHOFINDER.out.versions) + // Collate and save software versions ch_versions = ch_versions | unique @@ -793,15 +862,17 @@ workflow ASSEMBLYQC { ch_fcs_adaptor_report .map { meta, file -> file }.collect().ifEmpty([]), ch_fcs_gx_report .mix(ch_fcs_gx_taxonomy_plot).map { meta, file -> file }.collect().ifEmpty([]), ch_assemblathon_stats .collect().ifEmpty([]), + ch_gfastats_stats .collect().ifEmpty([]), ch_gt_stats .collect().ifEmpty([]), ch_busco_outputs .collect().ifEmpty([]), ch_busco_gff_outputs .collect().ifEmpty([]), ch_tidk_outputs .collect().ifEmpty([]), ch_lai_outputs .collect().ifEmpty([]), ch_kraken2_plot .collect().ifEmpty([]), - ch_hic_html .collect().ifEmpty([]), + ch_hic_report_files .collect().ifEmpty([]), ch_synteny_outputs .collect().ifEmpty([]), ch_merqury_outputs .collect().ifEmpty([]), + ch_orthofinder_outputs .collect().ifEmpty([]), ch_versions_yml, ch_params_as_json, ch_summary_params_as_json