Merge pull request #12 from sanger-tol/dev
The Big Merge
DLBPointon authored Oct 17, 2023
2 parents bcc9b82 + 913bf8b commit 661b2ce
Showing 39 changed files with 531 additions and 339 deletions.
14 changes: 11 additions & 3 deletions .github/workflows/ci.yml
@@ -35,9 +35,17 @@ jobs:
with:
version: "${{ matrix.NXF_VER }}"

- name: Run pipeline with test data
# TODO nf-core: You can customise CI pipeline run tests as required
# For example: adding multiple test runs with different parameters
- name: Download test data
# Download A fungal test data set that is full enough to show some real output.
run: |
curl https://tolit.cog.sanger.ac.uk/test-data/resources/treeval/TreeValTinyData.tar.gz | tar xzf -
- name: Run MAPS_ONLY pipeline with test data
# Remember that you can parallelise this by using strategy.matrix
run: |
nextflow run ${GITHUB_WORKSPACE} -profile test,docker --outdir ./results -entry MAPS_ONLY
- name: Run ALL_FILES pipeline with test data
# Remember that you can parallelise this by using strategy.matrix
run: |
nextflow run ${GITHUB_WORKSPACE} -profile test,docker --outdir ./results
2 changes: 1 addition & 1 deletion .github/workflows/linting.yml
@@ -84,7 +84,7 @@ jobs:
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install nf-core
pip install nf-core==2.8.0
- name: Run nf-core lint
env:
29 changes: 29 additions & 0 deletions .github/workflows/sanger_test.yml
@@ -0,0 +1,29 @@
name: sanger-tol LSF tests

on:
workflow_dispatch:
jobs:
run-tower:
name: Run LSF tests
runs-on: ubuntu-latest
steps:
- name: Launch workflow via tower
uses: seqeralabs/action-tower-launch@v2
with:
workspace_id: ${{ secrets.TOWER_WORKSPACE_ID }}
access_token: ${{ secrets.TOWER_ACCESS_TOKEN }}
compute_env: ${{ secrets.TOWER_COMPUTE_ENV }}
revision: ${{ github.sha }}
workdir: ${{ secrets.TOWER_WORKDIR_PARENT }}/work/${{ github.repository }}/work-${{ github.sha }}
parameters: |
{
"outdir": "${{ secrets.TOWER_WORKDIR_PARENT }}/results/${{ github.repository }}/results-${{ github.sha }}",
}
profiles: test,sanger,singularity,cleanup

- uses: actions/upload-artifact@v3
with:
name: Tower debug log file
path: |
tower_action_*.log
tower_action_*.json
43 changes: 43 additions & 0 deletions .github/workflows/sanger_test_full.yml
@@ -0,0 +1,43 @@
name: sanger-tol LSF full size tests

on:
push:
branches:
- main
- dev
workflow_dispatch:
jobs:
run-tower:
name: Run LSF full size tests
runs-on: ubuntu-latest
steps:
- name: Sets env vars for push
run: |
echo "REVISION=${{ github.sha }}" >> $GITHUB_ENV
if: github.event_name == 'push'

- name: Sets env vars for workflow_dispatch
run: |
echo "REVISION=${{ github.sha }}" >> $GITHUB_ENV
if: github.event_name == 'workflow_dispatch'

- name: Launch workflow via tower
uses: seqeralabs/action-tower-launch@v2
with:
workspace_id: ${{ secrets.TOWER_WORKSPACE_ID }}
access_token: ${{ secrets.TOWER_ACCESS_TOKEN }}
compute_env: ${{ secrets.TOWER_COMPUTE_ENV }}
revision: ${{ env.REVISION }}
workdir: ${{ secrets.TOWER_WORKDIR_PARENT }}/work/${{ github.repository }}/work-${{ env.REVISION }}
parameters: |
{
"outdir": "${{ secrets.TOWER_WORKDIR_PARENT }}/results/${{ github.repository }}/results-${{ env.REVISION }}",
}
profiles: test_full,sanger,singularity,cleanup

- uses: actions/upload-artifact@v3
with:
name: Tower debug log file
path: |
tower_action_*.log
tower_action_*.json
18 changes: 18 additions & 0 deletions .nf-core.yml
@@ -1 +1,19 @@
repository_type: pipeline
lint:
files_exist:
- assets/multiqc_config.yml
files_unchanged:
- .github/workflows/linting.yml
- LICENSE
- .github/CONTRIBUTING.md
- docs/README.md
- .github/ISSUE_TEMPLATE/bug_report.yml
- .github/PULL_REQUEST_TEMPLATE.md
- .github/workflows/branch.yml
- assets/email_template.txt
- assets/sendmail_template.txt
- lib/NfcoreTemplate.groovy
- .prettierignore
nextflow_config:
- manifest.name
- manifest.homePage
52 changes: 47 additions & 5 deletions CHANGELOG.md
@@ -3,14 +3,56 @@
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## v1.0dev - [date]
## [[1.0.0](https://github.com/sanger-tol/curationpretext/releases/tag/1.0.0)] - UNSC Infinity - [2023-10-02]

Initial release of sanger-tol/curationpretext, created with the [sager-tol](https://nf-co.re/) template.

### `Added`
### Added

### `Fixed`
- Subworkflow to generate tracks containing telomeric sites.
- Subworkflow to generate Pretext maps and images
- Subworkflow to generate repeat density tracks.
- Subworkflow to generate longread coverage tracks from pacbio data.
- Subworkflow to generate gap tracks.

### `Dependencies`
### Parameters

### `Deprecated`
| Old Version | New Versions |
| ----------- | ------------ |
| | --input |
| | --cram |
| | --pacbio |
| | --sample |
| | --teloseq |
| | -entry |

### Software Dependencies

Note, since the pipeline is using Nextflow DSL2, each process will be run with its own Biocontainer. This means that on occasion it is entirely possible for the pipeline to be using different versions of the same tool. However, the overall software dependency changes compared to the last release have been listed below for reference.

| Module | Old Version | New Versions |
| -------------------------------------- | ----------- | -------------- |
| bamtobed_sort ( bedtools + samtools ) | - | 2.31.0 + 1.17 |
| bedtools | - | 2.31.0 |
| cram_filter_align_bwamem2_fixmate_sort | - | |
| ^ ( samtools + bwamem2 ) ^ | - | 1.16.1 + 2.2.1 |
| extract_cov_id ( coreutils ) | - | 9.1 |
| extract_repeat ( perl ) | - | 5.26.2 |
| extract_telo ( coreutils ) | - | 9.1 |
| find_telomere_regions ( gcc ) | - | 7.1.0 |
| find_telomere_windows ( java-jdk ) | - | 8.0.112 |
| gap_length ( coreutils ) | - | 9.1 |
| generate_cram_csv ( samtools ) | - | 1.17 |
| get_largest_scaff ( coreutils ) | - | 9.1 |
| gnu-sort | - | 8.25 |
| pretextmap + samtools | - | 0.1.9 + 1.17 |
| seqtk | - | 1.4 |
| tabix | - | 1.11 |
| ucsc | - | 377 |
| windowmasker (blast) | - | 2.14.0 |

### Fixed

### Dependencies

### Deprecated
61 changes: 58 additions & 3 deletions CITATIONS.md
@@ -10,10 +10,65 @@
## Pipeline tools

- [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)
- [Bedtools](https://bedtools.readthedocs.io/en/latest/)

- [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/)
> Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.
> Quinlan, A.R. and Hall, I.M. 2010. ‘BEDTools: A flexible suite of utilities for comparing genomic features’, Bioinformatics, 26(6), pp. 841–842. doi:10.1093/bioinformatics/btq033.
- [bwa-mem2](https://ieeexplore.ieee.org/document/8820962)

> Vasimuddin, Md. et al. 2019. ‘Efficient architecture-aware acceleration of BWA-mem for multicore systems’, 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS) [Preprint]. doi:10.1109/ipdps.2019.00041.
- [coreutils](https://github.com/coreutils/coreutils)

> GNU Coreutils. 2023. coreutils [online]. https://github.com/coreutils/coreutils/releases/tag/v9.4. (Accessed on 25th September 2023).
- [Find Telomere](https://github.com/VGP/vgp-assembly/tree/master/pipeline/telomere)

> VGP. 2022. vgp-assembly telomere [online]. https://github.com/VGP/vgp-assembly/tree/master/pipeline/telomere. (Accessed on 28th February 2023).
- [Java](https://docs.oracle.com/javase/8/docs/api/overview-summary.html)

> Oracle. 2023. Java Documentation. https://docs.oracle.com/javase/8/docs/index.html. (Accessed on 25th September 2023).
- [Minimap2](https://pubmed.ncbi.nlm.nih.gov/34623391/)

> Li, H. 2021. ‘New strategies to improve MINIMAP2 alignment accuracy’, Bioinformatics, 37(23), pp. 4572–4574. doi:10.1093/bioinformatics/btab705.
- [Perl](https://perldoc.perl.org/perl)

> Perl Organisation. 2023. Perl Language Reference v5.36.0. https://perldoc.perl.org/perl. (Accessed 28th February 2023).
- [PretextMap](https://github.com/wtsi-hpag/PretextMap)

> Harry, E. 2022. PretextView [online]. https://github.com/wtsi-hpag/PretextView. (Accessed on 7th June 2023).
- [Python: 3.10](https://docs.python.org/3.10/whatsnew/3.10.html)

> Python Software Foundation. 2023. Python Language Reference v3.10. https://docs.python.org/3.10/whatsnew/3.10.html. (Accessed 28th February 2023).
- [Samtools](https://pubmed.ncbi.nlm.nih.gov/33590861/)

> Di Tommaso, Paolo, et al. 2017. “Nextflow Enables Reproducible Computational Workflows.” Nature Biotechnology, 35(4), pp. 316–19, https://doi.org/10.1038/nbt.3820.
- [SeqTK](https://github.com/lh3/seqtk)

> Li, Heng. 2023. seqtk [online]. https://github.com/lh3/seqtk. (Accessed on 7th June 2023).
- [staden_io_lib / iolib](https://github.com/jkbonfield/io_lib)

> Bonfield JK. 2023. io_lib [online]. https://github.com/jkbonfield/io_lib. (Accessed on 7th June 2023).
- [Tabix](http://www.htslib.org/doc/tabix.html)

> Li, Heng. 2023. tabix [online]. http://www.htslib.org/doc/tabix.html. (Accessed on 7th June 2023).
- [UCSC tools](https://github.com/ucscGenomeBrowser/kent/tree/master)

> UCSC Genome Browser Group. 2023. kent [online]. https://github.com/ucscGenomeBrowser/kent/tree/master. (Accessed on 7th June 2023).
- [WindowMasker](https://pubmed.ncbi.nlm.nih.gov/16287941/)

> Morgulis, A., et al. 2006. WindowMasker: window-based masker for sequenced genomes. Bioinformatics. 22(2). pp.134–141. doi: 10.1093/bioinformatics/bti774.
## Software packaging/containerisation tools

50 changes: 32 additions & 18 deletions README.md
@@ -12,14 +12,16 @@

## Introduction

**sanger-tol/curationpretext** is a bioinformatics pipeline typically used in conjunction with [TreeVal](https://github.com/sanger-tol/treeval) to generate pretext maps (and optionally telomeric, gap, coverage and repeat density plots which can be ingested into pretext) for the manual curation of high quality genomes.
**sanger-tol/curationpretext** is a bioinformatics pipeline typically used in conjunction with [TreeVal](https://github.com/sanger-tol/treeval) to generate pretext maps (and optionally telomeric, gap, coverage, and repeat density plots which can be ingested into pretext) for the manual curation of high quality genomes.

This is intended as a supplementary pipeline for the [treeval](https://github.com/sanger-tol/treeval) project. This pipeline can be simply used to generate pretext maps, information on how to run this pipeline can be found in the [usage documentation](https://pipelines.tol.sanger.ac.uk/curationpretext/usage).

<!-- TODO nf-core: Include a figure that guides the user through the major workflow steps. Many nf-core
workflows use the "tube map" design for that. See https://nf-co.re/docs/contributing/design_guidelines#examples for examples. -->

1. Generate Maps - Generates pretext maps aswell as a static image.
1. Generate Maps - Generates pretext maps as well as a static image.

2. Accessory files - Generates the repeat density, gap, telomere and coverage tracks.
2. Accessory files - Generates the repeat density, gap, telomere, and coverage tracks.

## Usage

@@ -30,50 +32,62 @@
Currently, the pipeline uses the following flags:

- --fasta
- The absolute path to the assembled genome in, e.g, `/path/to/assembly.fa`
- --input

- The absolute path to the assembled genome in, e.g., `/path/to/assembly.fa`

- --pacbio

- --pacbio
- The directory of the fasta files generated from pacbio reads, e.g, `/path/to/fasta/`
- The directory of the fasta files generated from pacbio reads, e.g., `/path/to/fasta/`

- --cram
- The directory of the cram *and* cram.crai files, e.g, `/path/to/cram/`
- --cram

- The directory of the cram _and_ cram.crai files, e.g., `/path/to/cram/`

- --teloseq
- A telomeric sequence, e.g, `TTAGGG`

- A telomeric sequence, e.g., `TTAGGG`

- -entry
- ALL_FILES generates all accessory files as well as pretext maps
- MAPS generates only the pretext maps and static images
- ALL_FILES generates all accessory files as well as pretext maps
- MAPS_ONLY generates only the pretext maps and static images

Now, you can run the pipeline using:

<!-- TODO nf-core: update the following command to include all required parameters for a minimal example -->

```bash
// For ALL_FILES run
nextflow run sanger-tol/curationpretext \
-profile <docker/singularity/.../institute> \
--fasta path/to/assembly.fa \
--input path/to/assembly.fa \
--cram path/to/cram/ \
--pacbio path/to/pacbio/fasta/ \
--teloseq TTAGGG \
-entry { ALL_FILES | MAPS } \
--sample { default is "pretext_rerun" }
--outdir path/to/outdir/

// For MAPS_ONLY run
nextflow run sanger-tol/curationpretext \
-profile <docker/singularity/.../institute> \
--input path/to/assembly.fa \
--cram path/to/cram/ \
--sample { default is "pretext_rerun" }
-entry MAPS_ONLY \
--outdir path/to/outdir/
```
> **Warning:**
> Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files including those
> provided by the `-c` Nextflow option can be used to provide any configuration _**except for parameters**_;
> see [docs](https://nf-co.re/usage/configuration#custom-configuration-files).
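
As a minimal sketch of the `-params-file` route mentioned in the warning above, the snippet below writes the flags listed in this README to a YAML file and prints the matching launch command. The file name `params.yaml`, the placeholder paths, and the `docker` profile are assumptions made for illustration; they are not values prescribed by the pipeline.

```bash
# Write the pipeline parameters to a YAML file (hypothetical name: params.yaml).
# The keys mirror the CLI flags listed above; every path is a placeholder.
cat > params.yaml <<'EOF'
input: /path/to/assembly.fa
cram: /path/to/cram/
pacbio: /path/to/pacbio/fasta/
teloseq: TTAGGG
sample: pretext_rerun
outdir: /path/to/outdir/
EOF

# Print the launch command instead of running it, so the sketch works
# without Nextflow installed. -profile and -entry stay on the command line
# because they are Nextflow options rather than pipeline parameters.
echo "nextflow run sanger-tol/curationpretext -profile docker -entry ALL_FILES -params-file params.yaml"
```

Running the printed command should behave like passing each `--flag` on the CLI, while keeping the parameters in a file that can be versioned alongside the run.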
For more details, please refer to the [usage documentation](https://nf-co.re/curationpretext/usage) and the [parameter documentation](https://nf-co.re/curationpretext/parameters).
For more details, please refer to the [usage documentation](https://pipelines.tol.sanger.ac.uk/curationpretext/usage) and the [parameter documentation](https://pipelines.tol.sanger.ac.uk/curationpretext/parameters).
## Pipeline output
To see the the results of a test run with a full size dataset refer to the [results](https://nf-co.re/curationpretext/results) tab on the nf-core website pipeline page.
To see the the results of a test run with a full size dataset refer to the [results](https://pipelines.tol.sanger.ac.uk/curationpretext/results) tab on the nf-core website pipeline page.
For more details about the output files and reports, please refer to the
[output documentation](https://nf-co.re/curationpretext/output).
[output documentation](https://pipelines.tol.sanger.ac.uk/curationpretext/output).
## Credits
13 changes: 0 additions & 13 deletions assets/multiqc_config.yml

This file was deleted.

8 changes: 4 additions & 4 deletions conf/base.config
@@ -16,17 +16,17 @@ process {
time = { check_max( 4.h * task.attempt, 'time' ) }

errorStrategy = { task.exitStatus in ((130..145) + 104) ? 'retry' : 'finish' }
maxRetries = 1
maxRetries = 2
maxErrors = '-1'

withName: '.*:.*:LONGREAD_COVERAGE:(MINIMAP2_ALIGN|MINIMAP2_ALIGN_SPLIT)' {
cpus = { check_max( 16 * 1, 'cpus' ) }
memory = { check_max( 100.GB * task.attempt, 'memory' ) }
memory = { check_max( 25.GB * task.attempt, 'memory' ) }
}

withName: CRAM_FILTER_ALIGN_BWAMEM2_FIXMATE_SORT {
cpus = { check_max( 16 * 1, 'cpus' ) }
memory = { check_max( 50.GB * task.attempt, 'memory' ) }
memory = { check_max( 25.GB * task.attempt, 'memory' ) }
}

withName: PRETEXTMAP_STANDRD{
@@ -45,7 +45,7 @@ process {
}

withName: BWAMEM2_INDEX {
cpus = {}
memory = { check_max( 25.GB * task.attempt, 'memory' ) }
}

// Process-specific resource requirements
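
To make the retry settings in this config concrete, here is a rough sketch in shell (not the pipeline's own Groovy) of how `25.GB * task.attempt` grows under `maxRetries = 2`, and which exit codes the `errorStrategy` closure treats as retryable. The function names are hypothetical, and the cap that `check_max` applies is left out because its limit is not shown in this diff.

```bash
#!/usr/bin/env bash
# Illustrative only: mirrors the arithmetic of the config above in shell.

base_gb=25      # from: memory = { check_max( 25.GB * task.attempt, 'memory' ) }
max_retries=2   # from: maxRetries = 2, i.e. up to 3 attempts in total

# Memory requested on a given attempt, in GB (before any check_max cap).
memory_for_attempt() {
  echo $(( base_gb * $1 ))
}

# Succeeds when the exit status is one the errorStrategy closure retries:
# (130..145) + 104. Any other status maps to 'finish'.
is_retryable() {
  local status=$1
  [ "$status" -eq 104 ] || { [ "$status" -ge 130 ] && [ "$status" -le 145 ]; }
}

for attempt in $(seq 1 $(( max_retries + 1 ))); do
  echo "attempt ${attempt}: $(memory_for_attempt "$attempt") GB"
done
```

Under these assumptions a task that keeps failing with a retryable status asks for 25, 50 and then 75 GB, which is why lowering the base value from 100 GB or 50 GB still leaves headroom on later attempts.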