The Big Merge #12

Merged on Oct 17, 2023 (57 commits)
Changes from 47 commits
Commits
7dd14af Merge pull request #1 from DLBPointon/master (DLBPointon, Jul 11, 2023)
a1362d0 Merge pull request #3 from DLBPointon/main (DLBPointon, Jul 12, 2023)
afee53d Updates to configs and md (DLBPointon, Jul 12, 2023)
3126825 Merge pull request #4 from sanger-tol/master (DLBPointon, Jul 12, 2023)
41b75bf Adding linting fixes (DLBPointon, Jul 13, 2023)
9e8836e Linting fixes (DLBPointon, Jul 13, 2023)
94533b8 Linting fixes (DLBPointon, Jul 13, 2023)
46a00a8 Linting fixes (DLBPointon, Jul 13, 2023)
c0963d9 Linting fixes (DLBPointon, Jul 13, 2023)
d9cf87f Merge pull request #5 from sanger-tol/fixes (DLBPointon, Jul 13, 2023)
99816ae Updates (DLBPointon, Jul 18, 2023)
b7ce998 Update README.md (DLBPointon, Sep 15, 2023)
28abc04 Update usage.md (DLBPointon, Sep 19, 2023)
215c2e6 Update usage.md (DLBPointon, Sep 19, 2023)
fdb7c3f Update usage.md (DLBPointon, Sep 19, 2023)
087f3a9 Update README.md (DLBPointon, Sep 19, 2023)
d7801e2 Adding testing (DLBPointon, Sep 19, 2023)
7131d3f Adding testing (DLBPointon, Sep 19, 2023)
d802476 Adding testing (DLBPointon, Sep 19, 2023)
6ac3177 Adding testing (DLBPointon, Sep 19, 2023)
74d7bf8 Adding testing (DLBPointon, Sep 19, 2023)
f055b36 Adding testing (DLBPointon, Sep 19, 2023)
1956859 Attempting to fix testing (DLBPointon, Sep 19, 2023)
967d21d Attempting to fix testing (DLBPointon, Sep 19, 2023)
0ff9bd6 Attempting to fix testing (DLBPointon, Sep 19, 2023)
e2b6777 Attempting to fix testing (DLBPointon, Sep 19, 2023)
7dab39b Attempting to fix testing (DLBPointon, Sep 19, 2023)
e069be5 Adding files for tower use (DLBPointon, Sep 19, 2023)
8e3923c Added sanger testing CI (DLBPointon, Sep 19, 2023)
791443b Update for testing (DLBPointon, Sep 20, 2023)
6d7f397 Update for testing (DLBPointon, Sep 20, 2023)
6006596 Merge pull request #10 from sanger-tol/DLBPointon-patch-1 (DLBPointon, Sep 21, 2023)
2ba5035 Update CITATIONS.md (DLBPointon, Sep 21, 2023)
a21d826 Update output.md (DLBPointon, Sep 21, 2023)
b01b5ae More Docs (DLBPointon, Sep 21, 2023)
01aa6eb Prettier docs (DLBPointon, Sep 21, 2023)
3ad8048 Prettier docs (DLBPointon, Sep 21, 2023)
ed91e24 Updating Citations for Java and coreutils (DLBPointon, Sep 25, 2023)
8313724 Carry over of @muffato changes to TreeVal (DLBPointon, Sep 25, 2023)
7ec90b1 Updates (DLBPointon, Sep 25, 2023)
f170ae3 Merge pull request #11 from sanger-tol/DLBPointon-patch-1 (DLBPointon, Sep 25, 2023)
1470845 Fix for schema (DLBPointon, Sep 28, 2023)
2eda498 Merge branch 'dev' into DLBPointon-patch-1 (DLBPointon, Sep 28, 2023)
64ce40a Fixes (DLBPointon, Sep 28, 2023)
d848105 Merge remote-tracking branch 'refs/remotes/origin/DLBPointon-patch-1'… (DLBPointon, Sep 28, 2023)
e9609f9 Fixes for LSF (DLBPointon, Sep 28, 2023)
083e047 Merge pull request #13 from sanger-tol/DLBPointon-patch-1 (DLBPointon, Sep 28, 2023)
209d622 Adding CHANGELOG (DLBPointon, Oct 2, 2023)
3fcf1bb Adding CHANGELOG (DLBPointon, Oct 2, 2023)
ec7656e Correcting memory requirements in base (DLBPointon, Oct 9, 2023)
07122fb Updating docs to add information on updating test.config for local use (DLBPointon, Oct 9, 2023)
76d5dec Update CHANGELOG.md (DLBPointon, Oct 9, 2023)
26dbbe8 adding parameters (DLBPointon, Oct 9, 2023)
e121ca5 Merge branch 'Changelog' of https://github.com/sanger-tol/curationpre… (DLBPointon, Oct 9, 2023)
53b01ef remove quotes (DLBPointon, Oct 9, 2023)
8e70da9 linting (DLBPointon, Oct 9, 2023)
913bf8b Merge pull request #14 from sanger-tol/Changelog (DLBPointon, Oct 10, 2023)
14 changes: 11 additions & 3 deletions .github/workflows/ci.yml
@@ -35,9 +35,17 @@ jobs:
with:
version: "${{ matrix.NXF_VER }}"

- name: Run pipeline with test data
# TODO nf-core: You can customise CI pipeline run tests as required
# For example: adding multiple test runs with different parameters
- name: Download test data
# Download a fungal test data set that is full enough to show some real output.
run: |
curl https://tolit.cog.sanger.ac.uk/test-data/resources/treeval/TreeValTinyData.tar.gz | tar xzf -

- name: Run MAPS_ONLY pipeline with test data
# Remember that you can parallelise this by using strategy.matrix
run: |
nextflow run ${GITHUB_WORKSPACE} -profile test,docker --outdir ./results -entry MAPS_ONLY

- name: Run ALL_FILES pipeline with test data
# Remember that you can parallelise this by using strategy.matrix
run: |
nextflow run ${GITHUB_WORKSPACE} -profile test,docker --outdir ./results
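The CI steps above can also be reproduced on a workstation. The sketch below is illustrative only: `print_ci_commands` is a hypothetical helper, it assumes Nextflow and Docker are installed, and it prints the commands copied from ci.yml instead of running them, so they can be checked first.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints the commands the ci.yml job runs.
# The data URL and nextflow flags are copied from ci.yml; the helper is an assumption.

TEST_DATA_URL="https://tolit.cog.sanger.ac.uk/test-data/resources/treeval/TreeValTinyData.tar.gz"

print_ci_commands() {
    local pipeline_dir="$1"           # local checkout of the pipeline
    local outdir="${2:-./results}"    # same default as the CI job
    # Step 1: download and unpack the tiny test dataset into the current directory
    printf 'curl %s | tar xzf -\n' "$TEST_DATA_URL"
    # Step 2: MAPS_ONLY entry point (pretext maps and static images only)
    printf 'nextflow run %s -profile test,docker --outdir %s -entry MAPS_ONLY\n' "$pipeline_dir" "$outdir"
    # Step 3: no -entry flag, as in the CI step named "Run ALL_FILES pipeline"
    printf 'nextflow run %s -profile test,docker --outdir %s\n' "$pipeline_dir" "$outdir"
}
```

For example, `print_ci_commands "$PWD" | bash` from a checkout would execute them. Note that conf/test.config in this PR hard-codes `/home/runner/work/curationpretext/curationpretext/TreeValTinyData`, the location of the unpacked data on the GitHub runner, so outside CI those paths need adjusting as well.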
2 changes: 1 addition & 1 deletion .github/workflows/linting.yml
@@ -84,7 +84,7 @@ jobs:
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install nf-core
pip install nf-core==2.8.0

- name: Run nf-core lint
env:
29 changes: 29 additions & 0 deletions .github/workflows/sanger_test.yml
@@ -0,0 +1,29 @@
name: sanger-tol LSF tests

on:
  workflow_dispatch:
jobs:
  run-tower:
    name: Run LSF tests
    runs-on: ubuntu-latest
    steps:
      - name: Launch workflow via tower
        uses: seqeralabs/action-tower-launch@v2
        with:
          workspace_id: ${{ secrets.TOWER_WORKSPACE_ID }}
          access_token: ${{ secrets.TOWER_ACCESS_TOKEN }}
          compute_env: ${{ secrets.TOWER_COMPUTE_ENV }}
          revision: ${{ github.sha }}
          workdir: ${{ secrets.TOWER_WORKDIR_PARENT }}/work/${{ github.repository }}/work-${{ github.sha }}
          parameters: |
            {
              "outdir": "${{ secrets.TOWER_WORKDIR_PARENT }}/results/${{ github.repository }}/results-${{ github.sha }}",
            }
          profiles: test,sanger,singularity,cleanup

      - uses: actions/upload-artifact@v3
        with:
          name: Tower debug log file
          path: |
            tower_action_*.log
            tower_action_*.json
43 changes: 43 additions & 0 deletions .github/workflows/sanger_test_full.yml
@@ -0,0 +1,43 @@
name: sanger-tol LSF full size tests

on:
  push:
    branches:
      - main
      - dev
  workflow_dispatch:
jobs:
  run-tower:
    name: Run LSF full size tests
    runs-on: ubuntu-latest
    steps:
      - name: Sets env vars for push
        run: |
          echo "REVISION=${{ github.sha }}" >> $GITHUB_ENV
        if: github.event_name == 'push'

      - name: Sets env vars for workflow_dispatch
        run: |
          echo "REVISION=${{ github.sha }}" >> $GITHUB_ENV
        if: github.event_name == 'workflow_dispatch'

      - name: Launch workflow via tower
        uses: seqeralabs/action-tower-launch@v2
        with:
          workspace_id: ${{ secrets.TOWER_WORKSPACE_ID }}
          access_token: ${{ secrets.TOWER_ACCESS_TOKEN }}
          compute_env: ${{ secrets.TOWER_COMPUTE_ENV }}
          revision: ${{ env.REVISION }}
          workdir: ${{ secrets.TOWER_WORKDIR_PARENT }}/work/${{ github.repository }}/work-${{ env.REVISION }}
          parameters: |
            {
              "outdir": "${{ secrets.TOWER_WORKDIR_PARENT }}/results/${{ github.repository }}/results-${{ env.REVISION }}",
            }
          profiles: test_full,sanger,singularity,cleanup

      - uses: actions/upload-artifact@v3
        with:
          name: Tower debug log file
          path: |
            tower_action_*.log
            tower_action_*.json
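Both Tower workflows derive the Nextflow work directory and the pipeline's `outdir` from the `TOWER_WORKDIR_PARENT` secret, the repository name and the commit SHA (via `env.REVISION` in the full-size workflow, which is set to `github.sha` for both triggers). A minimal shell sketch of that string construction, useful for locating the output of a given commit; `tower_run_paths` is a hypothetical name and the values used to check it are placeholders, not real secrets.

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring the `workdir` and "outdir" expressions in
# sanger_test.yml and sanger_test_full.yml. The arguments stand in for
# secrets.TOWER_WORKDIR_PARENT, github.repository and github.sha / env.REVISION.

tower_run_paths() {
    local parent="$1" repo="$2" revision="$3"
    # workdir: <parent>/work/<owner>/<repo>/work-<sha>
    printf 'workdir=%s/work/%s/work-%s\n' "$parent" "$repo" "$revision"
    # outdir, passed to the pipeline through the `parameters` JSON
    printf 'outdir=%s/results/%s/results-%s\n' "$parent" "$repo" "$revision"
}
```

Because both paths embed the SHA, every tested commit gets its own work and results directories, so runs of different commits do not overwrite each other.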
18 changes: 18 additions & 0 deletions .nf-core.yml
@@ -1 +1,19 @@
repository_type: pipeline
lint:
  files_exist:
    - assets/multiqc_config.yml
  files_unchanged:
    - .github/workflows/linting.yml
    - LICENSE
    - .github/CONTRIBUTING.md
    - docs/README.md
    - .github/ISSUE_TEMPLATE/bug_report.yml
    - .github/PULL_REQUEST_TEMPLATE.md
    - .github/workflows/branch.yml
    - assets/email_template.txt
    - assets/sendmail_template.txt
    - lib/NfcoreTemplate.groovy
    - .prettierignore
  nextflow_config:
    - manifest.name
    - manifest.homePage
61 changes: 58 additions & 3 deletions CITATIONS.md
@@ -10,10 +10,65 @@

## Pipeline tools

- [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)
- [Bedtools](https://bedtools.readthedocs.io/en/latest/)

- [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/)
> Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.
> Quinlan, A.R. and Hall, I.M. 2010. ‘BEDTools: A flexible suite of utilities for comparing genomic features’, Bioinformatics, 26(6), pp. 841–842. doi:10.1093/bioinformatics/btq033.

- [bwa-mem2](https://ieeexplore.ieee.org/document/8820962)

> Vasimuddin, Md. et al. 2019. ‘Efficient architecture-aware acceleration of BWA-mem for multicore systems’, 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS) [Preprint]. doi:10.1109/ipdps.2019.00041.

- [coreutils](https://github.com/coreutils/coreutils)

> GNU Coreutils. 2023. coreutils [online]. https://github.com/coreutils/coreutils/releases/tag/v9.4. (Accessed on 25th September 2023).

- [Find Telomere](https://github.com/VGP/vgp-assembly/tree/master/pipeline/telomere)

> VGP. 2022. vgp-assembly telomere [online]. https://github.com/VGP/vgp-assembly/tree/master/pipeline/telomere. (Accessed on 28th February 2023).

- [Java](https://docs.oracle.com/javase/8/docs/api/overview-summary.html)

> Oracle. 2023. Java Documentation. https://docs.oracle.com/javase/8/docs/index.html. (Accessed on 25th September 2023).

- [Minimap2](https://pubmed.ncbi.nlm.nih.gov/34623391/)

> Li, H. 2021. ‘New strategies to improve minimap2 alignment accuracy’, Bioinformatics, 37(23), pp. 4572–4574. doi:10.1093/bioinformatics/btab705.

- [Perl](https://perldoc.perl.org/perl)

> Perl Organisation. 2023. Perl Language Reference v5.36.0. https://perldoc.perl.org/perl. (Accessed on 28th February 2023).

- [PretextMap](https://github.com/wtsi-hpag/PretextMap)

> Harry, E. 2022. PretextMap [online]. https://github.com/wtsi-hpag/PretextMap. (Accessed on 7th June 2023).

- [Python: 3.10](https://docs.python.org/3.10/whatsnew/3.10.html)

> Python Software Foundation. 2023. Python Language Reference v3.10. https://docs.python.org/3.10/whatsnew/3.10.html. (Accessed on 28th February 2023).

- [Samtools](https://pubmed.ncbi.nlm.nih.gov/33590861/)

> Danecek, P., et al. 2021. ‘Twelve years of SAMtools and BCFtools’, GigaScience, 10(2), giab008. doi:10.1093/gigascience/giab008.

- [SeqTK](https://github.com/lh3/seqtk)

> Li, Heng. 2023. seqtk [online]. https://github.com/lh3/seqtk. (Accessed on 7th June 2023).

- [staden_io_lib / iolib](https://github.com/jkbonfield/io_lib)

> Bonfield JK. 2023. io_lib [online]. https://github.com/jkbonfield/io_lib. (Accessed on 7th June 2023).

- [Tabix](http://www.htslib.org/doc/tabix.html)

> Li, Heng. 2023. tabix [online]. http://www.htslib.org/doc/tabix.html. (Accessed on 7th June 2023).

- [UCSC tools](https://github.com/ucscGenomeBrowser/kent/tree/master)

> UCSC Genome Browser Group. 2023. kent [online]. https://github.com/ucscGenomeBrowser/kent/tree/master. (Accessed on 7th June 2023).

- [WindowMasker](https://pubmed.ncbi.nlm.nih.gov/16287941/)

> Morgulis, A., et al. 2006. WindowMasker: window-based masker for sequenced genomes. Bioinformatics. 22(2). pp.134–141. doi: 10.1093/bioinformatics/bti774.

## Software packaging/containerisation tools

50 changes: 32 additions & 18 deletions README.md
@@ -12,14 +12,16 @@

## Introduction

**sanger-tol/curationpretext** is a bioinformatics pipeline typically used in conjunction with [TreeVal](https://github.com/sanger-tol/treeval) to generate pretext maps (and optionally telomeric, gap, coverage and repeat density plots which can be ingested into pretext) for the manual curation of high quality genomes.
**sanger-tol/curationpretext** is a bioinformatics pipeline typically used in conjunction with [TreeVal](https://github.com/sanger-tol/treeval) to generate pretext maps (and optionally telomeric, gap, coverage, and repeat density plots which can be ingested into pretext) for the manual curation of high quality genomes.

This is intended as a supplementary pipeline for the [treeval](https://github.com/sanger-tol/treeval) project. It can also be used simply to generate pretext maps; information on how to run it can be found in the [usage documentation](https://pipelines.tol.sanger.ac.uk/curationpretext/usage).

<!-- TODO nf-core: Include a figure that guides the user through the major workflow steps. Many nf-core
workflows use the "tube map" design for that. See https://nf-co.re/docs/contributing/design_guidelines#examples for examples. -->

1. Generate Maps - Generates pretext maps aswell as a static image.
1. Generate Maps - Generates pretext maps as well as a static image.

2. Accessory files - Generates the repeat density, gap, telomere and coverage tracks.
2. Accessory files - Generates the repeat density, gap, telomere, and coverage tracks.

## Usage

@@ -30,50 +32,62 @@

Currently, the pipeline uses the following flags:

- --fasta
- The absolute path to the assembled genome in, e.g, `/path/to/assembly.fa`
- --input

- The absolute path to the assembled genome, e.g., `/path/to/assembly.fa`

- --pacbio

- --pacbio
- The directory of the fasta files generated from pacbio reads, e.g, `/path/to/fasta/`
- The directory of the fasta files generated from pacbio reads, e.g., `/path/to/fasta/`

- --cram
- The directory of the cram *and* cram.crai files, e.g, `/path/to/cram/`
- --cram

- The directory of the cram _and_ cram.crai files, e.g., `/path/to/cram/`

- --teloseq
- A telomeric sequence, e.g, `TTAGGG`

- A telomeric sequence, e.g., `TTAGGG`

- -entry
- ALL_FILES generates all accessory files as well as pretext maps
- MAPS generates only the pretext maps and static images
- ALL_FILES generates all accessory files as well as pretext maps
- MAPS_ONLY generates only the pretext maps and static images

Now, you can run the pipeline using:

<!-- TODO nf-core: update the following command to include all required parameters for a minimal example -->

```bash
# For ALL_FILES run (--sample is optional; it defaults to "pretext_rerun")
nextflow run sanger-tol/curationpretext \
  -profile <docker/singularity/.../institute> \
  --input path/to/assembly.fa \
  --cram path/to/cram/ \
  --pacbio path/to/pacbio/fasta/ \
  --teloseq TTAGGG \
  -entry ALL_FILES \
  --sample pretext_rerun \
  --outdir path/to/outdir/

# For MAPS_ONLY run
nextflow run sanger-tol/curationpretext \
  -profile <docker/singularity/.../institute> \
  --input path/to/assembly.fa \
  --cram path/to/cram/ \
  --sample pretext_rerun \
  -entry MAPS_ONLY \
  --outdir path/to/outdir/
```

> **Warning:**
> Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files including those
> provided by the `-c` Nextflow option can be used to provide any configuration _**except for parameters**_;
> see [docs](https://nf-co.re/usage/configuration#custom-configuration-files).

For more details, please refer to the [usage documentation](https://nf-co.re/curationpretext/usage) and the [parameter documentation](https://nf-co.re/curationpretext/parameters).
For more details, please refer to the [usage documentation](https://pipelines.tol.sanger.ac.uk/curationpretext/usage) and the [parameter documentation](https://pipelines.tol.sanger.ac.uk/curationpretext/parameters).

## Pipeline output

To see the the results of a test run with a full size dataset refer to the [results](https://nf-co.re/curationpretext/results) tab on the nf-core website pipeline page.
To see the results of a test run with a full-size dataset, refer to the [results](https://pipelines.tol.sanger.ac.uk/curationpretext/results) tab on the pipeline's website page.
For more details about the output files and reports, please refer to the
[output documentation](https://nf-co.re/curationpretext/output).
[output documentation](https://pipelines.tol.sanger.ac.uk/curationpretext/output).

## Credits

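The README's flag descriptions imply a few checks that can be made before launching: `--input` should be an assembly FASTA file, `--cram` a directory holding each `.cram` file together with its `.cram.crai` index, `--pacbio` a directory of FASTA files, and `--teloseq` a telomeric sequence such as `TTAGGG`. The sketch below is a hypothetical pre-flight helper, not part of the pipeline; the `*.fa*` filename pattern and the A/C/G/T-only rule for `--teloseq` are assumptions. `--pacbio` and `--teloseq` are optional arguments because the MAPS_ONLY example command uses neither.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check for the flags described in the README.
# The filename pattern and alphabet rule are assumptions, not pipeline validation.

check_curationpretext_inputs() {
    local fasta="$1" cram_dir="$2" pacbio_dir="${3:-}" teloseq="${4:-}"
    local errors=0

    # --input: the assembled genome FASTA file must exist
    if [[ ! -f "$fasta" ]]; then
        echo "ERROR: --input file not found: $fasta" >&2; errors=$((errors + 1))
    fi

    # --cram: at least one .cram file, each with a matching .cram.crai index
    local n_cram=0 cram
    for cram in "$cram_dir"/*.cram; do
        [[ -e "$cram" ]] || continue    # the glob matched nothing
        n_cram=$((n_cram + 1))
        if [[ ! -f "${cram}.crai" ]]; then
            echo "ERROR: missing index ${cram}.crai" >&2; errors=$((errors + 1))
        fi
    done
    if (( n_cram == 0 )); then
        echo "ERROR: no .cram files in --cram directory: $cram_dir" >&2; errors=$((errors + 1))
    fi

    # --pacbio (optional here): a directory containing FASTA files
    if [[ -n "$pacbio_dir" ]] && ! compgen -G "$pacbio_dir/*.fa*" > /dev/null; then
        echo "ERROR: no FASTA files in --pacbio directory: $pacbio_dir" >&2; errors=$((errors + 1))
    fi

    # --teloseq (optional here): assumed to be nucleotide letters only, e.g. TTAGGG
    if [[ -n "$teloseq" && ! "$teloseq" =~ ^[ACGTacgt]+$ ]]; then
        echo "ERROR: --teloseq is not a nucleotide sequence: $teloseq" >&2; errors=$((errors + 1))
    fi

    # Exit status 0 when every check passed, 1 otherwise
    return $(( errors > 0 ))
}
```

Chaining it with `&&` before the `nextflow run` command means the run only starts if every check passes.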
13 changes: 0 additions & 13 deletions assets/multiqc_config.yml

This file was deleted.

6 changes: 3 additions & 3 deletions conf/base.config
@@ -16,7 +16,7 @@ process {
time = { check_max( 4.h * task.attempt, 'time' ) }

errorStrategy = { task.exitStatus in ((130..145) + 104) ? 'retry' : 'finish' }
maxRetries = 1
maxRetries = 2
maxErrors = '-1'

withName: '.*:.*:LONGREAD_COVERAGE:(MINIMAP2_ALIGN|MINIMAP2_ALIGN_SPLIT)' {
@@ -26,7 +26,7 @@ process {

withName: CRAM_FILTER_ALIGN_BWAMEM2_FIXMATE_SORT {
cpus = { check_max( 16 * 1, 'cpus' ) }
memory = { check_max( 50.GB * task.attempt, 'memory' ) }
memory = { check_max( 80.GB * task.attempt, 'memory' ) }
}

withName: PRETEXTMAP_STANDRD{
@@ -45,7 +45,7 @@ process {
}

withName: BWAMEM2_INDEX {
cpus = {}
memory = { check_max( 100.GB * task.attempt, 'memory' ) }
}

// Process-specific resource requirements
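The resource rules in conf/base.config combine three pieces: each request is multiplied by `task.attempt`, `check_max` caps it at the configured maximum, and `errorStrategy` retries only when the exit status is in 130 to 145 or equals 104 (otherwise `finish`). In this diff `maxRetries` goes from 1 to 2, so a task can run up to three times, and CRAM_FILTER_ALIGN_BWAMEM2_FIXMATE_SORT moves from 50 GB to 80 GB per attempt. The sketch below reproduces that arithmetic in shell; the function names are hypothetical, memory is simplified to whole gigabytes, and `check_max` is modelled as taking the smaller of the request and the maximum, as the nf-core template helper does.

```bash
#!/usr/bin/env bash
# Sketch of the retry/resource arithmetic in conf/base.config (memory in whole GB).
# Function names are hypothetical; check_max is modelled as min(requested, max).

# memory_for_attempt <GB per attempt> <attempt> <max_memory in GB>
memory_for_attempt() {
    local per_attempt="$1" attempt="$2" max_gb="$3"
    local requested=$(( per_attempt * attempt ))   # e.g. 80.GB * task.attempt
    # check_max: never request more than the configured ceiling
    echo $(( requested < max_gb ? requested : max_gb ))
}

# should_retry <exit status> <attempt> <maxRetries>
# Mirrors: task.exitStatus in ((130..145) + 104) ? 'retry' : 'finish',
# limited by maxRetries (a failed attempt is retried while attempt <= maxRetries).
should_retry() {
    local status="$1" attempt="$2" max_retries="$3"
    if (( attempt > max_retries )); then
        return 1    # retry budget used up
    fi
    (( (status >= 130 && status <= 145) || status == 104 ))
}
```

For instance, `memory_for_attempt 80 2 1000` gives 160, while with the test profile's `max_memory = '6.GB'` every attempt is capped at 6, so retrying cannot add memory under that profile.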
21 changes: 10 additions & 11 deletions conf/test.config
@@ -11,19 +11,18 @@
*/

params {
config_profile_name = 'Test profile'
config_profile_name = 'GitHub Test profile'
config_profile_description = 'Minimal test dataset to check pipeline function'

// Limit resources so that this can run on GitHub Actions
max_cpus = 2
max_memory = '6.GB'
max_time = '6.h'
max_cpus = 2
max_memory = '6.GB'
max_time = '6.h'

// Input data
// TODO nf-core: Specify the paths to your test data on nf-core/test-datasets
// TODO nf-core: Give any required params for the test so that command line flags are not needed
input = 'https://raw.githubusercontent.com/nf-core/test-datasets/viralrecon/samplesheet/samplesheet_test_illumina_amplicon.csv'

// Genome references
genome = 'R64-1-1'
input = "/home/runner/work/curationpretext/curationpretext/TreeValTinyData/assembly/draft/grTriPseu1.fa"
outdir = "./results"
pacbio = "/home/runner/work/curationpretext/curationpretext/TreeValTinyData/genomic_data/pacbio/"
cram = "/home/runner/work/curationpretext/curationpretext/TreeValTinyData/genomic_data/hic-arima/"
sample = "CurationPretextTest"
teloseq = "TTAGGG"
}
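The test profile above hard-codes paths under `/home/runner/work/curationpretext/curationpretext/`, which is where GitHub Actions checks out the repository and where the CI step unpacks TreeValTinyData. For a local test run those prefixes must point at wherever the data was unpacked. Since the README advises against passing parameters through `-c` config files, this sketch rewrites a copy of `conf/test.config` instead; `localise_test_config` is a hypothetical helper that uses bash's literal pattern substitution, so local paths containing `&` or `|` need no escaping.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print conf/test.config with the GitHub runner prefix
# replaced by a local directory that contains the unpacked TreeValTinyData/.

RUNNER_PREFIX="/home/runner/work/curationpretext/curationpretext"

localise_test_config() {
    local config="$1" local_prefix="${2%/}"   # drop one trailing slash
    local line
    # `|| [[ -n "$line" ]]` keeps a final line that lacks a trailing newline
    while IFS= read -r line || [[ -n "$line" ]]; do
        # Quoted pattern and replacement are matched and inserted literally
        printf '%s\n' "${line//"$RUNNER_PREFIX"/"$local_prefix"}"
    done < "$config"
}
```

For example, `localise_test_config conf/test.config "$HOME/ci-data" > test.config.local && mv test.config.local conf/test.config`, where `$HOME/ci-data` is a placeholder for the directory containing `TreeValTinyData/`.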
7 changes: 5 additions & 2 deletions conf/test_full.config
@@ -21,6 +21,9 @@ params {
// TODO nf-core: Give any required params for the test so that command line flags are not needed
input = 'https://raw.githubusercontent.com/nf-core/test-datasets/viralrecon/samplesheet/samplesheet_full_illumina_amplicon.csv'

// Genome references
genome = 'R64-1-1'
input = "/lustre/scratch123/tol/resources/treeval/treeval-testdata/TreeValSmallData/Oscheius_DF5033/assembly/draft/DF5033.hifiasm.noTelos.20211120/DF5033.noTelos.hifiasm.purged.noCont.noMito.fasta"
pacbio = "/lustre/scratch123/tol/resources/treeval/treeval-testdata/TreeValSmallData/Oscheius_DF5033/genomic_data/nxOscSpes1/pacbio/fasta/"
cram = "/lustre/scratch123/tol/resources/treeval/treeval-testdata/TreeValSmallData/Oscheius_DF5033/genomic_data/nxOscSpes1/hic-arima2/full/"
sample = "CurationPretextTest"
teloseq = "TTAGGG"
}