Pipeline - draft1 #1

Draft
wants to merge 53 commits into
base: dev
588f034
initial commit
Nov 14, 2024
32575a8
Merge pull request #2 from prototaxites/main
prototaxites Nov 14, 2024
6424840
Updates:
prototaxites Nov 14, 2024
5802c55
Major updates:
prototaxites Nov 15, 2024
d22ba41
Bin3C only run when not using conda profile
prototaxites Nov 15, 2024
5a46e82
Fix metator module
prototaxites Nov 21, 2024
36e7c52
Adds bin refinement with DASTool
prototaxites Nov 21, 2024
273fc26
Start adding MAGSCOT
prototaxites Nov 21, 2024
188136a
Adds:
prototaxites Nov 22, 2024
c71d485
Merge pull request #2 from sanger-tol/dev
prototaxites Nov 27, 2024
54788be
Merge branch 'dev' into d1
prototaxites Nov 27, 2024
a5eff97
Ignore hmm assets with precommit
Nov 27, 2024
3da20e6
Fix schema
Nov 27, 2024
c207428
Fix linting?
Nov 27, 2024
83f586a
Simplify HMMER inputs
Nov 28, 2024
edf5c74
Multi-threaded pyrodigal, update bwamem2_mem
Nov 28, 2024
a06c90a
Remove schema_input.json
Nov 28, 2024
5abd8cd
Add BinQC subworkflow
Nov 28, 2024
df6fcd5
Fix linting
Nov 28, 2024
422b125
Emit versions from MAGSCOT
Nov 28, 2024
ef3700c
Update test config
Nov 28, 2024
2683775
Fix missing operator
Nov 28, 2024
0e250d5
Add full test, enable binqc
Nov 28, 2024
e77aaf7
Fix module
Nov 28, 2024
4d04658
Begin working on Taxonomy WF
Nov 29, 2024
7572fe3
Initial pass at taxonomy with CheckM quality filtering
Nov 29, 2024
9a3890f
Rename DASTool bins so to avoid downstream name conflicts
Nov 29, 2024
f13797b
Fix combine bins for postbinning, DASTool pyrodigal input
Nov 29, 2024
f88023a
Fix param typo
Nov 29, 2024
c2af30e
Fix taxonomy WF, add resource requirements for GTDB
Nov 29, 2024
4f9ae2d
GTDB takes filtered bin channel now
Nov 29, 2024
ebec246
Refine bin processin & dastool bin renaming; adds gtdb_to_ncbi script…
Dec 6, 2024
59ede17
Fix linting, fix dastool input bug
Dec 6, 2024
629cdd2
Fix typo
Dec 6, 2024
8cef726
Fix contig2bin processing
Dec 6, 2024
05c8326
Changes:
Dec 11, 2024
b6f807a
Fix linting, update dastool + metabat2 modules, fix renamed modules
Dec 12, 2024
674b4ad
add skeleton summary process
Dec 12, 2024
4cac07b
Adds:
Dec 13, 2024
9950440
Update metamdbg version; use gzipped bin files
Dec 18, 2024
62bf7a4
Add Prokka to identify ncRNAs for QC purposes; add prokka summary to …
Dec 19, 2024
9693dd7
Add enable_prokka to schema and patch changes to Prokka module (will …
Dec 19, 2024
f6db65a
Changes:
Dec 20, 2024
3970d41
Fix Prokka container and update MetaBat2 module
Dec 20, 2024
9d3f933
Add basic description to README, fix remote YAML in test
Dec 20, 2024
c94be17
Add tRNAscan-SE and Infernal to replace Prokka for ncRNA-checking
Jan 6, 2025
cb720e9
Fix bugs
Jan 7, 2025
59edad5
Merge branch 'rna' into d1
Jan 7, 2025
eeba4a2
Update README.md
prototaxites Jan 9, 2025
263769c
Adds:
Jan 10, 2025
bab4d0e
Fix linting
Jan 10, 2025
747326a
Update schema
Jan 10, 2025
4c6af95
Fix schema
Jan 10, 2025
4 changes: 4 additions & 0 deletions .gitignore
@@ -7,3 +7,7 @@ testing/
testing*
*.pyc
null/
co2footprint*
.nf-test/
.nf-test.log
.vscode
52 changes: 8 additions & 44 deletions .nf-core.yml
@@ -1,67 +1,29 @@
bump_version: null
lint:
files_exist:
- .github/ISSUE_TEMPLATE/bug_report.yml
- .github/ISSUE_TEMPLATE/feature_request.yml
- .github/PULL_REQUEST_TEMPLATE.md
- .github/CONTRIBUTING.md
- .github/.dockstore.yml
- conf/igenomes.config
- .github/ISSUE_TEMPLATE/config.yml
- conf/igenomes_ignored.config
- conf/igenomes.config
- assets/email_template.html
- assets/sendmail_template.txt
- assets/email_template.txt
- .github/ISSUE_TEMPLATE/bug_report.yml
- .github/ISSUE_TEMPLATE/feature_request.yml
- .github/PULL_REQUEST_TEMPLATE.md
- .github/CONTRIBUTING.md
- .github/.dockstore.yml
- conf/igenomes.config
- conf/igenomes_ignored.config
- assets/email_template.html
- assets/sendmail_template.txt
- assets/email_template.txt
- CODE_OF_CONDUCT.md
- assets/nf-core-longreadmag_logo_light.png
- docs/images/nf-core-longreadmag_logo_light.png
- docs/images/nf-core-longreadmag_logo_dark.png
- .github/ISSUE_TEMPLATE/config.yml
- .github/workflows/awstest.yml
- .github/workflows/awsfulltest.yml
files_unchanged:
- .github/ISSUE_TEMPLATE/bug_report.yml
- .github/ISSUE_TEMPLATE/config.yml
- .github/ISSUE_TEMPLATE/feature_request.yml
- .github/PULL_REQUEST_TEMPLATE.md
- .github/workflows/branch.yml
- .github/workflows/linting_comment.yml
- .github/workflows/linting.yml
- .github/CONTRIBUTING.md
- .github/.dockstore.yml
- .github/CONTRIBUTING.md
- .prettierignore
- .prettierignore
- .prettierignore
- .github/ISSUE_TEMPLATE/bug_report.yml
- .github/ISSUE_TEMPLATE/config.yml
- .github/ISSUE_TEMPLATE/feature_request.yml
- .github/PULL_REQUEST_TEMPLATE.md
- .github/workflows/branch.yml
- .github/workflows/linting_comment.yml
- .github/workflows/linting.yml
- .github/CONTRIBUTING.md
- .github/.dockstore.yml
- .github/CONTRIBUTING.md
- .prettierignore
- .prettierignore
- .prettierignore
- CODE_OF_CONDUCT.md
- assets/nf-core-longreadmag_logo_light.png
- docs/images/nf-core-longreadmag_logo_light.png
- docs/images/nf-core-longreadmag_logo_dark.png
- .github/ISSUE_TEMPLATE/bug_report.yml
included_configs: false
- .github/workflows/branch.yml
- .github/workflows/linting_comment.yml
- .github/workflows/linting.yml
multiqc_config:
- report_comment
nextflow_config:
@@ -77,10 +39,12 @@ lint:
- validation.help.afterText
- validation.summary.beforeText
- validation.summary.afterText
- config_defaults:
- params.hmm_gtdb_pfam
- params.hmm_gtdb_tigrfam
- params.checkm2_db_version
readme:
- nextflow_badge
- nextflow_badge
- nextflow_badge
nf_core_version: 3.0.2
org_path: null
repository_type: pipeline
1 change: 1 addition & 0 deletions .pre-commit-config.yaml
@@ -11,3 +11,4 @@ repos:
hooks:
- id: editorconfig-checker
alias: ec
exclude: .*(\.hmm$|.*\.cm$)
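The `exclude` pattern added here can be checked against a few paths. This is a minimal sketch, assuming pre-commit applies `exclude` to each repository-relative path with Python's `re.search`. The file names are hypothetical and do not come from this PR.

```python
import re

# Pattern copied verbatim from the `exclude:` line in .pre-commit-config.yaml.
EXCLUDE = re.compile(r".*(\.hmm$|.*\.cm$)")


def is_excluded(path: str) -> bool:
    """Return True if the hook would skip this path (assumes re.search semantics)."""
    return EXCLUDE.search(path) is not None


# Hypothetical paths, for illustration only.
for path in ["assets/example.hmm", "assets/example.cm", "main.nf", "docs/example.hmm.md"]:
    status = "skipped" if is_excluded(path) else "checked"
    print(f"{path}: {status}")
```

Under the same assumption, the shorter pattern `\.(hmm|cm)$` should select the same files.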
3 changes: 2 additions & 1 deletion .prettierignore
@@ -1,4 +1,5 @@

adaptivecard.json
slackreport.json
.nextflow*
work/
data/
60 changes: 31 additions & 29 deletions README.md
@@ -2,49 +2,53 @@

## Introduction

**sanger-tol/longreadmag** is a bioinformatics pipeline that ...

<!-- TODO nf-core:
Complete this sentence with a 2-3 sentence summary of what types of data the pipeline ingests, a brief overview of the
major pipeline sections and the types of output it produces. You're giving an overview to someone new
to nf-core here, in 15-20 seconds. For an example, see https://github.com/nf-core/rnaseq/blob/master/README.md#introduction
-->
**sanger-tol/longreadmag** is a bioinformatics pipeline for the assembly and binning of metagenomes
using PacBio HiFi data and (optionally) Hi-C Illumina data.

<!-- TODO nf-core: Include a figure that guides the user through the major workflow steps. Many nf-core
workflows use the "tube map" design for that. See https://nf-co.re/docs/contributing/design_guidelines#examples for examples. -->
<!-- TODO nf-core: Fill in short bullet-pointed list of the default steps in the pipeline -->

2. Present QC for raw reads ([`MultiQC`](http://multiqc.info/))
1. Assembles raw reads using metaMDBG
2. Maps HiFi and (optionally) Hi-C reads to the assembly
3. Bins the assembly using MetaBAT2, MaxBin2, bin3C, and MetaTOR
4. (Optionally) refines the bins using DAS_Tool or MAGScoT
5. Assesses the completeness and contamination of bins using CheckM2 and assesses ncRNA content using tRNAscan-SE for tRNA and Infernal+Rfam for rRNA
6. Assigns taxonomy to medium-quality and above bins using GTDB-Tk
7. Summarises information at the bin level
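
Step 6 restricts taxonomy assignment to "medium-quality and above" bins. The sketch below shows one way such a filter could look, assuming the MIMAG medium-quality thresholds (completeness of at least 50% and contamination below 10%). The thresholds the pipeline actually applies, and the names `BinQC` and `passes_medium_quality`, are assumptions and are not taken from this PR.

```python
from dataclasses import dataclass


@dataclass
class BinQC:
    """Quality estimates for one bin (hypothetical structure)."""
    name: str
    completeness: float   # percent, e.g. as estimated by CheckM2
    contamination: float  # percent


def passes_medium_quality(
    qc: BinQC,
    min_completeness: float = 50.0,   # assumed MIMAG medium-quality cutoff
    max_contamination: float = 10.0,  # assumed MIMAG medium-quality cutoff
) -> bool:
    """Return True if the bin is medium quality or better under the assumed cutoffs."""
    return qc.completeness >= min_completeness and qc.contamination < max_contamination


# Hypothetical bins, for illustration only.
bins = [
    BinQC("bin_1", completeness=95.2, contamination=1.3),
    BinQC("bin_2", completeness=42.0, contamination=0.5),
    BinQC("bin_3", completeness=71.0, contamination=12.4),
]
kept = [b.name for b in bins if passes_medium_quality(b)]
print(kept)
```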

## Usage

> [!NOTE]
> If you are new to Nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/usage/installation) on how to set-up Nextflow. Make sure to [test your setup](https://nf-co.re/docs/usage/introduction#how-to-run-a-pipeline) with `-profile test` before running the workflow on actual data.

<!-- TODO nf-core: Describe the minimum required steps to execute the pipeline, e.g. how to prepare samplesheets.
Explain what rows and columns represent. For instance (please edit as appropriate):

First, prepare a samplesheet with your input data that looks as follows:

`samplesheet.csv`:

```csv
sample,fastq_1,fastq_2
CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz
First, prepare a YAML with your input data that looks as follows:

`input.yaml`:

```yaml
id: SampleName
pacbio:
  fasta:
    - /path/to/pacbio/file1.fasta.gz
    - /path/to/pacbio/file2.fasta.gz
    - ...
hic:
  cram:
    - /path/to/hic/hic1.cram
    - /path/to/hic/hic2.cram
    - ...
  enzymes:
    - enzyme_name_1 (e.g. DpnII)
    - enzyme_name_2 (e.g. HinfI)
    - ...
```

Each row represents a fastq file (single-end) or a pair of fastq files (paired end).

-->

Now, you can run the pipeline using:

<!-- TODO nf-core: update the following command to include all required parameters for a minimal example -->

```bash
nextflow run sanger-tol/longreadmag \
-profile <docker/singularity/.../institute> \
--input samplesheet.csv \
--input input.yaml \
--outdir <OUTDIR>
```
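
The `input.yaml` passed to `--input` could also be generated from a script when there are many files. This is a minimal sketch using only the Python standard library. The helper name `render_input_yaml` is hypothetical, and nesting `enzymes` under `hic` is an assumption about the layout the pipeline expects.

```python
def render_input_yaml(sample_id, pacbio_fasta, hic_cram=(), enzymes=()):
    """Render an input YAML string for one sample.

    Values are written unquoted, so paths containing YAML special
    characters (e.g. ':' or '#') are not handled by this sketch.
    """
    if not pacbio_fasta:
        raise ValueError("at least one PacBio FASTA file is required")

    lines = [f"id: {sample_id}", "pacbio:", "  fasta:"]
    lines += [f"    - {path}" for path in pacbio_fasta]

    # The Hi-C section is optional; enzymes are only written alongside CRAMs.
    if hic_cram:
        lines += ["hic:", "  cram:"]
        lines += [f"    - {path}" for path in hic_cram]
        if enzymes:
            lines.append("  enzymes:")
            lines += [f"    - {name}" for name in enzymes]

    return "\n".join(lines) + "\n"


# Hypothetical sample name and paths, for illustration only.
text = render_input_yaml(
    "SampleName",
    ["/path/to/pacbio/file1.fasta.gz"],
    hic_cram=["/path/to/hic/hic1.cram"],
    enzymes=["DpnII"],
)
print(text, end="")
```

The result can be written to `input.yaml` and passed to the command above.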

@@ -55,9 +59,7 @@ nextflow run sanger-tol/longreadmag \

sanger-tol/longreadmag was originally written by Jim Downie, Will Eagles, Noah Gettle.

We thank the following people for their extensive assistance in the development of this pipeline:

<!-- TODO nf-core: If applicable, make list of people who have also contributed -->
<!-- We thank the following people for their extensive assistance in the development of this pipeline: -->

## Contributions and Support
