Publish peripheral data as well, even if we don't use it ourselves #99

Merged: 4 commits, Jul 1, 2024
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -11,6 +11,7 @@ General tidy up of the configuration and the pipeline

- Increased the resources for blastn
- Removed some options that were not used or not needed
- All relevant outputs are now copied to the output directory

### Parameters

24 changes: 24 additions & 0 deletions conf/modules.config
@@ -48,6 +48,14 @@ process {
ext.args = { "-ax map-ont -I" + Math.ceil(meta2.genome_size/1e9) + 'G' }
}

withName: "MINIMAP2_.*" {
publishDir = [
path: { "${params.outdir}/read_mapping/${meta.datatype}" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals("versions.yml") ? null : filename }
]
}

withName: "SAMTOOLS_VIEW" {
ext.args = "--output-fmt bam --write-index"
}
@@ -60,6 +68,22 @@ process {
ext.args = "--lineage --busco"
}

withName: "PIGZ_COMPRESS" {
publishDir = [
path: { "${params.outdir}/base_content" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals("versions.yml") ? null : filename.minus("fw_out/") }
]
}

withName: "BLOBTK_DEPTH" {
publishDir = [
path: { "${params.outdir}/read_mapping/${meta.datatype}" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals("versions.yml") ? null : "${meta.id}.coverage.1k.bed.gz" }
]
}

withName: "BUSCO" {
scratch = true
ext.args = { 'test' in workflow.profile.tokenize(',') ?
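
The three `saveAs` closures added to `conf/modules.config` decide the published name of each output file. As a rough illustration only, the Python sketch below mirrors that logic outside of Nextflow. The function names and the sample file names are hypothetical, and `None` stands in for the Groovy `null` that tells `publishDir` to skip a file.

```python
from typing import Optional


def save_as_minimap2(filename: str) -> Optional[str]:
    """MINIMAP2_.*: publish every file under its own name, except versions.yml."""
    return None if filename == "versions.yml" else filename


def save_as_pigz(filename: str) -> Optional[str]:
    """PIGZ_COMPRESS: drop the "fw_out/" prefix from the published name.

    Groovy's String.minus removes only the first occurrence of its argument,
    hence the count of 1 in replace().
    """
    if filename == "versions.yml":
        return None
    return filename.replace("fw_out/", "", 1)


def save_as_depth(filename: str, meta_id: str) -> Optional[str]:
    """BLOBTK_DEPTH: publish under a fixed name derived from meta.id."""
    if filename == "versions.yml":
        return None
    return f"{meta_id}.coverage.1k.bed.gz"


# Hypothetical file names, for illustration only.
print(save_as_minimap2("sampleA.bam"))
print(save_as_pigz("fw_out/asm1_mononuc_windows.tsv.gz"))
print(save_as_depth("anything.bed.gz", "sampleA"))
```

Note that the `BLOBTK_DEPTH` rule ignores the incoming file name entirely, so it relies on the process emitting a single publishable file per sample.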
63 changes: 55 additions & 8 deletions docs/output.md
@@ -15,6 +15,9 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
- [BlobDir](#blobdir) - Output files viewable on a [BlobToolKit viewer](https://github.com/blobtoolkit/blobtoolkit)
- [Static plots](#static-plots) - Static versions of the BlobToolKit plots
- [BUSCO](#busco) - BUSCO results
- [Read alignments](#read-alignments) - Aligned reads (optional)
- [Read coverage](#read-coverage) - Read coverage tracks
- [Base content](#base-content) - _k_-mer statistics (for k ≤ 4)
- [MultiQC](#multiqc) - Aggregate report describing results from the whole pipeline
- [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution

@@ -26,8 +29,8 @@ The files in the BlobDir dataset which is used to create the online interactive
<summary>Output files</summary>

- `blobtoolkit/`
- `<accession>/`
- `*.json.gz`: files generated from genome and alignment coverage statistics
- `<assembly-name>/`
- `*.json.gz`: files generated from genome and alignment coverage statistics.

More information about visualising the data is available in the [BlobToolKit repository](https://github.com/blobtoolkit/blobtoolkit/tree/main/src/viewer)

@@ -53,12 +56,56 @@ BUSCO results generated by the pipeline (all BUSCO lineages that match the claas
<details markdown="1">
<summary>Output files</summary>

- `blobtoolkit/`
- `busco/`
- `*.batch_summary.txt`: BUSCO scores as tab-separated files (1 file per lineage).
- `*.fasta.txt`: BUSCO scores as formatted text (1 file per lineage).
- `*.json`: BUSCO scores as JSON (1 file per lineage).
- `*/`: all output BUSCO files, including the coordinate and sequence files of the annotated genes.
- `busco/`
- `<lineage-name>/`
- `short_summary.json`: BUSCO scores for that lineage as JSON.
- `short_summary.tsv`: BUSCO scores for that lineage as a tab-separated file.
- `short_summary.txt`: BUSCO scores for that lineage as formatted text.
- `full_table.tsv`: Coordinates of the annotated BUSCO genes as a tab-separated file.
- `missing_busco_list.tsv`: List of the BUSCO genes that could not be found.
- `*_busco_sequences.tar.gz`: Sequences of the annotated BUSCO genes. 1 _tar_ archive for each of the three annotation levels (`single_copy`, `multi_copy`, `fragmented`), with 1 file per gene.
- `hmmer_output.tar.gz`: Archive of the HMMER alignment scores.

</details>

### Read alignments

Read alignments in BAM format -- only if the pipeline is run with `--align`.

<details markdown="1">
<summary>Output files</summary>

- `read_mapping/`
- `<datatype>/`
- `<sample>.bam`: alignments of that sample's reads in BAM format.

</details>

### Read coverage

Read coverage statistics as computed by the pipeline.
Those files are the raw data used to build the BlobDir.

<details markdown="1">
<summary>Output files</summary>

- `read_mapping/`
- `<datatype>/`
- `<sample>.coverage.1k.bed.gz`: Bedgraph file with the coverage of that sample's alignments in 1 kbp windows.

</details>
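
As a minimal sketch of how such a coverage file could be consumed, the Python below streams a gzipped bedgraph and computes a length-weighted mean coverage. It assumes four tab-separated columns (sequence name, start, end, coverage), which is the usual bedgraph layout but is not confirmed here for this pipeline's output. The function names are hypothetical.

```python
import csv
import gzip
from typing import Iterator, Tuple


def read_coverage(path: str) -> Iterator[Tuple[str, int, int, float]]:
    """Yield (sequence, start, end, coverage) for each window.

    Assumes a 4-column, tab-separated bedgraph. Comment and "track" lines
    are skipped in case the file carries any.
    """
    with gzip.open(path, "rt", newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if not row or row[0].startswith(("#", "track")):
                continue
            seq, start, end, cov = row[:4]
            yield seq, int(start), int(end), float(cov)


def mean_coverage(path: str) -> float:
    """Length-weighted mean coverage across all windows (0.0 if empty)."""
    total_bases = 0
    weighted_sum = 0.0
    for _seq, start, end, cov in read_coverage(path):
        length = end - start
        total_bases += length
        weighted_sum += cov * length
    return weighted_sum / total_bases if total_bases else 0.0
```

Weighting by window length matters because the last window of each sequence is usually shorter than 1 kbp.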

### Base content

_k_-mer statistics.
Those files are the raw data used to build the BlobDir.

<details markdown="1">
<summary>Output files</summary>

- `base_content/`
- `<assembly-name>_*nuc_windows.tsv.gz`: Tab-separated files with the counts of every _k_-mer for k &le; 4 in 1 kbp windows. The first three columns correspond to the coordinates (sequence name, start, end), followed by each _k_-mer.
- `<assembly-name>_freq_windows.tsv.gz`: Tab-separated files with frequencies derived from the _k_-mer counts.

</details>
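
A possible way to load one of these count files is sketched below in Python. It follows the column layout described above (three coordinate columns, then one column per _k_-mer) and additionally assumes that the first line is a header naming the columns, which is an assumption rather than something stated here. The function name is hypothetical.

```python
import csv
import gzip
from typing import Dict, Iterator, Tuple


def read_kmer_windows(path: str) -> Iterator[Tuple[str, int, int, Dict[str, int]]]:
    """Yield (sequence, start, end, {kmer: count}) for each window.

    Assumes a header row whose 4th and later fields are the k-mer names.
    """
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader, None)
        if header is None:
            return  # empty file: nothing to yield
        kmers = header[3:]
        for row in reader:
            if not row:
                continue
            seq, start, end = row[0], int(row[1]), int(row[2])
            counts = dict(zip(kmers, (int(value) for value in row[3:])))
            yield seq, start, end, counts
```

The same reader would not suit the `freq` file as written, since frequencies are not integers; swapping `int` for `float` on the value columns would be the natural adjustment.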

5 changes: 5 additions & 0 deletions modules.json
@@ -64,6 +64,11 @@
"git_sha": "b7ebe95761cd389603f9cc0e0dc384c0f663815a",
"installed_by": ["modules"]
},
"pigz/compress": {
"branch": "master",
"git_sha": "0eab94fc1e48703c1b0a8704bd665f554905c39d",
"installed_by": ["modules"]
},
"samtools/fasta": {
"branch": "master",
"git_sha": "f4596fe0bdc096cf53ec4497e83defdb3a94ff62",
2 changes: 1 addition & 1 deletion modules/local/blobtoolkit/updatemeta.nf
@@ -5,7 +5,7 @@ process BLOBTOOLKIT_UPDATEMETA {
if (workflow.profile.tokenize(',').intersect(['conda', 'mamba']).size() >= 1) {
exit 1, "BLOBTOOLKIT_UPDATEMETA module does not support Conda. Please use Docker / Singularity / Podman instead."
}
container "docker.io/pacificbiosciences/pyyaml:5.3.1"
container "docker.io/genomehubs/blobtoolkit:4.3.9"

input:
tuple val(meta), path(input)
9 changes: 9 additions & 0 deletions modules/nf-core/pigz/compress/environment.yml
45 changes: 45 additions & 0 deletions modules/nf-core/pigz/compress/main.nf
47 changes: 47 additions & 0 deletions modules/nf-core/pigz/compress/meta.yml
49 changes: 49 additions & 0 deletions modules/nf-core/pigz/compress/tests/main.nf.test
37 changes: 37 additions & 0 deletions modules/nf-core/pigz/compress/tests/main.nf.test.snap
2 changes: 2 additions & 0 deletions modules/nf-core/pigz/compress/tests/tags.yml

Some generated files are not rendered by default.

12 changes: 12 additions & 0 deletions subworkflows/local/coverage_stats.nf
@@ -6,6 +6,7 @@ include { SAMTOOLS_VIEW } from '../../modules/nf-core/samtools/view/main'
include { SAMTOOLS_INDEX } from '../../modules/nf-core/samtools/index/main'
include { BLOBTK_DEPTH } from '../../modules/local/blobtk/depth'
include { FASTAWINDOWS } from '../../modules/nf-core/fastawindows/main'
include { PIGZ_COMPRESS } from '../../modules/nf-core/pigz/compress/main'
include { CREATE_BED } from '../../modules/local/create_bed'


@@ -53,6 +54,17 @@ workflow COVERAGE_STATS {
ch_versions = ch_versions.mix ( FASTAWINDOWS.out.versions.first() )


// Compress the TSV files
PIGZ_COMPRESS (
FASTAWINDOWS.out.mononuc
| mix ( FASTAWINDOWS.out.dinuc )
| mix ( FASTAWINDOWS.out.trinuc )
| mix ( FASTAWINDOWS.out.tetranuc )
| mix ( FASTAWINDOWS.out.freq )
)
ch_versions = ch_versions.mix ( PIGZ_COMPRESS.out.versions.first() )


// Create genome windows file in BED format
CREATE_BED ( FASTAWINDOWS.out.mononuc )
ch_versions = ch_versions.mix ( CREATE_BED.out.versions.first() )