Add an outdir parameter for 'hello genomics' onwards #433

Merged 6 commits on Nov 17, 2024
11 changes: 6 additions & 5 deletions docs/hello_nextflow/04_hello_genomics.md
Original file line number Diff line number Diff line change
@@ -181,7 +181,7 @@ process SAMTOOLS_INDEX {

container 'community.wave.seqera.io/library/samtools:1.20--b5dfbd93de237464'

publishDir 'results_genomics', mode: 'symlink'
publishDir params.outdir, mode: 'symlink'

input:
path input_bam
@@ -196,18 +196,18 @@ process SAMTOOLS_INDEX {
}
```

You should recognize all the pieces from what you learned in Part 1 & Part 2 of this training series; the only notable change is that this time we're using `mode: symlink` for the `publishDir` directive.
You should recognize all the pieces from what you learned in Part 1 & Part 2 of this training series; the only notable changes are that this time we're using `mode: 'symlink'` for the `publishDir` directive, and we're using a parameter (`params.outdir`) to define the publish directory.

!!! note

Even though the data files we're using here are very small, in genomics they can get very large, so we should get into the habit of using symbolic links rather than making actual copies of these files, unless there's a compelling reason to do so.

This process is going to require us to pass in a file path via the `input_bam` input, so let's set that up next.

### 1.2. Add an input parameter declaration
### 1.2. Add input and output parameter declarations

At the top of the file, under the `Pipeline parameters` section, we declare a CLI parameter called `reads_bam` and give it a default value.
That way, we can be lazy and not specify the input when we type the command to launch the pipeline (for development purposes).
That way, we can be lazy and not specify the input when we type the command to launch the pipeline (for development purposes). We're also going to set `params.outdir` with a default value for the output directory.

```groovy title="hello-genomics.nf" linenums="3"
/*
@@ -216,6 +216,7 @@ That way, we can be lazy and not specify the input when we type the command to l

// Primary input
params.reads_bam = "${projectDir}/data/bam/reads_mother.bam"
params.outdir = "results_genomics"
```
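
As a rough sketch of what the `params.outdir` default buys us, here is a minimal, self-contained workflow. The `WRITE_GREETING` process and the `greeting.txt` file are hypothetical names used only for illustration and are not part of the training material. The sketch assumes the usual Nextflow behavior that a `--outdir <path>` option on the command line overrides the default set in the script.

```groovy
#!/usr/bin/env nextflow

// Default output directory; assumed to be overridable with `--outdir <path>`
params.outdir = "results_genomics"

// Hypothetical process, for illustration only
process WRITE_GREETING {

    // Publish outputs to whatever directory the parameter resolves to
    publishDir params.outdir, mode: 'symlink'

    output:
        path 'greeting.txt'

    script:
    """
    echo 'hello' > greeting.txt
    """
}

workflow {
    WRITE_GREETING()
}
```

Saved under a hypothetical name such as `sketch.nf` and launched with `nextflow run sketch.nf`, this should publish a symlink to `greeting.txt` under `results_genomics/`; adding `--outdir my_results` should publish it under `my_results/` instead, without touching the process definition.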

Now we have a process ready, as well as a parameter to give it an input to run on, so let's wire those things up together.
@@ -299,7 +300,7 @@ process GATK_HAPLOTYPECALLER {

container "community.wave.seqera.io/library/gatk4:4.5.0.0--730ee8817e436867"

publishDir 'results_genomics', mode: 'symlink'
publishDir params.outdir, mode: 'symlink'

input:
path input_bam
4 changes: 2 additions & 2 deletions docs/hello_nextflow/05_hello_operators.md
@@ -14,7 +14,7 @@ Specifically, we show you how to implement joint variant calling with GATK, buil

The GATK variant calling method we used in Part 3 simply generated variant calls per sample.
That's fine if you only want to look at the variants from each sample in isolation, but that yields limited information.
It's often more interesting to look at variant calls differ across multiple samples, and to do so, GATK offers an alternative method called joint variant calling, which we demonstrate here.
It's often more interesting to look at how variant calls differ across multiple samples, and to do so, GATK offers an alternative method called joint variant calling, which we demonstrate here.

Joint variant calling involves generating a special kind of variant output called GVCF (for Genomic VCF) for each sample, then combining the GVCF data from all the samples and finally, running a 'joint genotyping' statistical analysis.

@@ -411,7 +411,7 @@ Let's write a new process to define how that's going to work, based on the comma
process GATK_GENOMICSDB {

container "community.wave.seqera.io/library/gatk4:4.5.0.0--730ee8817e436867"
publishDir 'results_genomics', mode: 'copy'
publishDir params.outdir, mode: 'copy'

input:
path all_gvcfs
28 changes: 22 additions & 6 deletions docs/hello_nextflow/06_hello_config.md
@@ -233,7 +233,7 @@ process SAMTOOLS_INDEX {

container 'community.wave.seqera.io/library/samtools:1.20--b5dfbd93de237464'

publishDir 'results_genomics', mode: 'symlink'
publishDir params.outdir, mode: 'symlink'
```

_After:_
@@ -244,7 +244,7 @@ process SAMTOOLS_INDEX {
container "community.wave.seqera.io/library/samtools:1.20--b5dfbd93de237464"
conda "bioconda::samtools=1.20"

publishDir 'results_genomics', mode: 'symlink'
publishDir params.outdir, mode: 'symlink'
```

#### 1.4.2. Update GATK_HAPLOTYPECALLER
@@ -258,7 +258,7 @@ process GATK_HAPLOTYPECALLER {

container "community.wave.seqera.io/library/gatk4:4.5.0.0--730ee8817e436867"

publishDir 'results_genomics', mode: 'symlink'
publishDir params.outdir, mode: 'symlink'
```

_After:_
@@ -269,7 +269,7 @@ process GATK_HAPLOTYPECALLER {
container "community.wave.seqera.io/library/gatk4:4.5.0.0--730ee8817e436867"
conda "bioconda::gatk4=4.5.0.0"

publishDir 'results_genomics', mode: 'symlink'
publishDir params.outdir, mode: 'symlink'
```

#### 1.4.3. Update GATK_JOINTGENOTYPING
@@ -283,7 +283,7 @@ process GATK_JOINTGENOTYPING {

container "community.wave.seqera.io/library/gatk4:4.5.0.0--730ee8817e436867"

publishDir 'results_genomics', mode: 'symlink'
publishDir params.outdir, mode: 'symlink'
```

_After:_
@@ -294,7 +294,7 @@ process GATK_JOINTGENOTYPING {
container "community.wave.seqera.io/library/gatk4:4.5.0.0--730ee8817e436867"
conda "bioconda::gatk4=4.5.0.0"

publishDir 'results_genomics', mode: 'symlink'
publishDir params.outdir, mode: 'symlink'
```

Once all three processes are updated, we can try running the workflow again.
@@ -872,6 +872,9 @@ So let's cut this set of params out of `main.nf`:
// Primary input (file of input files, one per line)
params.reads_bam = "${projectDir}/data/sample_bams.txt"

// Output directory
params.outdir = 'results_genomics'

// Accessory files
params.reference = "${projectDir}/data/ref/ref.fasta"
params.reference_index = "${projectDir}/data/ref/ref.fasta.fai"
@@ -924,6 +927,9 @@ params {
// Primary input (file of input files, one per line)
reads_bam = "${projectDir}/data/sample_bams.txt"

// Output directory
outdir = 'results_genomics'

// Accessory files
reference = "${projectDir}/data/ref/ref.fasta"
reference_index = "${projectDir}/data/ref/ref.fasta.fai"
@@ -952,6 +958,7 @@ The values are the same input files and reference files we've been using so far.
```json title="demo-params.json" linenums="1"
{
"reads_bam": "data/sample_bams.txt",
"outdir": "results_genomics",
"reference": "data/ref/ref.fasta",
"reference_index": "data/ref/ref.fasta.fai",
"reference_dict": "data/ref/ref.dict",
@@ -992,6 +999,9 @@ params {
// Primary input (file of input files, one per line)
reads_bam = "${projectDir}/data/sample_bams.txt"

// Output directory
outdir = 'results_genomics'

// Accessory files
reference = "${projectDir}/data/ref/ref.fasta"
reference_index = "${projectDir}/data/ref/ref.fasta.fai"
@@ -1010,6 +1020,9 @@ params {
// Primary input (file of input files, one per line)
reads_bam = null

// Output directory
outdir = null

// Accessory files
reference = null
reference_index = null
@@ -1085,6 +1098,9 @@ profiles {
// Primary input (file of input files, one per line)
params.reads_bam = "data/sample_bams.txt"

// Output directory
params.outdir = 'results_genomics'

// Accessory files
params.reference = "data/ref/ref.fasta"
params.reference_index = "data/ref/ref.fasta.fai"
6 changes: 3 additions & 3 deletions docs/hello_nextflow/07_hello_modules.md
@@ -167,7 +167,7 @@ process SAMTOOLS_INDEX {
container 'community.wave.seqera.io/library/samtools:1.20--b5dfbd93de237464'
conda "bioconda::samtools=1.20"

publishDir 'results_genomics', mode: 'symlink'
publishDir params.outdir, mode: 'symlink'

input:
path input_bam
@@ -283,7 +283,7 @@ process GATK_HAPLOTYPECALLER {
container "community.wave.seqera.io/library/gatk4:4.5.0.0--730ee8817e436867"
conda "bioconda::gatk4=4.5.0.0"

publishDir 'results_genomics', mode: 'symlink'
publishDir params.outdir, mode: 'symlink'

input:
tuple path(input_bam), path(input_bam_index)
@@ -321,7 +321,7 @@ process GATK_JOINTGENOTYPING {
container "community.wave.seqera.io/library/gatk4:4.5.0.0--730ee8817e436867"
conda "bioconda::gatk4=4.5.0.0"

publishDir 'results_genomics', mode: 'symlink'
publishDir params.outdir, mode: 'symlink'

input:
path all_gvcfs
1 change: 1 addition & 0 deletions hello-nextflow/hello-config/demo-params.json
@@ -1,5 +1,6 @@
{
"reads_bam": "data/sample_bams.txt",
"outdir": "results_genomics",
"reference": "data/ref/ref.fasta",
"reference_index": "data/ref/ref.fasta.fai",
"reference_dict": "data/ref/ref.dict",
9 changes: 6 additions & 3 deletions hello-nextflow/hello-config/main.nf
@@ -7,6 +7,9 @@
// Primary input (file of input files, one per line)
params.reads_bam = "${projectDir}/data/sample_bams.txt"

// Output directory
params.outdir = "results_genomics"

// Accessory files
params.reference = "${projectDir}/data/ref/ref.fasta"
params.reference_index = "${projectDir}/data/ref/ref.fasta.fai"
@@ -23,7 +26,7 @@ process SAMTOOLS_INDEX {

container 'community.wave.seqera.io/library/samtools:1.20--b5dfbd93de237464'

publishDir 'results_genomics', mode: 'symlink'
publishDir params.outdir, mode: 'symlink'

input:
path input_bam
@@ -44,7 +47,7 @@ process GATK_HAPLOTYPECALLER {

container "community.wave.seqera.io/library/gatk4:4.5.0.0--730ee8817e436867"

publishDir 'results_genomics', mode: 'symlink'
publishDir params.outdir, mode: 'symlink'

input:
tuple path(input_bam), path(input_bam_index)
@@ -75,7 +78,7 @@ process GATK_JOINTGENOTYPING {

container "community.wave.seqera.io/library/gatk4:4.5.0.0--730ee8817e436867"

publishDir 'results_genomics', mode: 'symlink'
publishDir params.outdir, mode: 'symlink'

input:
path all_gvcfs
1 change: 1 addition & 0 deletions hello-nextflow/hello-modules/demo-params.json
@@ -1,5 +1,6 @@
{
"reads_bam": "data/sample_bams.txt",
"outdir": "results_genomics",
"reference": "data/ref/ref.fasta",
"reference_index": "data/ref/ref.fasta.fai",
"reference_dict": "data/ref/ref.dict",
6 changes: 3 additions & 3 deletions hello-nextflow/hello-modules/main.nf
@@ -8,7 +8,7 @@ process SAMTOOLS_INDEX {
container 'community.wave.seqera.io/library/samtools:1.20--b5dfbd93de237464'
conda "bioconda::samtools=1.20"

publishDir 'results_genomics', mode: 'symlink'
publishDir params.outdir, mode: 'symlink'

input:
path input_bam
@@ -30,7 +30,7 @@ process GATK_HAPLOTYPECALLER {
container "community.wave.seqera.io/library/gatk4:4.5.0.0--730ee8817e436867"
conda "bioconda::gatk4=4.5.0.0"

publishDir 'results_genomics', mode: 'symlink'
publishDir params.outdir, mode: 'symlink'

input:
tuple path(input_bam), path(input_bam_index)
@@ -62,7 +62,7 @@ process GATK_JOINTGENOTYPING {
container "community.wave.seqera.io/library/gatk4:4.5.0.0--730ee8817e436867"
conda "bioconda::gatk4=4.5.0.0"

publishDir 'results_genomics', mode: 'symlink'
publishDir params.outdir, mode: 'symlink'

input:
path all_gvcfs
3 changes: 3 additions & 0 deletions hello-nextflow/hello-modules/nextflow.config
@@ -24,6 +24,9 @@ profiles {
// Primary input (file of input files, one per line)
params.reads_bam = "data/sample_bams.txt"

// Output directory
params.outdir = "results_genomics"

// Accessory files
params.reference = "data/ref/ref.fasta"
params.reference_index = "data/ref/ref.fasta.fai"
1 change: 1 addition & 0 deletions hello-nextflow/hello-nf-test/demo-params.json
@@ -1,5 +1,6 @@
{
"reads_bam": "data/sample_bams.txt",
"outdir": "results_genomics",
"reference": "data/ref/ref.fasta",
"reference_index": "data/ref/ref.fasta.fai",
"reference_dict": "data/ref/ref.dict",
@@ -8,7 +8,7 @@ process GATK_HAPLOTYPECALLER {
container "community.wave.seqera.io/library/gatk4:4.5.0.0--730ee8817e436867"
conda "bioconda::gatk4=4.5.0.0"

publishDir 'results_genomics', mode: 'symlink'
publishDir params.outdir, mode: 'symlink'

input:
tuple path(input_bam), path(input_bam_index)
@@ -6,7 +6,7 @@ process GATK_JOINTGENOTYPING {
container "community.wave.seqera.io/library/gatk4:4.5.0.0--730ee8817e436867"
conda "bioconda::gatk4=4.5.0.0"

publishDir 'results_genomics', mode: 'symlink'
publishDir params.outdir, mode: 'symlink'

input:
path all_gvcfs
@@ -8,7 +8,7 @@ process SAMTOOLS_INDEX {
container 'community.wave.seqera.io/library/samtools:1.20--b5dfbd93de237464'
conda "bioconda::samtools=1.20"

publishDir 'results_genomics', mode: 'symlink'
publishDir params.outdir, mode: 'symlink'

input:
path input_bam
6 changes: 6 additions & 0 deletions hello-nextflow/hello-nf-test/nextflow.config
@@ -1,5 +1,8 @@
docker.fixOwnership = true

// Default output directory
params.outdir = 'results_genomics'

profiles {
docker_on {
docker.enabled = true
@@ -24,6 +27,9 @@ profiles {
// Primary input (file of input files, one per line)
params.reads_bam = "data/sample_bams.txt"

// Output directory
params.outdir = "results_genomics"

// Accessory files
params.reference = "data/ref/ref.fasta"
params.reference_index = "data/ref/ref.fasta.fai"
7 changes: 5 additions & 2 deletions hello-nextflow/hello-operators.nf
@@ -7,6 +7,9 @@
// Primary input (file of input files, one per line)
params.reads_bam = "${projectDir}/data/sample_bams.txt"

// Output directory
params.outdir = "results_genomics"

// Accessory files
params.reference = "${projectDir}/data/ref/ref.fasta"
params.reference_index = "${projectDir}/data/ref/ref.fasta.fai"
@@ -20,7 +23,7 @@ process SAMTOOLS_INDEX {

container 'community.wave.seqera.io/library/samtools:1.20--b5dfbd93de237464'

publishDir 'results_genomics', mode: 'symlink'
publishDir params.outdir, mode: 'symlink'

input:
path input_bam
@@ -41,7 +44,7 @@ process GATK_HAPLOTYPECALLER {

container "community.wave.seqera.io/library/gatk4:4.5.0.0--730ee8817e436867"

publishDir 'results_genomics', mode: 'symlink'
publishDir params.outdir, mode: 'symlink'

input:
tuple path(input_bam), path(input_bam_index)