---
title: "Combine outputs"
engine: knitr
---
## Overview
Now that we know how to run many jobs, the next question is
how to combine the output of all these jobs to analyze it.
## Example
We will run [Pigeons](https://github.com/Julia-Tempering/Pigeons.jl)
on the cross product formed by calling `crossProduct(variables)` with:
```groovy
def variables = [
seed: 1..10,
n_chains: [10, 20],
]
```
Suppose we want to create a plot from the output of these
20 Julia processes.
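To see where the count of 20 comes from, the grid can be enumerated by hand. The sketch below uses plain bash loops and is not how `crossProduct` is implemented; it only lists the (`seed`, `n_chains`) pairs that the `variables` map above describes.

```bash
# Enumerate the same grid as `variables`: 10 seeds x 2 values of n_chains.
count=0
for seed in $(seq 1 10); do
  for n_chains in 10 20; do
    echo "seed=${seed} n_chains=${n_chains}"
    count=$((count + 1))
  done
done
echo "total: ${count}"   # prints "total: 20"
```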
## Strategy
Each Julia process will create a folder whose name is generated
automatically and encodes the inputs used (`seed` and `n_chains`).
That name is provided by `nf-nest`'s `filed()` function.
In that folder, we will
put CSV files.
Then, once all Julia processes are done, another utility
from `nf-nest`, `combine_csvs`, will merge all CSVs while
adding columns for the inputs (here, `seed` and `n_chains`).
Finally, we will pass the merged CSVs to a plotting process.
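The merge step can be pictured with a small stand-alone sketch. It is not the implementation of `combine_csvs`: the folder naming scheme (`seed=<s>,n_chains=<n>`), the per-run file name `samples.csv`, and the columns `round,value` are all hypothetical, chosen only to show the idea of prefixing each row with the inputs parsed from the folder name.

```bash
# Build a tiny mock layout in a temporary directory (all names hypothetical).
tmp=$(mktemp -d)
for seed in 1 2; do
  for n_chains in 10 20; do
    dir="$tmp/seed=${seed},n_chains=${n_chains}"
    mkdir -p "$dir"
    printf 'round,value\n1,0.5\n' > "$dir/samples.csv"
  done
done

# Merge: write the header once, then prefix every data row with the
# inputs recovered from the folder name.
merged="$tmp/merged.csv"
first=1
for dir in "$tmp"/seed=*; do
  name=$(basename "$dir")               # e.g. seed=1,n_chains=10
  seed=${name#seed=}; seed=${seed%%,*}  # text between "seed=" and the first comma
  n_chains=${name##*n_chains=}          # text after "n_chains="
  if [ "$first" -eq 1 ]; then
    echo "seed,n_chains,$(head -n 1 "$dir/samples.csv")" > "$merged"
    first=0
  fi
  tail -n +2 "$dir/samples.csv" | sed "s/^/${seed},${n_chains},/" >> "$merged"
done
cat "$merged"
```

The input columns end up on the left of the merged file, so a plotting step can group or facet by `seed` and `n_chains` without knowing anything about the folder layout.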
## Nextflow script
```{groovy}
#| eval: false
#| file: experiment_repo/nf-nest/examples/full.nf
```
## Running the nextflow script
```{bash}
cd experiment_repo
./nextflow run nf-nest/examples/full.nf -profile cluster
```
## Accessing the output
Each nextflow process is associated with a unique work directory to
ensure the processes do not interfere with each other. Here we cover two
ways to quickly access these work directories.
### Quick inspection
A quick way to find the output
of a nextflow process that we just ran is to use:
```bash
cd experiment_repo
nf-nest/nf-open
```
This lists the work folders for the last nextflow job.
### Organizing the output with a publishDir
A better approach is to use the `publishDir` directive,
combined with `nf-nest`'s `deliverables()` utility, as
illustrated in the `run_julia` process above.
This automatically copies the output of the process
carrying the directive into a sub-directory of
`experiment_repo/deliverables`.
```{bash}
cd experiment_repo
tree deliverables
```
Here the contents of `runName.txt` can be used with nextflow's
[`log` command](https://www.nextflow.io/docs/latest/reports.html)
to obtain more information on the run.
```{bash echo = -1}
cd experiment_repo
cat deliverables/scriptName=full.nf/runName.txt
```
```{bash echo = -1}
cd experiment_repo
./nextflow log
```
The merged CSV confirms that the columns `seed` and `n_chains`
were added on the left:
```{bash echo = -1}
cd experiment_repo
head -n 2 deliverables/scriptName=full.nf/output/summary.csv
```