07_combine.qmd

---
title: "Combine outputs"
engine: knitr
---

## Overview 

Now that we know how to run many jobs, the next question is 
how to combine the output of all these jobs to analyze it. 


## Example

We will run [Pigeons](https://github.com/Julia-Tempering/Pigeons.jl) 
on the cross product formed by calling `crossProduct(variables)` with:

```groovy
def variables = [
    seed: 1..10,
    n_chains: [10, 20], 
]
```

Suppose we want to create a plot from the output of these 
20 Julia processes. 


## Strategy 

Each Julia process will create a folder. Using a function, 
we will provide an automatic name to this folder encoding the 
inputs used (`seed` and `n_chains`). That name is provided 
by `nf-nest`'s `filed()` function. 
In that folder, we will  
put csv files.

Then, once all Julia processes are done, another utilities 
from `nf-nest`, `combine_csvs`, will merge all CSVs while 
adding columns for the inputs (here, `seed` and `n_chains`).

Finally, we will pass the merged CSVs to a plotting process.


## Nextflow script

```{groovy}
#| eval: false
#| file: experiment_repo/nf-nest/examples/full.nf
```


## Running the nextflow script

```{bash}
cd experiment_repo
./nextflow run nf-nest/examples/full.nf -profile cluster 
```


## Accessing the output

Each nextflow process is associated with a unique work directory to 
ensure the processes do not interfere with each other. Here we cover two 
ways to quickly access these work directories. 

### Quick inspection

A quick way to find the  output 
of a nextflow process that we just ran is to use:

```bash
cd experiment_repo 
nf-nest/nf-open
```

This lists the work folders for the last nextflow job. 


### Organizing the output with a publishDir

A better approach is to use the `publishDir` directive, 
combined with `nf-nest`'s `deliverables()` utility, as 
illustrated in the `run_julia` process above. 
This will automatically copy the output of the process 
associated with the directive in a sub-directory of 
`experiment_repo/deliverables`. 

```{bash}
cd experiment_repo
tree deliverables
```

Here the contents of `runName.txt` can be used with nextflow's 
[`log` command](https://www.nextflow.io/docs/latest/reports.html) 
to obtain more information on the run.

```{bash echo = -1}
cd experiment_repo
cat deliverables/scriptName=full.nf/runName.txt 
```

```{bash echo = -1}
cd experiment_repo
./nextflow log
```

And we can see in the CSV that indeed the columns `seed` and `n_chains` 
were added to the left:

```{bash echo = -1}
cd experiment_repo
head -n 2 deliverables/scriptName=full.nf/output/summary.csv 
```