more drafting

coderefinery · Mar 13, 2024 · 54ca947
1 parent 164e4fb
commit 54ca947
Showing 1 changed file with 16 additions and 76 deletions.
diff --git a/content/workflow-management.md b/content/workflow-management.md
@@ -39,80 +39,16 @@ $ python statistics/count.py data/isles.txt > statistics/isles.data
 $ python plot/plot.py --data-file statistics/isles.data --plot-file plot/isles.png
 ```
 
-
-```{discussion}
-We have two steps and 4 books. But **imagine having 4 steps and processing 500 books**.
-Can you relate? Are you using similar setups in your research? How do you record them?
-```
-
-````{discussion} Kitchen analogy
-```{figure} img/kitchen/busy.png
-:alt: Busy kitchen
-:width: 50%
-
-Now we have many similar meals to prepare and possibly many chefs
-present (cores) and workflow tools can help us to plan and document the steps
-and run them efficiently. [Midjourney, CC-BY-NC 4.0]
-```
-````
-
-**We will imagine solving this in four different ways and discuss pros and cons.**
-
----
-
-## Solution 1: Graphical user interface (GUI)
-
-Imagine we have programmed a GUI with a nice interface with icons where you can select scripts and input files by clicking:
-
-- Click on counting script
-- Select book txt file
-- Give a name for the dat file
-- Click on a run symbol
-- Click on plotting script
-- Select book dat file
-- Give a name for the image file
-- Click on a run symbol
-- ...
-- Go to next book ...
-- Click on counting script
-- Select book txt file
-- ...
-
-Disclaimer: not all GUIs behave this way - there exist very good GUI solutions which enable
-reproducibility and automation.
-
----
-
-## Solution 2: Manual steps
-
-It is not too much work for four files:
-
-```{code-block} console
----
-emphasize-lines: 1-2, 13
----
-
-$ python statistics/count.py data/abyss.txt > statistics/abyss.data
-$ python plot/plot.py --data-file statistics/abyss.data --plot-file plot/abyss.png
-
-$ python statistics/count.py data/isles.txt > statistics/isles.data
-$ python plot/plot.py --data-file statistics/isles.data --plot-file plot/isles.png
-
-$ python statistics/count.py data/last.txt > statistics/last.data
-$ python plot/plot.py --data-file statistics/last.data --plot-file plot/last.png
-
-$ python statistics/count.py data/sierra.txt > statistics/sierra.data
-$ python plot/plot.py --data-file statistics/sierra.data --plot-file plot/sierra.png
-
-```
+This could also be implemented with a graphical user interface (GUI), where you can for example drag and drop files and click buttons to do the different processing steps.
 
 This is **imperative style**: first do this, then do that, then do that, finally do ...
 
----
 
-## Solution 3: Script
+````{discussion}
+Both of the above are tricky in terms of reproducibility. We currently have two steps and 4 books. But **imagine having 4 steps and 500 books**.
+How could we deal with this?
 
-Let's express it more compactly with a shell script (Bash). Let's call it `script.sh`:
+As a first idea we could express the workflow with a shell script. Let's call it `script.sh` (we could do this with a Python script too):
 ```{code-block} bash
 ---
 emphasize-lines: 4
@@ -133,11 +69,9 @@ $ bash script.sh
 ```
 
 This is still **imperative style**: we tell the script to run these
-steps in precisely this order.  We can do it on many files, but if we
-need to re-run just one file, it's a bit of work.
+steps in precisely this order.  
 
 
-````{discussion}
 - What are the advantages of this solution compared to processing all one by one?
 - Is the scripted solution reproducible?
 - Imagine adding more steps to the analysis and imagine the steps being time-consuming. What problems do you anticipate
@@ -158,14 +92,21 @@ need to re-run just one file, it's a bit of work.
 
 ---
 
-## Solution 4: Using [Snakemake](https://snakemake.readthedocs.io/en/stable/index.html)
+## Workflow tools
+
+Sometimes it may be helpful to go from imperative to declarative style. Rather than saying "do this and then that", we describe dependencies and let
+the tool figure out the series of steps to produce results (targets). A workflow file 
+
+
+
+### Example tool: [Snakemake](https://snakemake.readthedocs.io/en/stable/index.html)
 
 Snakemake is inspired by [GNU Make](https://www.gnu.org/software/make/),
 but is based on Python, is more general, and has an easier syntax.
 
 ---
 
-## Exercise
+## Exercise - demo
 
 ````{prereq} Exercise preparation
 The exercise (below) and pre-exercise discussion use a simple
@@ -232,8 +173,7 @@ rule make_plot:
     shell: 'python {input.script} --data-file {input.book} --plot-file {output}'
 ```
 
-Snakemake uses **declarative style**: we describe dependencies but we let
-Snakemake figure out the series of steps to produce results (targets).
+We can see that Snakemake uses **declarative style**:
 Snakefiles contain rules that relate targets (`output`) to dependencies
 (`input`) and commands (`shell`).
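
To make the declarative idea concrete, here is a minimal Python sketch of how a tool could derive the order of steps from declared dependencies. It is not Snakemake's actual implementation: the `RULES` table and the `build_order` function are hypothetical names invented for this illustration, and only the file names follow the isles example from the lesson.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical rule table (an assumption for illustration): each target
# maps to the files it depends on. We only declare relationships here,
# we never state the order in which steps should run.
RULES = {
    "statistics/isles.data": ["data/isles.txt", "statistics/count.py"],
    "plot/isles.png": ["statistics/isles.data", "plot/plot.py"],
}


def build_order(rules, target):
    """Return the targets that must be built, in a valid order, to produce `target`."""
    graph = {}
    pending = [target]
    while pending:
        current = pending.pop()
        if current in graph or current not in rules:
            continue  # already visited, or a plain source file with no rule
        # Keep only dependencies that are themselves produced by a rule.
        deps = [dep for dep in rules[current] if dep in rules]
        graph[current] = deps
        pending.extend(deps)
    # TopologicalSorter expects {node: predecessors} and yields predecessors first.
    return list(TopologicalSorter(graph).static_order())


print(build_order(RULES, "plot/isles.png"))
```

Asking for the plot yields the data file first and the plot second, even though that order was never written down. A real workflow tool would additionally compare timestamps or checksums so that only outdated targets are rebuilt.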