Adding presentation

larsvilhuber · Jun 30, 2024 · 94bd393
1 parent fcc5bef
commit 94bd393
Showing 20 changed files with 2,620 additions and 0 deletions.
diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
@@ -34,6 +34,17 @@ jobs:
     - name: Build the book
       run: |
         jupyter-book build .
+    # Let's do the Quarto
+    - name: Set up Quarto
+      uses: quarto-dev/quarto-actions/setup@v2
+    - name: Render Quarto Project
+      uses: quarto-dev/quarto-actions/render@v2
+      with:
+        path: ./presentation
+    - name: Move outputs around
+      run: |
+        mv presentation _build/html/presentation
+
     # Book is now in _build/html
     - name: prepare GitHub Pages action
       uses: actions/[email protected]    
@@ -50,6 +61,7 @@ jobs:
   #       github_token: ${{ secrets.GITHUB_TOKEN }}
   #       publish_dir: ./_build/html
 
+
   publish:
     needs: deploy-book
 

diff --git a/00-targets.md b/00-targets.md
@@ -0,0 +1,13 @@
+# Targets
+
+We want to check 
+
+::: {.incremental}
+
+- that your code runs without problem, after all the debugging.
+- that your code runs without manual intervention.
+- that your code generates a log file that you can inspect, and that you could share with others.
+- that it will run on somebody else's computer
+- that it actually produces all the outputs
+
+:::
diff --git a/_toc.yml b/_toc.yml
@@ -6,6 +6,7 @@ root: index
 parts:
   - caption: Simple ways to test replication packages
     chapters: 
+    - file: 00-targets
     - file: 01-run_it_again
     - file: 02-hands_off_running
     - file: 03-automatically_saving_figures

diff --git a/index.md b/index.md
@@ -3,8 +3,11 @@
 
 This document describes a few *possible* steps to self-check replication packages before submitting them to a journal. It is not meant to be exhaustive, and it is not meant to be prescriptive. There are many ways to construct a replication package, and many more to check that it works.
 
+## Computational Empathy
 The key ingredient is what I call "**computational empathy**" - thinking about what an unknown person attempting to reproduce the results in your paper might face, what they might know and assume, and more importantly, what they might not know or know to assume. While the replication package might very well run on your computer, that is by no means evidence that it will run on someone else's computer. 
 
+## Prerequisites
+
 In what follows, we will assume that the replicator satisfies the following conditions:
 
 - they are familiar with their own operating system

diff --git a/presentation/.gitignore b/presentation/.gitignore
@@ -0,0 +1 @@
+/.quarto/
diff --git a/presentation/01-run_it_again.md b/presentation/01-run_it_again.md
@@ -0,0 +1,37 @@
+# Run it all again
+
+The very first test is that your code must run, beginning to end, top to bottom, without error, and ideally without any user intervention. This should in principle (re)create all figures, tables, and numbers you include in your paper. 
+
+## TL;DR
+
+This is pretty much the most basic test of reproducibility. If you cannot run your code, you cannot reproduce your results, nor can anybody else. So just re-run the code.
+
+## Exceptions
+
+## Code runs for a very long time
+
+What happens when some of these re-runs are very long? See later in this chapter for how to handle this.
+
+## Making the code run takes YOU a very long time
+
+While the code, once set to run, can do so on its own, *you* might need to spend a lot of time getting all the various pieces to run. 
+
+---
+
+*This should be a warning sign:* if it takes you a long time to get it to run, or to manually reproduce the results, it might take others even longer. 
+
+---
+
+Furthermore, it may suggest that you haven't been able to re-run your own code very often, which can be correlated with fragility or even lack of reproducibility. 
+
+## Takeaways
+
+::: {.incremental}
+
+- ✅ your code runs without problem, after all the debugging.
+- ❓ your code runs without manual intervention.
+- ❓ your code generates a log file that you can inspect, and that you could share with others.
+- ❓ it will run on somebody else's computer
+- ❓ it actually produces all the outputs
+
+:::
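One way to check the last takeaway, that a run actually produces all the outputs, is to compare a list of expected files against what is on disk after the run. The sketch below is a minimal Python illustration and is not part of this commit: the function name `stale_or_missing` and the example paths are hypothetical, and it assumes that file modification times are a good enough proxy for "written during this run".

```python
from pathlib import Path


def stale_or_missing(expected, run_started):
    """Return the expected output files that are missing or older than the run.

    expected: iterable of file paths the paper relies on (hypothetical names).
    run_started: timestamp (seconds since the epoch) taken just before the run.
    """
    problems = []
    for name in expected:
        path = Path(name)
        # A file is a problem if it does not exist, or if it was last
        # modified before the run started (a leftover from an earlier run).
        if not path.exists() or path.stat().st_mtime < run_started:
            problems.append(name)
    return problems


# Usage sketch (paths are placeholders, not files from this repository):
#   import time
#   run_started = time.time()
#   ... run the full analysis here ...
#   print(stale_or_missing(["results/table1.tex", "results/figure1.png"], run_started))
```

An empty list means every expected file exists and was rewritten after the recorded start time.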
diff --git a/presentation/02-00-intermezzo.md b/presentation/02-00-intermezzo.md
@@ -0,0 +1,23 @@
+# Why is this not enough?
+
+
+## Does your code run without manual intervention?
+
+Automating the run makes robustness checks easier and every re-run more efficient.
+
+## Can you provide evidence that you ran it?
+
+Generating a log file means that you can inspect it, and that you can share it with others. It also helps with debugging, for you and for others.
+
+## Will it run on somebody else's computer?
+
+Running it again does not help:
+
+::: {.incremental}
+  - because it does not guarantee that somebody else has all the software (including packages!)
+  - because it does not guarantee that all the directories for input or output are there
+  - because many intermediate files might be present that are not in the replication package
+  - because you might have run things out of sequence, or relied on previously generated files in ways that won't work for others
+  - because some outputs might be present from test runs, but actually fail in this run
+
+:::
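Two of the reasons listed above, missing directories or files and leftover intermediate files, can be made visible by comparing the working directory with the list of files that will actually be shipped. The following is a minimal Python sketch, assuming a hypothetical manifest (a plain list of relative paths in the replication package); the function name `compare_to_manifest` is illustrative only.

```python
from pathlib import Path


def compare_to_manifest(root, manifest):
    """Compare files under root with the relative paths listed in manifest.

    Returns (unlisted, missing):
      unlisted: files on disk that are not in the manifest
      missing:  manifest entries that are not on disk
    """
    root = Path(root)
    listed = {Path(entry).as_posix() for entry in manifest}
    on_disk = {
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file()
    }
    return sorted(on_disk - listed), sorted(listed - on_disk)
```

Files reported as unlisted are candidates for intermediate files that the code may silently depend on. Files reported as missing would not be available to a replicator at all.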
diff --git a/presentation/02-hands_off_running.md b/presentation/02-hands_off_running.md
@@ -0,0 +1,261 @@
+# Hands-off running: Creating a controller script
+
+Did it take you a long time to run everything again?
+
+![⏳](https://c.tenor.com/4qs0klfg8nMAAAAC/tenor.gif)
+
+# Let's ramp it up a bit. 
+
+Your code must run, beginning to end, top to bottom, without error, and without any user intervention. This should in principle (re)create all figures, tables, and numbers you include in your paper. 
+
+
+::: {.notes}
+We have seen users who appear to highlight code and to run it interactively, in pieces, using the program file as a kind of notepad. This is not reproducible, and should be avoided. It is fine for debugging.
+:::
+
+## TL;DR
+
+- Create a "main" file that runs all the other files in the correct order.
+- Run this file, without user intervention.
+- It should run without error.
+
+## Creating a main or master script
+
+The main script is the key to "hands-off running". I will show here a few simple examples for single-software replication packages. We will discuss more complex examples in one of the next chapters.
+
+## Examples
+
+::: {.panel-tabset}
+
+
+### Stata
+
+```stata
+* main.do
+* This is a simple example of a main file in Stata
+* It runs all the other files in the correct order
+
+* Set the root directory
+
+global rootdir : pwd
+
+* Run the data preparation file
+do "$rootdir/01_data_prep.do"
+
+* Run the analysis file
+do "$rootdir/02_analysis.do"
+
+* Run the table file
+do "$rootdir/03_tables.do"
+
+* Run the figure file
+do "$rootdir/04_figures.do"
+
+* Run the appendix file
+do "$rootdir/05_appendix.do"
+```
+The use of `do` (instead of `run` or even `capture run`) is best, as it will show the code that is being run, and is thus more transparent to you and the future replicator.
+
+
+Run this using the [right-click method](https://labordynamicsinstitute.github.io/ldilab-manual/96-02-running-stata-code.html#step-6-run-the-code) (Windows) or from the terminal (macOS, Linux): 
+
+```bash
+cd /where/my/code/is
+stata-mp -b do main.do
+```
+where `stata-mp` should be replaced with `stata` or `stata-se` depending on your licensed version.
+
+
+
+### R
+
+```r
+# main.R
+# This is a simple example of a main file in R
+# It runs all the other files in the correct order
+
+# Set the root directory
+# rootdir <- getwd()
+# or if you are using Rproj files or git
+rootdir <- here::here()
+
+# Run the data preparation file
+source(file.path(rootdir, "01_data_prep.R"), echo = TRUE)
+
+# Run the analysis file
+source(file.path(rootdir, "02_analysis.R"), echo = TRUE)
+
+# Run the table file
+source(file.path(rootdir, "03_tables.R"), echo = TRUE)
+
+# Run the figure file
+source(file.path(rootdir, "04_figures.R"), echo = TRUE)
+
+# Run the appendix file
+source(file.path(rootdir, "05_appendix.R"), echo = TRUE)
+```
+The use of `echo=TRUE` is best, as it will show the code that is being run, and is thus more transparent to you and the future replicator.
+
+
+Run this using the [terminal method](https://labordynamicsinstitute.github.io/ldilab-manual/96-12-running-r-code.html) in RStudio on any platform, or from the terminal (macOS, Linux):
+
+```bash
+cd /where/my/code/is
+R CMD BATCH main.R
+```
+
+Do not use `Rscript`, as it will not generate enough output! On Windows, under `cmd.exe` or PowerShell, you may need to replace `R` with `R.exe` if it is in your `%PATH%`, or with the full path to `R.exe` if it is not (this is automatically set for you in RStudio).
+
+
+### Python
+
+```python
+# main.py
+# A simple example of a main file in Python: it runs all the other files in the correct order
+import os
+
+# Set the root directory
+# rootdir = os.getcwd()
+# or better
+rootdir = os.path.dirname(os.path.realpath(__file__))
+
+# Run the data preparation file
+exec(open(os.path.join(rootdir, "01_data_prep.py")).read())
+
+# Run the analysis file
+exec(open(os.path.join(rootdir, "02_analysis.py")).read())
+
+# Run the table file
+exec(open(os.path.join(rootdir, "03_tables.py")).read())
+
+# Run the figure file
+exec(open(os.path.join(rootdir, "04_figures.py")).read())
+
+# Run the appendix file
+exec(open(os.path.join(rootdir, "05_appendix.py")).read())
+```
+
+Run this from your favorite IDE or from a terminal:
+
+```bash
+cd /where/my/code/is
+python main.py
+```
+
+
+### MATLAB
+
+```matlab
+% main.m
+% This is a simple example of a main file in MATLAB
+% It runs all the other files in the correct order
+
+% Set the root directory
+rootdir = pwd;
+
+% Run the data preparation file
+run(fullfile(rootdir, '01_data_prep.m'))
+
+% Run the analysis file
+run(fullfile(rootdir, '02_analysis.m'))
+
+% Run the table file
+run(fullfile(rootdir, '03_tables.m'))
+
+% Run the figure file
+run(fullfile(rootdir, '04_figures.m'))
+
+% Run the appendix file
+run(fullfile(rootdir, '05_appendix.m'))
+```
+
+Run this script, and it should run all the other ones. Note that there are various other ways to achieve a similar goal, for instance, by treating each MATLAB file as a function. 
+
+
+### Julia
+
+In Julia, we can do something similar:
+
+```julia
+# main.jl
+# A simple example of a main file in Julia: it runs all the other files in the correct order
+
+# Set the root directory
+rootdir = pwd()
+
+# Run the data preparation file
+include(joinpath(rootdir, "01_data_prep.jl"))
+
+# Run the analysis file
+include(joinpath(rootdir, "02_analysis.jl"))
+
+# Run the table file
+include(joinpath(rootdir, "03_tables.jl"))
+
+# Run the figure file
+include(joinpath(rootdir, "04_figures.jl"))
+
+# Run the appendix file
+include(joinpath(rootdir, "05_appendix.jl"))
+```
+
+Run this from your favorite IDE or from a terminal:
+
+```bash
+cd /where/my/code/is
+julia main.jl
+```
+
+
+### Bash[^bash]
+
+[^bash]: Bash is a cross-platform command-line shell that many users may have encountered when using Git on Windows ("Git Bash"). It is also installed by default on macOS and Linux. It can be used to run command-line versions of most statistical software, and is thus a good candidate for a main script. Note that it introduces an additional dependency: the replicator now needs to have Bash installed. It is also not entirely platform-agnostic when calling other software, as those calls may differ across platforms, though that problem afflicts any multi-software main script. In particular, on most Windows machines, the statistical software is not in the `%PATH%` by default, and thus may need to be called with the full path to the executable.
+
+
+
+```bash
+# main.bash
+# Run the data preparation file
+# Example for calling Stata
+stata-mp -b do "01_data_prep.do"
+# Run the analysis file
+python 02_analysis.py
+# Run the table file
+Rscript 03_tables.R
+# Run the appendix file
+# Here, we use MATLAB. Running MATLAB is *never* platform-independent. 
+# Linux:
+matlab -nodisplay -r "addpath(genpath('.')); 05_appendix" 
+# Windows:
+#start matlab -nosplash  -minimize -r  "addpath(genpath('.')); 05_appendix"
+```
+
+:::
+
+
+## Takeaways
+
+:::: {.columns}
+
+::: {.column}
+
+### What this does 
+
+This ensures
+
+- that your code runs without problem, after all the debugging.
+- that your code runs without manual intervention.
+:::
+
+::: {.column .smaller }
+
+### What this does not do
+This does not ensure
+
+- that your code generates a log file that you can inspect, and that you could share with others.
+- that it will run on somebody else's computer
+- that it actually produces all the outputs
+
+:::
+
+::::
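The main-script examples in this file run every step hands-off, but none of them writes a log. As a sketch of how a controller script could also cover that point, the Python variant below launches each step as a separate process with `subprocess.run`, sends all output to one log file, and stops at the first failure. This is an alternative to the `exec(open(...).read())` pattern shown in the Python tab, not the author's method; the function name `run_all` and the log file name are hypothetical.

```python
import subprocess


def run_all(steps, log_path):
    """Run each command in order, writing all output to one log file.

    steps: list of commands, each a list of arguments (for example,
           [sys.executable, "01_data_prep.py"]).
    Returns True if every step exits with code 0, and False as soon as
    one step fails (later steps are then not run).
    """
    with open(log_path, "w") as log:
        for cmd in steps:
            log.write(f"=== Running: {' '.join(map(str, cmd))}\n")
            # Flush our own header before the child process writes to the file.
            log.flush()
            result = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
            if result.returncode != 0:
                log.write(f"=== FAILED with exit code {result.returncode}\n")
                return False
    return True
```

For example, `run_all([[sys.executable, "01_data_prep.py"], [sys.executable, "02_analysis.py"]], "main.log")` (after `import sys`) returns `True` only if both steps exit with code 0, and leaves a `main.log` that can be inspected or shared.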