Commit

Adding presentation
larsvilhuber committed Jun 30, 2024
1 parent fcc5bef commit 94bd393
Showing 20 changed files with 2,620 additions and 0 deletions.
12 changes: 12 additions & 0 deletions .github/workflows/main.yml
@@ -34,6 +34,17 @@ jobs:
    - name: Build the book
      run: |
        jupyter-book build .
    # Let's do the Quarto
    - name: Set up Quarto
      uses: quarto-dev/quarto-actions/setup@v2
    - name: Render Quarto Project
      uses: quarto-dev/quarto-actions/render@v2
      with:
        path: ./presentation
    - name: Move outputs around
      run: |
        mv presentation _build/html/presentation
    # Book is now in _build/html
    - name: prepare GitHub Pages action
      uses: actions/[email protected]
@@ -50,6 +61,7 @@ jobs:
    # github_token: ${{ secrets.GITHUB_TOKEN }}
    # publish_dir: ./_build/html


  publish:
    needs: deploy-book

13 changes: 13 additions & 0 deletions 00-targets.md
@@ -0,0 +1,13 @@
# Targets

We want to check

::: {.incremental}

- that your code runs without problem, after all the debugging.
- that your code runs without manual intervention.
- that your code generates a log file that you can inspect, and that you could share with others.
- that it will run on somebody else's computer.
- that it actually produces all the outputs.

:::
1 change: 1 addition & 0 deletions _toc.yml
@@ -6,6 +6,7 @@ root: index
parts:
  - caption: Simple ways to test replication packages
    chapters:
    - file: 00-targets
    - file: 01-run_it_again
    - file: 02-hands_off_running
    - file: 03-automatically_saving_figures
3 changes: 3 additions & 0 deletions index.md
@@ -3,8 +3,11 @@

This document describes a few *possible* steps to self-check replication packages before submitting them to a journal. It is not meant to be exhaustive, and it is not meant to be prescriptive. There are many ways to construct a replication package, and many more to check that it works.

## Computational Empathy
The key ingredient is what I call "**computational empathy**" - thinking about what an unknown person attempting to reproduce the results in your paper might face, what they might know and assume, and more importantly, what they might not know or know to assume. While the replication package might very well run on your computer, that is by no means evidence that it will run on someone else's computer.

## Prerequisites

In what follows, we will assume that the replicator satisfies the following conditions:

- they are familiar with their own operating system
1 change: 1 addition & 0 deletions presentation/.gitignore
@@ -0,0 +1 @@
/.quarto/
37 changes: 37 additions & 0 deletions presentation/01-run_it_again.md
@@ -0,0 +1,37 @@
# Run it all again

The very first test is that your code must run, beginning to end, top to bottom, without error, and ideally without any user intervention. This should in principle (re)create all figures, tables, and numbers you include in your paper.

## TL;DR

This is pretty much the most basic test of reproducibility. If you cannot run your code, you cannot reproduce your results, nor can anybody else. So just re-run the code.

## Exceptions

## Code runs for a very long time

What happens when some of these re-runs are very long? See later in this chapter for how to handle this.

## Making the code run takes YOU a very long time

While the code, once set to run, can do so on its own, *you* might need to spend a lot of time getting all the various pieces to run.

---

*This should be a warning sign:* if it takes you a long time to get it to run, or to manually reproduce the results, it might take others even longer.

---

Furthermore, it may suggest that you haven't been able to re-run your own code very often, which can be correlated with fragility or even lack of reproducibility.

## Takeaways

::: {.incremental}

- ✅ your code runs without problem, after all the debugging.
- ❓ your code runs without manual intervention.
- ❓ your code generates a log file that you can inspect, and that you could share with others.
- ❓ it will run on somebody else's computer.
- ❓ it actually produces all the outputs.

:::
23 changes: 23 additions & 0 deletions presentation/02-00-intermezzo.md
@@ -0,0 +1,23 @@
# Why is this not enough?


## Does your code run without manual intervention?

Running without manual intervention enables automation and robustness checks, and is more efficient.

## Can you provide evidence that you ran it?

Generating a log file means that you can inspect it, and you can share it with others. It also helps with debugging, for you and for others.
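
As one possible illustration, here is a minimal Python sketch of a run that leaves a log file behind. The helper name `setup_log` and the logger name `replication` are hypothetical, chosen only for this example; Stata and R have their own built-in mechanisms that would be more natural in those languages.

```python
import logging
import sys


def setup_log(logfile):
    """Send messages to both the console and a log file (hypothetical helper)."""
    logger = logging.getLogger("replication")
    logger.setLevel(logging.INFO)
    # Remove handlers from an earlier call so messages are not duplicated.
    for old in list(logger.handlers):
        logger.removeHandler(old)
        old.close()
    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    handlers = (
        logging.StreamHandler(sys.stdout),
        logging.FileHandler(logfile, mode="w", encoding="utf-8"),
    )
    for handler in handlers:
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```

A main script could call `log = setup_log("main.log")` at the top and then record, for instance, `log.info("Python version: %s", sys.version.split()[0])` before running anything else, so that the log documents both the environment and the run.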

## Will it run on somebody else's computer?

Running it again does not help:

::: {.incremental}
- because it does not guarantee that somebody else has all the software (including packages!)
- because it does not guarantee that all the directories for input or output are there
- because many intermediate files might be present that are not in the replication package
- because you might have run things out of sequence, or relied on previously generated files in ways that won't work for others
- because some outputs might be present from test runs, but actually fail in this run

:::
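
The point about missing directories can be addressed in code. The following is a minimal Python sketch, assuming a hypothetical layout (`data/raw`, `data/processed`, `results/tables`, `results/figures`) that is not taken from any particular replication package: it stops early if an input directory is missing, and creates the output directories instead of assuming they exist.

```python
from pathlib import Path

# Hypothetical layout: adjust these names to your own package.
REQUIRED_INPUTS = ["data/raw"]
OUTPUT_DIRS = ["data/processed", "results/tables", "results/figures"]


def prepare_directories(rootdir, inputs=REQUIRED_INPUTS, outputs=OUTPUT_DIRS):
    """Fail early if inputs are missing; create output directories as needed."""
    root = Path(rootdir)
    missing = [d for d in inputs if not (root / d).is_dir()]
    if missing:
        raise FileNotFoundError(f"Missing input directories: {missing}")
    for d in outputs:
        # exist_ok=True makes this safe to call on every run.
        (root / d).mkdir(parents=True, exist_ok=True)
    return [root / d for d in outputs]
```

Calling such a function at the top of the main script means a replicator starting from a fresh copy of the package does not need to create any folders by hand.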
261 changes: 261 additions & 0 deletions presentation/02-hands_off_running.md
@@ -0,0 +1,261 @@
# Hands-off running: Creating a controller script

Did it take you a long time to run everything again?

![](https://c.tenor.com/4qs0klfg8nMAAAAC/tenor.gif)

# Let's ramp it up a bit.

Your code must run, beginning to end, top to bottom, without error, and without any user intervention. This should in principle (re)create all figures, tables, and numbers you include in your paper.


::: {.notes}
We have seen users who appear to highlight code and to run it interactively, in pieces, using the program file as a kind of notepad. This is not reproducible, and should be avoided. It is fine for debugging.
:::

## TL;DR

- Create a "main" file that runs all the other files in the correct order.
- Run this file, without user intervention.
- It should run without error.

## Creating a main or master script

To enable "hands-off running", the main script is key. I will show here a few simple examples for single-software replication packages. We will discuss more complex examples in one of the next chapters.

## Examples

::: {.panel-tabset}


### Stata

```stata
* main.do
* This is a simple example of a main file in Stata
* It runs all the other files in the correct order
* Set the root directory
global rootdir : pwd
* Run the data preparation file
* (paths are quoted so that directories containing spaces still work)
do "$rootdir/01_data_prep.do"
* Run the analysis file
do "$rootdir/02_analysis.do"
* Run the table file
do "$rootdir/03_tables.do"
* Run the figure file
do "$rootdir/04_figures.do"
* Run the appendix file
do "$rootdir/05_appendix.do"
```
The use of `do` (instead of `run` or even `capture run`) is best, as it will show the code that is being run, and is thus more transparent to you and the future replicator.


Run this using the [right-click method](https://labordynamicsinstitute.github.io/ldilab-manual/96-02-running-stata-code.html#step-6-run-the-code) (Windows) or from the terminal (macOS, Linux):

```bash
cd /where/my/code/is
stata-mp -b do main.do
```
where `stata-mp` should be replaced with `stata` or `stata-se` depending on your licensed version.



### R

```r
# main.R
# This is a simple example of a main file in R
# It runs all the other files in the correct order

# Set the root directory
# rootdir <- getwd()
# or if you are using Rproj files or git
rootdir <- here::here()

# Run the data preparation file
source(file.path(rootdir, "01_data_prep.R"), echo = TRUE)

# Run the analysis file
source(file.path(rootdir, "02_analysis.R"), echo = TRUE)

# Run the table file
source(file.path(rootdir, "03_tables.R"), echo = TRUE)

# Run the figure file
source(file.path(rootdir, "04_figures.R"), echo = TRUE)

# Run the appendix file
source(file.path(rootdir, "05_appendix.R"), echo = TRUE)
```
The use of `echo=TRUE` is best, as it will show the code that is being run, and is thus more transparent to you and the future replicator.


Run this using the [terminal method](https://labordynamicsinstitute.github.io/ldilab-manual/96-12-running-r-code.html) in RStudio for any platform, or from the terminal (macOS, Linux):

```bash
cd /where/my/code/is
R CMD BATCH main.R
```

Do not use `Rscript`, as it will not generate enough output! On Windows, under `cmd.exe` or PowerShell, you may need to adjust `R` to be `R.exe` if it is in your `%PATH%`, or the full path to `R.exe` if it is not (this is automatically set for you in RStudio).


### Python

```python
# main.py
# This is a simple example of a main file in Python
# It runs all the other files in the correct order

import os

# Set the root directory
# rootdir = os.getcwd()
# or better
rootdir = os.path.dirname(os.path.realpath(__file__))

# Run the data preparation file
exec(open(os.path.join(rootdir, "01_data_prep.py")).read())

# Run the analysis file
exec(open(os.path.join(rootdir, "02_analysis.py")).read())

# Run the table file
exec(open(os.path.join(rootdir, "03_tables.py")).read())

# Run the figure file
exec(open(os.path.join(rootdir, "04_figures.py")).read())

# Run the appendix file
exec(open(os.path.join(rootdir, "05_appendix.py")).read())
```

Run this from your favorite IDE or from a terminal:

```bash
cd /where/my/code/is
python main.py
```
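
An alternative to `exec()` is to run each script in its own Python process. The sketch below is one possible variant, not the only way to do this; the function name `run_all` and the `SCRIPTS` list are hypothetical. Each script starts from a clean interpreter, so it cannot silently rely on variables left behind by an earlier script, and `check=True` stops the run at the first failing script.

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical list of scripts, in the order they must run.
SCRIPTS = [
    "01_data_prep.py",
    "02_analysis.py",
    "03_tables.py",
    "04_figures.py",
    "05_appendix.py",
]


def run_all(rootdir, scripts=SCRIPTS):
    """Run each script in a fresh interpreter, stopping at the first error."""
    root = Path(rootdir).resolve()
    for name in scripts:
        print(f"=== Running {name} ===", flush=True)
        # sys.executable reuses the interpreter that is running this file.
        subprocess.run([sys.executable, str(root / name)], cwd=root, check=True)
```

In a `main.py`, this would be called as `run_all(Path(__file__).resolve().parent)`. The trade-off is that scripts can no longer share in-memory objects and must pass results through files, which is usually what a replication package wants anyway.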


### MATLAB

```matlab
% main.m
% This is a simple example of a main file in MATLAB
% It runs all the other files in the correct order
% Set the root directory
rootdir = pwd;
% Run the data preparation file
run(fullfile(rootdir, '01_data_prep.m'))
% Run the analysis file
run(fullfile(rootdir, '02_analysis.m'))
% Run the table file
run(fullfile(rootdir, '03_tables.m'))
% Run the figure file
run(fullfile(rootdir, '04_figures.m'))
% Run the appendix file
run(fullfile(rootdir, '05_appendix.m'))
```

Run this script, and it should run all the other ones. Note that there are various other ways to achieve a similar goal, for instance, by treating each MATLAB file as a function.


### Julia

In Julia, we can do something similar:

```julia
# This is a simple example of a main file in Julia
# It runs all the other files in the correct order

# Set the root directory
rootdir = pwd()

# Run the data preparation file
include(joinpath(rootdir, "01_data_prep.jl"))

# Run the analysis file
include(joinpath(rootdir, "02_analysis.jl"))

# Run the table file
include(joinpath(rootdir, "03_tables.jl"))

# Run the figure file
include(joinpath(rootdir, "04_figures.jl"))

# Run the appendix file
include(joinpath(rootdir, "05_appendix.jl"))
```

Run this from your favorite IDE or from a terminal:

```bash
cd /where/my/code/is
julia main.jl
```


### Bash[^bash]

[^bash]: Bash is a cross-platform terminal interpreter that many users may have encountered if using Git on Windows ("Git Bash"). It is also installed by default on macOS and Linux. It can be used to run command line versions of most statistical software, and is thus a good candidate for a main script. Note that it does introduce an additional dependency - the replicator now needs to have Bash installed, and it is not entirely platform agnostic when calling other software, as those calls may be different on different platforms, though that is a problem afflicting any multi-software main script. In particular, on most Windows machines, the statistical software is not in the `%PATH%` by default, and thus may need to be called with the full path to the executable.



```bash
# main.bash
# Run the data preparation file
# Example for calling Stata
stata-mp -b do "01_data_prep.do"
# Run the analysis file
python 02_analysis.py
# Run the table file
# (R CMD BATCH rather than Rscript, so that the full output is written to 03_tables.Rout)
R CMD BATCH 03_tables.R
# Run the appendix file
# Here, we use MATLAB. Running MATLAB is *never* platform-independent.
# Linux:
matlab -nodisplay -r "addpath(genpath('.')); 05_appendix"
# Windows:
#start matlab -nosplash -minimize -r "addpath(genpath('.')); 05_appendix"
```

:::


## Takeaways

:::: {.columns}

::: {.column}

### What this does

This ensures

- that your code runs without problem, after all the debugging.
- that your code runs without manual intervention.
:::

::: {.column .smaller }

### What this does not do

This does not ensure

- that your code generates a log file that you can inspect, and that you could share with others.
- that it will run on somebody else's computer.
- that it actually produces all the outputs.

:::

::::