This is an example project repository that illustrates what a reproducible analysis might look like, as discussed in more detail in the Reproducibility in Cancer Informatics course.
It can be used as a template or otherwise borrowed from.
This example analysis:
- Downloads data from refine.bio using the refine.bio Python API client.
- Identifies the genes in the top 90th percentile by variance.
- Creates and saves a heatmap from those genes.
It also has its own Docker image and GitHub actions to aid reproducibility.
Table of Contents:
- Requirements
- How to run the analysis
- make_heatmap.ipynb
- conda
- Docker
- GitHub actions
- Styling with Black
To run this analysis, you will need git and Docker installed on your computer.
Both tools are very useful for reproducibility, so they will serve you well far beyond this repository.
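Before starting, it can help to confirm that both tools are reachable from your shell. The following is a minimal sketch, not part of this repository; the function name `missing_tools` is hypothetical, and the check only looks for the `git` and `docker` executables on your PATH (it does not verify that the Docker daemon is running).

```python
import shutil


def missing_tools(tools=("git", "docker")):
    """Return the names in `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Please install: " + ", ".join(missing))
    else:
        print("git and Docker were both found on your PATH.")
```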
To re-run this analysis within its Docker image, open up your Terminal/Command Prompt.
- First, obtain a local copy of this repository by cloning it with git clone:
git clone https://github.com/jhudsl/reproducible-python-example.git
- Now navigate to the top of this repository.
cd reproducible-python-example
- Use the following command to run the analysis:
docker run \
--mount type=bind,target=/home/jovyan/work,source=$PWD \
jhudsl/reproducible-python \
jupyter nbconvert --execute work/make_heatmap.ipynb --to notebook --inplace
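If you would rather script this step, the same command can be assembled in Python. This is a sketch under assumptions: the helper names `build_rerun_command` and `rerun_analysis` are hypothetical and not part of this repository, and the arguments simply mirror the `docker run` command above. The repository path is resolved to an absolute path because bind mounts require one.

```python
import subprocess
from pathlib import Path


def build_rerun_command(repo_dir, image="jhudsl/reproducible-python"):
    """Build the `docker run` argument list that re-executes the notebook."""
    source = Path(repo_dir).resolve()  # bind mounts need an absolute source path
    return [
        "docker", "run",
        "--mount", f"type=bind,target=/home/jovyan/work,source={source}",
        image,
        "jupyter", "nbconvert", "--execute", "work/make_heatmap.ipynb",
        "--to", "notebook", "--inplace",
    ]


def rerun_analysis(repo_dir="."):
    """Run the analysis in Docker; raises CalledProcessError if it fails."""
    subprocess.run(build_rerun_command(repo_dir), check=True)


if __name__ == "__main__":
    # Only print the command here; call rerun_analysis() to actually run it.
    print(" ".join(build_rerun_command(".")))
```

Passing the command as a list avoids shell quoting problems when the repository path contains spaces.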
The dataset used by this analysis is downloaded from refine.bio through their API, already processed and quantile normalized. It is RNA-seq data from 19 acute myeloid leukemia (AML) mouse models.
Two directories are created by this analysis and hold the output:
- plots/ - contains the heatmap PNG: aml_heatmap.png
- results/ - contains the TSV file listing the most variant genes: top_90_var_genes.tsv
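To make the gene-selection step concrete, here is a dependency-free sketch of one way to pick high-variance genes and write them to a TSV file. It is not the code in make_heatmap.ipynb: the function names, the `gene` column header, and the synthetic data are all hypothetical, and it assumes "top 90th percentile" means genes whose variance is at or above the 90th percentile of all per-gene variances.

```python
import csv
import statistics


def top_variance_genes(expression, percentile=90):
    """Return gene names whose variance is at or above the given percentile.

    `expression` maps a gene name to its expression values across samples.
    """
    variances = {
        gene: statistics.variance(values) for gene, values in expression.items()
    }
    # quantiles(n=100) returns the 99 cut points for percentiles 1 through 99.
    cutoff = statistics.quantiles(variances.values(), n=100)[percentile - 1]
    return [gene for gene, var in variances.items() if var >= cutoff]


def write_gene_list(genes, path):
    """Write one gene name per row to a tab-separated file with a header."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["gene"])  # hypothetical column name
        writer.writerows([gene] for gene in genes)


if __name__ == "__main__":
    # Synthetic data: gene_k has values [0, k], so variance grows with k.
    demo = {f"gene_{k}": [0.0, float(k)] for k in range(1, 21)}
    print(top_variance_genes(demo))
```

With real data, `expression` would hold one list of values per gene across the 19 samples, and the selected genes would then feed the heatmap step.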
Package management for this project is done with conda. If you don't have conda, you will need to install that first. This article is a great short introduction to conda. You can create your conda environment by using this command at the top of your repository:
conda env create --file environment.yml
Then you can activate your conda environment using this command:
conda activate reproducible-python
Now you can start up JupyterLab again using this command:
jupyter lab
Working from JupyterLab, use the "Reproducible Python" kernel. Develop and install new packages as you need them. To capture the packages you installed, export the conda environment with this command (it prints to the terminal, so redirect the output to environment.yml to update the file):
conda env export --from-history
Be sure to add the environment.yml file to any commits and pull requests, since that file stores the package changes to your environment!
With your current directory being the top of this repository, run this command in your Terminal:
docker run --rm -v $(pwd):/home/jovyan/work -e JUPYTER_ENABLE_LAB=yes -p 8888:8888 jhudsl/reproducible-python
Then open the URL that the output prints in your browser (you may have to try both links, since sometimes only one of them works). This command will pull the most recent Docker image from Docker Hub if you do not have it locally.
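If the startup output is long, the links can be pulled out of it programmatically. The sketch below assumes the server prints URLs carrying a `?token=` query, as Jupyter servers commonly do at startup; the function name `find_jupyter_urls` is hypothetical, and the sample log text (including its hostname and token) is made up for illustration.

```python
import re

# Matches http(s) URLs that carry a ?token=... query string.
URL_PATTERN = re.compile(r"https?://\S+\?token=\w+")


def find_jupyter_urls(log_text):
    """Return every tokenized URL found in the container's startup output."""
    return URL_PATTERN.findall(log_text)


# Illustrative log excerpt; the hostname and token are invented.
SAMPLE_LOG = """
    Or copy and paste one of these URLs:
        http://0123456789ab:8888/lab?token=abc123
        http://127.0.0.1:8888/lab?token=abc123
"""

if __name__ == "__main__":
    for url in find_jupyter_urls(SAMPLE_LOG):
        print(url)
```

When two links are printed like this, the one using 127.0.0.1 is typically the one reachable from your own machine, because the other uses the container's internal hostname.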
If you prefer to build the image locally, or have otherwise modified the Dockerfile and want to test if it builds, you can run this command from the top of the repository:
docker build -f docker/Dockerfile . -t jhudsl/reproducible-python
Running docker images should then list jhudsl/reproducible-python among your local images.
There are two main GitHub actions in this repository:
- docker-management.yml - Tests the building of the Docker image when changes to the Dockerfile are added to a pull request.
- run-py-notebook.yml - Re-runs the analysis by running make_heatmap.ipynb within the Docker image (using the command described above).
Both GitHub actions have the option to be run manually.
The Docker management GitHub action also has the option to push the re-built Docker image to Docker Hub by setting dockerhubpush to true.
The Docker container and conda environment are equipped with the Python formatter Black for styling purposes. To run it on each Python file here, use these commands:
python -m black make_heatmap.ipynb
python -m black util/color_key.py
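As the repository grows, listing each file by hand becomes easy to get wrong. The following sketch collects the files to style and builds a single Black command for them. The helper names are hypothetical and not part of this repository; the `check` option maps to Black's `--check` flag, which reports files that would change without rewriting them.

```python
import sys
from pathlib import Path


def black_targets(paths, suffixes=(".py", ".ipynb")):
    """Keep Python files and notebooks, skipping Jupyter checkpoint copies."""
    return sorted(
        str(path)
        for path in map(Path, paths)
        if path.suffix in suffixes and ".ipynb_checkpoints" not in path.parts
    )


def black_command(paths, check=False):
    """Build a `python -m black` command for the selected files."""
    flags = ["--check"] if check else []
    return [sys.executable, "-m", "black", *flags, *black_targets(paths)]


if __name__ == "__main__":
    # Print the command for every matching file under the current directory.
    print(" ".join(black_command(Path(".").rglob("*"))))
```

The printed command can be run as-is, or passed to `subprocess.run` if you want to automate the styling step.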