This guide provides instructions on editing the Docker images used to run methods and metrics and to load datasets for the Open Problems benchmarking infrastructure.
Note: all images must comply with the AWS SageMaker Custom Image Specifications.
- About Docker images
- Available images
- Adding a package to the available images
- Adding new images
- Building Docker images locally
- Building Docker images through GitHub Actions workflows
- Pulling images from the ECR to your local machine
- Running Docker images locally
Useful links:
- Dockerfile Reference: documentation from Docker on how to write Dockerfiles
- SageMaker Studio Custom Image Samples: example images from AWS designed for compatibility with SageMaker
By default, all methods, metrics, and dataset loaders run in the openproblems Docker image. If you require additional dependencies, you can either add them to an existing Docker image or, if this is not possible due to conflicts, add a new one.
To define which image a method or metric uses, set the image parameter in its decorator to the name of the folder containing the Dockerfile (e.g., image="openproblems-r-base").
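Because the image parameter must match a folder name, it can help to list the valid values. The sketch below assumes, as in the docker build example in this guide, that each image lives in docker/<image-name>/Dockerfile relative to the repository root. list_images is a hypothetical helper, not part of Open Problems.

```shell
# Hypothetical helper: print the name of every folder under ROOT (default:
# docker) that contains a Dockerfile. These names are the values accepted by
# the image parameter. Assumes the docker/<image-name>/Dockerfile layout.
list_images() {
  local root="${1:-docker}" dockerfile
  for dockerfile in "$root"/*/Dockerfile; do
    # If nothing matches, the glob stays literal; skip it
    [ -f "$dockerfile" ] || continue
    basename "$(dirname "$dockerfile")"
  done
}

# Usage, from the repository root:
#   list_images
```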
- openproblems: Our base image. Do not add dependencies unless you know what you are doing.
- openproblems-r-base: Our base R image. Do not add dependencies unless you know what you are doing.
- openproblems-r-extras: Our R image that accepts additional dependencies. To add R packages (CRAN or Bioc), add them to r_requirements.txt. Syntax is dictated by renv. To add Python packages (PyPI or GitHub), add them to requirements.txt. Syntax is dictated by pip.
- openproblems-python-extras: Our Python image that accepts additional dependencies. To add Python packages (PyPI or GitHub), add them to requirements.txt. Syntax is dictated by pip.
Most packages can be added to Open Problems by editing one of the available images listed above. If the package you would like to add has dependencies that conflict with packages already in the available images, follow the Adding new images steps below.
Assuming there are no conflicting dependencies, you can simply amend the relevant requirements file (requirements.txt or r_requirements.txt) in the directory for the Docker image you would like to edit.
- Select a Docker image to edit. If you're adding a Python package, start with openproblems-python-extras. If you're adding an R package, start with openproblems-r-extras.
- Edit the relevant requirements file.
  - Adding an R package:
    - Edit the r_requirements.txt file.
    - The syntax to add a package is defined by renv.
      - Packages from Bioconductor: bioc::packagename
      - Packages from CRAN: packagename@<version-tag>
      - Packages from GitHub: username/packagename
    - More complex package installation will require editing the Dockerfile.
  - Adding a Python package:
    - Edit the requirements.txt file.
    - The syntax to add a package is defined by pip.
      - Packages from PyPI: packagename==version
      - Packages from GitHub: git+https://github.com/username/repositoryname
    - More complex package installation will require editing the Dockerfile.
- Add the package name to the README.md file in the directory specifying the Docker image. This helps keep track of which packages and versions are installed in each Docker image.
- Commit your changes to the Docker image and push to your fork following the instructions in the Contributing Guide.
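Editing the requirement files by hand is all that is needed. If you prefer to script the edit, the sketch below is a hypothetical add_requirements helper (not part of Open Problems) that appends entries read from standard input to a requirements file, skipping blank lines and entries that are already listed. The package names in the usage comments are placeholders that follow the renv and pip syntax above.

```shell
# Hypothetical helper (not part of Open Problems): append requirement lines
# read from stdin to FILE, skipping blank lines and exact duplicates.
add_requirements() {
  local file="$1" line
  touch "$file" || return 1
  # If the file does not end in a newline, add one so entries stay separate
  if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
    printf '\n' >> "$file"
  fi
  # The "|| [ -n ... ]" clause also keeps a final input line with no newline
  while IFS= read -r line || [ -n "$line" ]; do
    [ -n "$line" ] || continue
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
  done
}

# Usage, from the repository root (package names are placeholders):
#   printf '%s\n' 'bioc::mybiocpackage' 'mycranpackage@1.2.3' \
#     'someuser/mygithubpackage' |
#     add_requirements docker/openproblems-r-extras/r_requirements.txt
#   printf '%s\n' 'mypypipackage==0.4.1' \
#     'git+https://github.com/someuser/myrepository' |
#     add_requirements docker/openproblems-python-extras/requirements.txt
```

Remember to record the new packages in the image's README.md as well.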
To add a new image, create a new folder containing the following files:
- Dockerfile
- README.md
- requirements.txt (optional)
- r_requirements.txt (optional)
The easiest way to do this is to copy the openproblems-python-extras or openproblems-r-extras folder.
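Copying a template folder can be scripted as follows. new_image is a hypothetical helper that assumes the docker/<image-name>/ layout used by the docker build command in this guide, and it refuses to overwrite an existing folder. After copying, edit the new README.md and requirement files, and check the copied Dockerfile for any paths that may still point at the template folder.

```shell
# Hypothetical helper: create docker/NAME by copying docker/TEMPLATE.
# Run from the repository root.
new_image() {
  local template="$1" name="$2"
  if [ ! -f "docker/$template/Dockerfile" ]; then
    echo "error: docker/$template/Dockerfile not found" >&2
    return 1
  fi
  if [ -e "docker/$name" ]; then
    echo "error: docker/$name already exists" >&2
    return 1
  fi
  cp -R "docker/$template" "docker/$name"
}

# Usage (the new image name is a placeholder):
#   new_image openproblems-python-extras openproblems-python-mytool
```

The new folder name is also the value you would then pass as image= in your method or metric.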
If you have Docker installed, you can build containers locally for prototyping. For example, to build the openproblems base container, run the following from the root of the repository:
docker build -f docker/openproblems/Dockerfile -t singlecellopenproblems/openproblems .
Or, to update all available Docker images, rebuilding only when necessary:
cd workflow && snakemake -j 10 docker
Or, if you wish to override the automatic change detection:
cd workflow && snakemake -j 10 docker_build
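The single-image docker build command generalizes to any image folder. The hypothetical build_image helper below applies the same pattern: the Dockerfile under docker/<image-name>/, the repository root as build context, and a singlecellopenproblems/<image-name> tag. Before building, check the image's FROM line; if it builds on another of these images, build or pull that base first so you get the version you expect.

```shell
# Hypothetical helper: build one image the same way as the openproblems
# example above. Run from the repository root.
build_image() {
  local name="$1"
  if [ ! -f "docker/$name/Dockerfile" ]; then
    echo "error: docker/$name/Dockerfile not found" >&2
    return 1
  fi
  docker build -f "docker/$name/Dockerfile" -t "singlecellopenproblems/$name" .
}

# Usage:
#   build_image openproblems-r-extras
```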
Docker images are built by the run_benchmarks GitHub Actions workflow on both the base repository and on forks. As long as you have AWS secrets configured properly for your repository (see our Contributing Guide), these images will be uploaded to Amazon Web Services Elastic Container Registry (ECR). You can then download the image locally or attach it to AWS SageMaker Studio.
Once your Run Benchmark workflow has completed successfully, you should see the completed run in the GitHub Actions tab of your fork. If the workflow failed, look at the workflow logs to find the error.
You can find your successfully uploaded images on the ECR. To navigate to the ECR, search the AWS console for "ECR", click on "Repositories", and then click on openproblems. You should also see a nextflow repository that's used for your benchmarking backend, but you can ignore that for now.
Images uploaded to the ECR have Image Tags in the following format: openproblems:[first 6 characters of username]-[branch name]-[image name]. For example, danielStrobl recently pushed his batch-integration branch containing an openproblems-python37-scgen image. This is converted to the Image Tag daniel-batch-integration-openproblems-python37-scgen.
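To predict which tag to look for in the ECR, the documented format can be reproduced with a small function. ecr_image_tag is a hypothetical helper based only on the format stated above; the actual workflow may sanitize names further (for example, characters such as / are not valid in Docker tags), so treat its output as a guide.

```shell
# Hypothetical helper reproducing the documented tag format:
#   [first 6 characters of username]-[branch name]-[image name]
# Uses bash substring expansion (${var:offset:length}).
ecr_image_tag() {
  local username="$1" branch="$2" image="$3"
  printf '%s-%s-%s\n' "${username:0:6}" "$branch" "$image"
}

# Example from this guide:
#   ecr_image_tag danielStrobl batch-integration openproblems-python37-scgen
#   -> daniel-batch-integration-openproblems-python37-scgen
```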
To pull images from the ECR using docker pull, first download and set up the amazon-ecr-credential-helper using the same AWS secrets that you used to set up your fork repository. With that set up, you can use the following command to pull the image:
docker pull <aws_account_id>.dkr.ecr.us-west-2.amazonaws.com/openproblems:<Image Tag>
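The full image reference in the command above combines your AWS account ID, the region, the openproblems repository, and the Image Tag. The hypothetical ecr_image_uri helper below assembles it so that only the account ID and tag need to be supplied; the us-west-2 default mirrors the command above, and the account ID in the usage comment is a placeholder.

```shell
# Hypothetical helper: build the ECR image reference used by docker pull.
# ACCOUNT_ID and TAG are yours to supply; REGION defaults to us-west-2.
ecr_image_uri() {
  local account_id="$1" tag="$2" region="${3:-us-west-2}"
  printf '%s.dkr.ecr.%s.amazonaws.com/openproblems:%s\n' \
    "$account_id" "$region" "$tag"
}

# Usage (placeholder account ID), with amazon-ecr-credential-helper set up:
#   docker pull "$(ecr_image_uri 123456789012 daniel-batch-integration-openproblems-python37-scgen)"
```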
If you would like to attach this image to AWS SageMaker, you can follow our SageMaker and ECR tutorial.
You can also pull base images from DockerHub:
docker pull singlecellopenproblems/openproblems-python-extras:latest
To run Docker images on your local machine, you must have Docker installed. Follow the Docker guide to install Docker.
Once you've either built Docker images locally or pulled them from the ECR or the singlecellopenproblems DockerHub, you can see installed images using docker images.
> docker images
REPOSITORY                                                  TAG                                     IMAGE ID       CREATED        SIZE
singlecellopenproblems/openproblems-python-extras           latest                                  f86e1c5ce9d0   14 hours ago   3.94GB
singlecellopenproblems/openproblems-r-base                  latest                                  f8908c9fb387   21 hours ago   6.36GB
singlecellopenproblems/openproblems-r-extras                latest                                  7e15120bb7ce   5 days ago     4.89GB
singlecellopenproblems/openproblems                         latest                                  14974cbd2f58   5 days ago     2.1GB
490915662541.dkr.ecr.us-west-2.amazonaws.com/openproblems   batch_integration_docker-openproblems   3a1ce37e85f2   6 days ago     2.06GB
You can then run commands within a Docker container using docker run. Consult the Docker documentation to learn more about the run command.
cd openproblems
docker run \
-v $(pwd):/usr/src/singlecellopenproblems -v /tmp:/tmp \
-it singlecellopenproblems/openproblems-python-extras bash
You may also specify the docker image by its ID, rather than its name:
cd openproblems
docker run -v $(pwd):/usr/src/singlecellopenproblems -v /tmp:/tmp -it 90a9110c7d69 bash
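The commands above open an interactive shell. To run a single command non-interactively with the same mounts, a wrapper like the hypothetical op_run below can help. It drops -it and adds --rm, a standard docker run flag that removes the container once the command exits; the example command is only an illustration.

```shell
# Hypothetical helper: run one command in an Open Problems image with the
# same mounts as the interactive examples above. --rm removes the container
# when the command exits. Run it from inside your openproblems checkout.
op_run() {
  local image="$1"
  shift
  docker run --rm \
    -v "$(pwd)":/usr/src/singlecellopenproblems -v /tmp:/tmp \
    "$image" "$@"
}

# Usage:
#   cd openproblems
#   op_run singlecellopenproblems/openproblems-python-extras python --version
```

As with the examples above, an image ID can be passed in place of the image name.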