Docker delivers software in packages called containers, which can be run locally or on servers. Containers are isolated from one another and bundle their own software, libraries, and configuration files (https://docker.com). They are more robust than virtual environments, but more lightweight and efficient than virtual machines. You can read more about that in this Medium article.
Containers are created from layered read-only templates called images by adding a read-write layer on top. There are three ways to add a new image to your machine.
- Build it from a Dockerfile. Dockerfiles can be created in any text editor and are pretty intuitive to write; good starting points are the DeepLabCut container and the Dockerfile in the jupyter folder here. After you've created your Dockerfile, all you need to do is run `docker build -t image_name .` (note the dot at the end). This sends everything in your current directory to Docker as the build context, so if you want it empty, put the Dockerfile in an empty folder. A minimal sketch of a Dockerfile follows this list.
- Pull it from Docker Hub. You can run `docker pull image_name`, or just skip this step altogether: Docker will pull automatically when you try to run a container from an image that isn't already on the machine.
- Create it from a container. It's important to remember that containers are supposed to be temporary, and all information that exists only inside a container disappears when that container is removed, so taking care of your data and saving the environment regularly is good practice! To save a container, just run `docker commit container_name image_name:tag`. You can also push it to Docker Hub with `docker push image_name`. Note that when you commit a container, all processes running inside it are paused!
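For reference, here is a minimal sketch of what a Dockerfile can look like; the base image and packages below are placeholders, not the contents of any of the files mentioned above:

```dockerfile
# Placeholder base image; pick whatever suits your project
FROM ubuntu:22.04

# Install whichever tools you need (illustrative choices)
RUN apt-get update && apt-get install -y git nano && rm -rf /var/lib/apt/lists/*

# Copy the build context into the image and set the default directory
COPY . /workspace
WORKDIR /workspace
```

Building it with `docker build -t image_name .` produces an image you can run like any other.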
Once you've got your image, you just need to run a container from it to start working. The command for that is `docker run --name=container_name image_name`. Here are a few useful options:
- `-it` runs the container interactively and opens the shell,
- `--rm` destroys the container as soon as you exit,
- `-d` runs it in detached mode, without taking up the terminal,
- `-w` sets the working directory in the container.
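For example, a throwaway interactive session could combine several of these (assuming the image contains `bash`):

```bash
# Interactive shell in /workspace; the container is deleted on exit thanks to --rm
docker run -it --rm -w /workspace --name=scratch image_name bash
```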
One option I would like to talk about in a bit more detail is `-v`. As I've already mentioned, containers are temporary, and if your data is only saved in a container it can very easily get lost. In addition, by default containers are completely isolated from the host machine, so you can't just save important things there instead. The solution to this problem is mounting data volumes. That basically means synchronising a directory in the container with a directory on your computer or, alternatively, with an abstract data space that can then be accessed by other containers. That way everything in that folder is accessible both from the container and from the host machine, and nothing happens to the data if the container is removed. To set this up, just use the `-v /host/path:/container/path` option. Note that the paths need to be absolute if you want to use a folder on your machine. Using `-v abstract:/container/path` will create a data volume named abstract that is not connected to your host system. If you want to mount multiple data volumes, just repeat the option several times: `-v /host/path/1:/container/path/1 -v /host/path/2:/container/path/2`.
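As a sketch, with a hypothetical `~/data` folder on the host:

```bash
# Mount the host folder ~/data at /data inside the container
# (the shell expands ~ to an absolute path)
docker run -it --name=worker -v ~/data:/data image_name

# Named (abstract) volumes can be listed and inspected on the host
docker volume ls
docker volume inspect abstract
```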
When you are done working with a container, you can type `exit` to stop it or press `Ctrl+p, Ctrl+q` to quit the terminal without interrupting the process. To come back to it later, use `docker attach container_name`. If the container has been stopped, you'll need to run `docker start container_name` first. You can also run another process in an existing container with `docker exec`, for example `docker exec -it container_name bash`.
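A typical sequence for a hypothetical container named `worker` would be:

```bash
docker start worker           # restart a stopped container
docker attach worker          # reconnect to its main process
# ... press Ctrl+p, Ctrl+q to detach without stopping it ...
docker exec -it worker bash   # open a second shell alongside the main process
```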
You can always see the list of all images on the machine with `docker images` and the list of all containers with `docker ps -a`. Please don't forget to remove the containers you are not using with `docker rm container_name`!
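A quick clean-up pass could look like this:

```bash
docker ps -a              # list all containers, running and stopped
docker rm container_name  # remove a stopped container by name
docker container prune    # or remove all stopped containers at once
docker rmi image_name     # remove an image you no longer need
```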
In order to run a jupyter lab in your container, you only really need to install jupyter and then run:
- `ssh -L <host port>:localhost:<remote port> user@remote` on your computer,
- `docker run -it -p <remote port>:<container port> --name <container name> <image name>` on the server,
- `jupyter lab --ip 0.0.0.0 --port <container port> --allow-root` inside the container.
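With concrete numbers substituted in (8888 for the host and remote ports, 8889 inside the container; all arbitrary choices, and `my_jupyter_image` is a placeholder), the chain looks like this:

```bash
# On your computer: forward local port 8888 to port 8888 on the server
ssh -L 8888:localhost:8888 user@remote

# On the server: map the server's port 8888 to the container's port 8889
docker run -it -p 8888:8889 --name jlab my_jupyter_image

# Inside the container: serve jupyter lab on port 8889
jupyter lab --ip 0.0.0.0 --port 8889 --allow-root
```

After that, the notebook is reachable at localhost:8888 in your local browser.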
However, there are many ways to improve that process. Here is a nice tutorial and
here is the page of Jupyter Docker Stacks, images specifically designed with jupyter and data science in mind.
Alternatively, you can use the Dockerfiles in the jupyter_cuda_torch and jupyter_cuda_tf folders. They are short and easy to understand and modify even if you are very new to Docker. Building from one of these Dockerfiles will create an image with basic data science python libraries as well as git, nano and unzip already installed, plus a working combination of your DL library of choice and CUDA drivers. A container run from that image will automatically start a jupyter notebook at port 8889, so you only need to run the commands on your computer and on the server with `<container port>=8889`.
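For illustration only (the actual Dockerfiles in those folders may differ), such an image could be sketched roughly as follows; the base tag and package list are assumptions:

```dockerfile
# Rough sketch of a jupyter + CUDA + PyTorch image; base image and packages are assumptions
FROM pytorch/pytorch:latest

# Everyday command-line tools
RUN apt-get update && apt-get install -y git nano unzip && rm -rf /var/lib/apt/lists/*

# Basic data science stack plus jupyter
RUN pip install --no-cache-dir numpy pandas matplotlib scikit-learn jupyterlab

# Start a notebook server on port 8889 when the container starts
EXPOSE 8889
CMD ["jupyter", "lab", "--ip=0.0.0.0", "--port=8889", "--allow-root", "--no-browser"]
```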
Note that by default the image will be built with the latest versions of all libraries, but you can specify them in the Dockerfile if needed, and you can create multiple images with different library versions for different purposes. Just remember that images have a layered structure that is utilized to make storage efficient: layers that are identical across images are cached and stored only once. That means that if you want to create multiple images that differ by a few Dockerfile lines, those lines should be closer to the end of the file, so the shared layers above them can be reused.
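Build arguments are one way to do that. In this sketch, the shared layers come first and only the version-specific layer at the end differs between builds (`TORCH_VERSION` is a made-up argument name):

```dockerfile
# Shared layers first: identical across images, so Docker caches and reuses them
FROM python:3.10
RUN pip install --no-cache-dir numpy pandas jupyterlab

# Version-specific layer last: only this layer is rebuilt per image
ARG TORCH_VERSION=2.1.0
RUN pip install --no-cache-dir torch==${TORCH_VERSION}
```

Running `docker build --build-arg TORCH_VERSION=1.13.1 -t torch_old .` then rebuilds only the final layer.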
One more thing to remember is that if you are planning to use GPUs, you should use `nvidia-docker` instead of regular `docker`. All the commands stay the same, and you can even keep using images created with regular docker.
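A quick sanity check that the GPUs are visible from inside a container is the classic `nvidia-smi` test (here with the stock nvidia/cuda image):

```bash
# Should print the familiar GPU table if everything is set up correctly
nvidia-docker run --rm nvidia/cuda nvidia-smi
```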
Here is an example pulling everything together:

`nvidia-docker run --name=jupyter -it -p <remote port>:8889 -v ~/data:/home -w /home -d jupyter_base` runs a GPU-enabled container named `jupyter` on top of the `jupyter_base` image in detached mode (I am assuming the image has the instructions to start a notebook at port 8889 automatically). It connects the `~/data` folder on the host machine to the `/home` folder in the container in order to keep the important data safe, and makes `/home` the working directory in the container.

`docker exec -it jupyter bash` opens the shell of that container without interfering with the notebook. You can run another process and type `exit` when you are done; it will not stop the notebook.

`docker attach jupyter` connects you to the output of the notebook server. Here, in order to quit without stopping the process, you need to press `Ctrl+p, Ctrl+q`.