In an ever-changing tech landscape, mastering Python-based large language model (LLM) projects can be challenging, especially for beginners. To address this, I've created a tutorial that introduces Docker, a tool that bridges the gap between Python virtual environments and virtual machines. Unlike traditional virtual environments like Venv, Conda, or Poetry, which operate within the same OS runtime, Docker provides a complete runtime environment, including the OS userland and system libraries, while still sharing the host's kernel. This approach yields strong isolation with very little overhead. Docker, being more comprehensive than virtual environments yet far lighter than full virtual machines, is ideal for ensuring consistent and isolated development across different systems. This guide aims to simplify your development process and offer a clear pathway through the complexities of AI programming, whether you're a newcomer or a seasoned developer.
1. Install Python:
- Special Note about Python Versions: I recommend using Python version 3.10 or higher for AI/LLM work. Most AI/LLM projects on GitHub target recent Python versions, but it's always best to check the project's repo for specific version recommendations.
- Linux: Python is often pre-installed. Check with `python --version`. If not installed, use your package manager (e.g., `sudo apt-get install python3`).
- Windows and Mac: Download and install Python from the official Python website.
2. Install Docker:
- Linux: Follow instructions from the official Docker documentation.
- Windows and Mac: Download Docker Desktop from the official Docker website and follow the setup wizard.
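Once installed, a quick smoke test confirms Docker is working end to end:

```bash
# Check the CLI and that the daemon is reachable
docker --version
docker info

# Pull and run the official hello-world image
docker run hello-world
```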
Docker offers the flexibility needed for LLM development. Below is a Dockerfile example for pyautogen, adaptable for other LLM projects. Modify the Python version and base image as needed: the example works as-is for AutoGen but will require modification for other projects.
```dockerfile
FROM python:3.11-slim-bookworm

# Update and install system dependencies
RUN apt-get update \
    && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
        software-properties-common sudo \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

# Set up a non-root user 'autogen' with passwordless sudo access
RUN adduser --disabled-password --gecos '' autogen
RUN adduser autogen sudo
RUN echo '%sudo ALL=(ALL) NOPASSWD:ALL' >> /etc/sudoers

USER autogen
WORKDIR /home/autogen

# To supply your OpenAI API key, create a .env file in your docker directory
# and pass it at container start time (see the Environment Variable section below):
#   docker run --env-file .env ...

# Install Python packages
RUN pip install --upgrade pip

# AutoGen-specific install
RUN pip install pyautogen[teachable,lmm,retrievechat,mathchat,blendsearch] autogenra
RUN pip install numpy pandas matplotlib seaborn scikit-learn requests urllib3 nltk pillow pytest beautifulsoup4

# Expose port
EXPOSE 8081

# Start command
CMD ["/bin/bash"]
```
Components to Retain or Modify for Different LLM Projects:
- Base Image: `python:3.11-slim-bookworm` is chosen for its lightweight nature and pre-installed Python 3.11. This Debian-based image is popular in LLM development for its stable package repositories and community support, making it an excellent default choice. You can find a list of alternative images on Docker Hub.
- User Configuration: The Dockerfile creates a non-root user `autogen` with sudo privileges. This is a security best practice, preventing the running container from having unrestricted root access, which could be a security risk.
- Python Environment: The Dockerfile upgrades pip and installs a set of Python packages, including `pyautogen` with several extras (`teachable`, `lmm`, `retrievechat`, `mathchat`, `blendsearch`) and `autogenra`. This package selection is specific to the AutoGen project; adjust it for the LLM project you're working on, and always consult that project's documentation for its required dependencies.
- Environment Variable: To securely handle the OpenAI API key, you can use a `.env` file in your Docker setup. This file contains your API key, and Docker can supply it to the container without the key ever appearing in the Dockerfile. Here's how you can implement this:

a. Create a .env File: In your project directory, create a file named `.env` containing:

```
OPENAI_API_KEY=your_actual_openai_api_key
```

Replace `your_actual_openai_api_key` with your real OpenAI API key.

b. Pass the File at Runtime: Supply the key when you start the container using Docker's `--env-file` flag:

```bash
docker run --env-file .env -it --name {application_project_name} {image_name}
```

(Beware the tempting alternative of copying `.env` into the image and writing `ENV OPENAI_API_KEY=$(cat /home/autogen/.env)` in the Dockerfile: `ENV` values are literal strings and Docker performs no shell substitution there, and the copy would bake the secret into an image layer.)

c. Add .env to .gitignore: To prevent accidentally pushing the `.env` file to a public repository, add `.env` to your `.gitignore` file:

```
.env
```

This method ensures that your API key remains secure and isn't included in the Dockerfile, which might be shared or committed to a public repository. It also keeps the key out of your Docker image's layers, adding an extra layer of security. A quick way to verify the key from inside the container is shown below.
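As a quick check (a minimal sketch, assuming you started the container with `--env-file .env` as above), you can confirm the key is present inside the container without printing it:

```bash
# Run inside the container: prints True if the key is set, without echoing its value
python -c "import os; print('OPENAI_API_KEY' in os.environ)"
```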
- Port Exposure: The `EXPOSE 8081` instruction is illustrative; expose whichever ports your application actually requires. Keep in mind that `EXPOSE` is essentially documentation for the image: to reach the port from your host, you still publish it at runtime with `-p`.
- Start Command: `CMD ["/bin/bash"]` starts a bash shell, which is convenient for interactive development. Depending on your project's needs, you can change this to run your application directly; a sketch follows.
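Suppose your project's entry point is a single script; the `app.py` name below is a hypothetical placeholder for your actual entry point:

```dockerfile
# Hypothetical entry point; replace app.py with your project's main script
COPY app.py /home/autogen/app.py
CMD ["python", "app.py"]
```

With an exec-form CMD like this, the container runs your application on start and exits when it finishes, rather than dropping you into a shell.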
When working with Docker, you might have a local codebase that you want to use inside your Docker container. Docker provides a simple way to achieve this through the `COPY` command in the Dockerfile or by mounting a volume when you run the container.
You can modify the Dockerfile to include the `COPY` command, which copies files from your local file system into the Docker image. Here's an example:
```dockerfile
# Copy local code to the container
COPY ./my_local_codebase /home/autogen/my_local_codebase
```
This line will copy the contents of the `my_local_codebase` directory from your local system to the `/home/autogen/my_local_codebase` directory inside the Docker container. Make sure to replace `./my_local_codebase` with the path to your actual local codebase.
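One related tip: if your codebase contains files you don't want baked into the image (secrets, VCS history, caches), a `.dockerignore` file next to your Dockerfile keeps them out of the build context. A minimal sketch; the entries are illustrative:

```
# .dockerignore - paths excluded from the build context
.env
.git
__pycache__/
*.pyc
```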
For a more dynamic approach, especially if you anticipate making frequent changes to your local codebase, consider mounting a volume. This allows you to map a local directory to a directory inside the container, enabling real-time synchronization of files between your local environment and the Docker container.
Run the container with the following command to mount a volume:

```bash
sudo docker run -p 8080:8081 -v /path/to/local/codebase:/home/autogen/my_local_codebase -it --name {application_project_name} {image_name}
```
Replace `/path/to/local/codebase` with the path to your local codebase directory. This setup ensures that any changes made in the local directory are immediately reflected inside the container.
For more complex setups or specific requirements, Docker offers a variety of configurations and options. To delve deeper into Docker's capabilities, I recommend exploring the official Docker documentation. This resource provides comprehensive guides and explanations for various Docker functionalities, including networking, storage, security, and more.
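As a taste of that, here is a minimal `docker-compose.yml` sketch that bundles the port mapping, volume mount, and env file from this guide into one declarative file; the image name `autogen_img` is a placeholder for whatever you tag your build with:

```yaml
# docker-compose.yml - one-command equivalent of the docker run flags above
services:
  autogen:
    image: autogen_img              # placeholder: your built image tag
    ports:
      - "8080:8081"                 # host:container, matching EXPOSE 8081
    volumes:
      - ./my_local_codebase:/home/autogen/my_local_codebase
    env_file:
      - .env                        # supplies OPENAI_API_KEY at runtime
    stdin_open: true                # equivalent of -i
    tty: true                       # equivalent of -t
```

With this file in place, `docker compose up -d` starts the container and `docker compose exec autogen bash` opens a shell inside it.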
1. Building the Docker Image:
- Run this in your Dockerfile's directory (note: the trailing period is shorthand for the present working directory, which Docker uses as the build context; if your Dockerfile and project files live elsewhere, replace the `.` with the desired directory path):

```bash
docker build -t {image_name} .
```
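As a concrete example, with `autogen_img` as a hypothetical image name:

```bash
# Build the image from the Dockerfile in the current directory
docker build -t autogen_img .

# Confirm the image was created
docker images autogen_img
```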
2. Running the Container:
- Start and log into the container:

```bash
docker run -it --name {application_project_name} {image_name}
```

Note: if you need to map the exposed container port to a local port on your device, use this version of the command. The syntax is `-p <host_port>:<container_port>`:

```bash
docker run -p 8080:8081 -it --name {application_project_name} {image_name}
```
- Activate the Python virtual environment? Guess what: you don't need to. The container already isolates you from the host OS, so there is no `source .../bin/activate` step here. If we had not created the non-root `autogen` user, then adding a Python virtual environment inside the container would still be a sensible precaution.
3. Saving the State of the Container:
- To save the state of your container, use the `docker commit` command locally (not inside the container). This creates a new image of the container's current state, which you can use to resume work later without losing your progress:

```bash
docker commit {application_project_name} {new_image_name}
```
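For example, using hypothetical names and a tag to keep snapshots distinct:

```bash
# Snapshot the container's current filesystem as a new, tagged image
docker commit autogen_project autogen_img:day2

# Later, start a fresh container from that snapshot
docker run -it --name autogen_project_day2 autogen_img:day2
```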
4. Closing for the Day:
- From inside the container, exit it:

```bash
exit
```
- You should now be back in your local environment, and you can stop the container:

```bash
docker stop {application_project_name}
```
5. Restarting Your Container:
To restart your container after saving its state with `docker commit`, you can simply re-run the following commands:
a. Start the container:

```bash
docker start {application_project_name}
```
b. Attach to the running container:

```bash
sudo docker exec -it {application_project_name} bash
```
This process resumes your container with the state and data preserved from your last session. Note that simply stopping a container does not erase its filesystem; `docker commit` is what preserves your work as a reusable image in case the container is ever removed.
- Change Python Version: Modify the `FROM python:3.11-slim-bookworm` line to your desired version.
- Adapt for a Different LLM: Change `pip install pyautogen` to your LLM's package name; see the sketch below.
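For instance (purely illustrative), a LangChain-based project might replace the AutoGen-specific install lines with:

```dockerfile
# Swap the AutoGen installs for your project's own dependencies
RUN pip install langchain openai
```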
- View all containers (running and stopped):

```bash
docker ps -a
```

- View Docker images:

```bash
docker images
```

- Reset a container setup (stop and remove the container, then remove its image):

```bash
docker stop my_container
docker rm my_container
docker rmi my_image:latest
```
Embracing Docker for your LLM development gives you consistent, isolated environments that move cleanly between machines, letting you spend your time on the AI work itself rather than on dependency headaches.