Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Docker Refactoring #78

Open
wants to merge 2 commits into
base: 0.4
Choose a base branch
from
Open

Docker Refactoring #78

wants to merge 2 commits into from

Conversation

MichaelRipa
Copy link
Member

Description

This PR introduces significant improvements to the Docker setup and the build process, enhancing both efficiency and reliability. The key changes include:

  • Dockerfile Reorganization:

    • Before: Only base.dockerfile and service.dockerfile existed within the ./services directory. The Makefile navigated to services/[name]/ to execute builds using these Dockerfiles.
    • After: Moved all Dockerfiles to a centralized ./docker directory and introduced a third Dockerfile, dockerfile.conda. The three Dockerfiles are:
      • dockerfile.base
      • dockerfile.conda
      • dockerfile.service
  • Optimized Conda Environment Setup:

    • Before: Each service individually installed its Conda environment from services/[name]/environment.yml in the base.dockerfile, leading to redundant installations and increased build times.
    • After:
      • dockerfile.base now creates a shared base environment with common dependencies.
      • dockerfile.conda updates this base environment with service-specific dependencies. This reuse of the base image significantly reduces build times by avoiding repeated installations of common packages.
  • Makefile Enhancements:

    • Bug Fix: Resolved an issue where N_DEVICES directly utilized nvidia-smi, causing crashes on machines without Nvidia GPUs. The updated Makefile now safely checks for the presence of nvidia-smi before attempting to use it.
    • Build Process: Adapted the Makefile to work with the new Dockerfile structure, ensuring seamless integration and improved build efficiency.

Benchmarks

I conducted a performance comparison between the old and the refactored build setups on the dev deployment:

  • Old Setup: 347 seconds
  • Refactored Setup: 225 seconds

NOTE: This benchmark excludes the ray_worker service from the build process. Including it would likely result in even more substantial reductions in build duration.

@MichaelRipa MichaelRipa linked an issue Jan 9, 2025 that may be closed by this pull request
Copy link

@JohnGouwar JohnGouwar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, this looks good, just two small changes in your Makefile dependencies.


docker build --no-cache -t $(NAME)_base:latest -f ../base.dockerfile .
build_conda:
Copy link

@JohnGouwar JohnGouwar Jan 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

build_conda should have a dependency on build_base


docker build --no-cache -t $(NAME)_base:latest -f ../base.dockerfile .
build_conda:
docker build --no-cache --build-arg NAME=$(NAME) -t $(NAME)_conda:latest -f docker/dockerfile.conda .

build_service:
Copy link

@JohnGouwar JohnGouwar Jan 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

build_service should have a dependency on build_conda

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Makefile doesn't support machines without GPU's
2 participants