Assess the viability of adopting Docker Containers vs Conda as the standard execution environment for the package #534

Closed
gcroci2 opened this issue Dec 14, 2023 · 5 comments · Fixed by #549
@gcroci2
Collaborator

gcroci2 commented Dec 14, 2023

In the course of PR #528, we recognized the benefits of building the environment inside a Docker container, which removes the need for users to manually install the various dependencies required by deeprank2. We could therefore make Docker containers the default execution environment for the package. However, there are some concerns about this direction.

PROs

  • Dependency management: Docker simplifies dependency management. All the required dependencies are specified in the Dockerfile, making it easier for users to set up the environment without manually installing each component.
  • Ease of reproducibility and consistency across environments: Docker images are essentially snapshots of an environment, making it easy to reproduce the same environment on different machines or at different points in time. Additionally, Docker allows for consistent deployment across different environments. It helps ensure that the package works the same way regardless of the underlying system, which is particularly useful in production scenarios.

CONs:

  • Learning curve: there may be a learning curve for users unfamiliar with Docker, adding complexity to the process of setting up and running the package.
  • Limited resource access: Docker containers are inherently isolated, and they don't have direct access to all resources of the host machine. This can be partially changed in the Docker settings, but the user needs to do it properly, and it might be a drawback in scenarios where an application requires low-level access to specific hardware or system resources. On my Mac, for example, not all the CPUs can be used (maximum 8 out of 10). Additionally, the official Docker docs advise users to limit the containers' resources to avoid both system instability and security risks.
  • GPU access: Docker traditionally has limitations when it comes to accessing GPUs directly. This can be a significant drawback for applications that heavily rely on GPU processing power, such as our machine learning pipeline.
  • Supercomputers: Realistically, users will mainly run deeprank2 on supercomputers, where they won't have sudo permissions. In such cases, running Docker will require certain actions and considerations from the user side (e.g., system administrator assistance, compatibility with the job scheduler, containerized job scripts, availability of the Docker image containing the application on the supercomputer's file system or a container registry accessible to the compute nodes).
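
To see how much of the host a container actually exposes, a quick check can be run both on the host and inside the container. This is a minimal sketch using only the Python standard library; the helper names `visible_cpus` and `nvidia_tool_on_path` are illustrative and not part of deeprank2, and looking for `nvidia-smi` on the PATH is only a rough heuristic for GPU availability.

```python
import os
import shutil


def visible_cpus() -> int:
    """Number of CPUs this process may run on (hypothetical helper)."""
    # sched_getaffinity exists only on some platforms (e.g. Linux);
    # it accounts for CPU pinning of the current process.
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    # Fallback: logical CPUs reported by the OS (cpu_count may return None).
    return os.cpu_count() or 1


def nvidia_tool_on_path() -> bool:
    """Rough heuristic (assumption): is the nvidia-smi executable on PATH?"""
    return shutil.which("nvidia-smi") is not None


if __name__ == "__main__":
    print(f"CPUs visible to this process: {visible_cpus()}")
    print(f"nvidia-smi found on PATH: {nvidia_tool_on_path()}")
```

Comparing the numbers on the host and inside the container gives a first indication of whether cores are being withheld. CPU-time quotas are not necessarily reflected in this count, so it should be read as a hint rather than a measurement.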

In my opinion, Docker is not the best choice for our community's use cases, especially because of the last two cons listed above. But please let me know your thoughts :) and also whether you know of alternatives that overcome Docker's limitations.

@LourensVeen

Another downside is that Docker containers aren't composable, i.e. if you want to use program A together with program B and they're both distributed as Docker images, then it's going to be tricky to make it work. At best you could mount a folder in both and exchange data between them, but a single Python script that calls functions from both won't work.
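
The folder-sharing workaround could look roughly like the sketch below, which only builds the `docker run` command lines and does not execute them. The image names, the `/data` mount point, and the file names are hypothetical placeholders.

```python
from pathlib import PurePosixPath


def docker_run_cmd(image: str, shared_dir: PurePosixPath, args: list[str]) -> list[str]:
    """Build a `docker run` command that bind-mounts shared_dir at /data."""
    return [
        "docker", "run", "--rm",
        "-v", f"{shared_dir}:/data",  # bind mount: host_path:container_path
        image, *args,
    ]


shared = PurePosixPath("/tmp/exchange")  # hypothetical host folder

# Program A writes its result into the shared folder ...
cmd_a = docker_run_cmd("program-a:latest", shared, ["--out", "/data/a_result.csv"])
# ... and program B reads it from the same folder in a second container.
cmd_b = docker_run_cmd("program-b:latest", shared, ["--in", "/data/a_result.csv"])

for cmd in (cmd_a, cmd_b):
    print(" ".join(cmd))
```

Each program still runs in its own container and process, so the two can only communicate through files like this, not through function calls within one Python script.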

I think a better solution here is to create packages on conda-forge for deeprank2 and, if necessary, its dependencies. It seems that DSSP is the main dependency not available as a Conda package. Possibly the authors could be persuaded to provide one, or maybe we could set it up and they could maintain it, or we could do it all ourselves, in descending order of preference.

@gcroci2 gcroci2 added the priority Solve this first label Dec 19, 2023
@gcroci2 gcroci2 changed the title Assessing the viability of adopting Docker Containers as the standard execution environment for the package Assess the viability of adopting Docker Containers as the standard execution environment for the package Dec 19, 2023
@gcroci2
Collaborator Author

gcroci2 commented Jan 4, 2024

I opened an issue in the DSSP repo requesting Conda package support for DSSP. We'll wait for the maintainers to check it out before figuring out what to do next.

@gcroci2
Collaborator Author

gcroci2 commented Jan 4, 2024

The authors of DSSP are not willing to release it on Conda, but they're fine with us doing that.

@LourensVeen, do you think this is feasible in a short time even though we're not the authors of the package? If not, we will evaluate whether to remove the dependency on DSSP entirely. It is used to generate one relatively minor feature and is not fundamental to our pipeline. If it is doable, we can have a brief chat for tips (I've never released anything on conda), and then I'll start the procedure myself.

@LourensVeen

Okay, I've got Conda swapped out a bit at the moment, but I'm back at work next week and I'll get back on it, so we should be able to set something up then.

@gcroci2
Collaborator Author

gcroci2 commented Jan 15, 2024

I installed this conda version of dssp in a new environment on macOS with an M1 chip, and the tests passed. They also pass when setting up the environment on Snellius, as long as the conda-forge channel is provided and the anaconda channel is not. The same holds for the CI (see PR #549). We will keep working on migrating everything to conda in #559.
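
For reference, restricting an environment to conda-forge can be sketched as below. The function only assembles a `conda create` command and does not run it; the helper name, the environment name, and the package spec are placeholders, since the exact DSSP package is not named here. The `--override-channels` flag tells conda to ignore the channels configured in `.condarc` and the defaults, and to use only those passed with `--channel`.

```python
def conda_create_cmd(
    env_name: str,
    packages: list[str],
    channels: tuple[str, ...] = ("conda-forge",),
) -> list[str]:
    """Assemble a `conda create` command limited to the given channels."""
    cmd = ["conda", "create", "--yes", "--name", env_name, "--override-channels"]
    for channel in channels:
        cmd += ["--channel", channel]
    return cmd + packages


# Placeholder names: substitute the real environment name and package spec.
cmd = conda_create_cmd("deeprank2-env", ["<dssp-package-spec>"])
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run` once the placeholders are filled in.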

@DaniBodor DaniBodor removed their assignment Jan 16, 2024
@gcroci2 gcroci2 changed the title Assess the viability of adopting Docker Containers as the standard execution environment for the package Assess the viability of adopting Docker Containers vs Conda the standard execution environment for the package Jan 18, 2024
@gcroci2 gcroci2 changed the title Assess the viability of adopting Docker Containers vs Conda the standard execution environment for the package Assess the viability of adopting Docker Containers vs Conda as the standard execution environment for the package Jan 18, 2024
@gcroci2 gcroci2 linked a pull request Jan 19, 2024 that will close this issue
@gcroci2 gcroci2 closed this as completed Jan 24, 2024
@gcroci2 gcroci2 removed the priority Solve this first label Mar 19, 2024
@gcroci2 gcroci2 moved this to Done in Development Jul 12, 2024