Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: automatic conda environment handling #1469

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

lkstrp
Copy link
Member

@lkstrp lkstrp commented Dec 20, 2024

Changes proposed in this Pull Request

  • Add option to let snakemake handle all conda environment handling based on pinned envs and os

  • Makes it unneccessary for a user to wrangle with dependencys and envrionments at all

  • Only snakemake would have to be installed. Then you could run snakemake ... --sdm conda and the

    • pinned env for your OS will be selected,
    • enabled if it already exists,
    • installed if it does not
    • and reinstalled if the file has changed.

A few things to discuss:

  • Do we still want the option to use the base environment.yaml with snakemake (was the setting before, not sure if anyone uses it). Also it is not robust as only pinned envs are checked by the CI.
    • This could be implemented via a config setting (e.g. run.env=pinned and run.env=base).
      • I would rather not add this setting, because if --sdm conda is not added, the setting will do nothing.
  • Which brings us to the option to add --sdm conda by default (or if set in config, e.g. deployment_method=pinned).
    • Would be a better config setting.
    • Not sure if this is possible with snakemake and good practice.
  • Theoretically, you would just need to have snakemake installed, either in an env or in the base env. And everything would run and update itself.
    • At the moment this is not possible because _helpers.py needs some dependencies and is loaded before snakemake handles the environments. But this module would benefit from being broken up anyway.

Checklist

  • I tested my contribution locally and it works as intended.
  • Code and workflow changes are sufficiently documented.
  • Changes in configuration options are added in config/config.default.yaml.
  • Changes in configuration options are documented in doc/configtables/*.csv.
  • A release note doc/release_notes.rst is added.

Copy link
Contributor

@FabianHofmann FabianHofmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this a super nice and consistent approach! Perhaps we think about the case on how to deal with custom installations?

Copy link
Member

@fneum fneum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, I also like the idea of using --sdm conda more!

The way this PR is currently set up, the conda_env_provider() can only be used with the pinned versions. However, while developing, it can also be quite useful to set

conda:
    "../envs/environment.yaml"

Because "software-env" is one of the --rerun-triggers (i.e. changes to environment.yaml will conveniently trigger reruns of the affected rules). Perhaps, there should at least be a toggle in the configuration to use pinned versions or environment.yaml directly as @lkstrp reluctantly suggested, e.g.

pinned_environments: true

Has someone looked into snakemake's recommendations for reproducible environments?

https://snakemake.readthedocs.io/en/stable/snakefiles/deployment.html#freezing-environments-to-exactly-pinned-packages

@@ -81,7 +81,7 @@ rule create_scenarios:
output:
config["run"]["scenarios"]["file"],
conda:
"envs/retrieve.yaml"
conda_env_provider()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it makes sense to allow an argument here to a file "envs/retrieve.yaml" or similar/its pinned version?

I think there is some untapped potential to use rule-specific environments rather than one large environment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants