Add examples tools #1

antarcticrainforest · 2024-11-21T15:09:09Z

This is a proposal for how we can define tools in a TOML file (mainly based on @felio92's work). I added a simple script that tries to create versioned Anaconda environments to make those environments as reproducible as possible.

To show the concept I've added a couple of examples.

mo-dkrz

I made some general comments on this and didn't get into the nitty-gritty details, but I think we should keep this PR open for a while to brainstorm and hold meetings to settle on a proper configuration file that is, first, easier for users to understand and, second, structured according to best practices. Overall, I loved the idea; thanks!

mo-dkrz · 2024-11-24T06:33:16Z

README.md

+
+## Define your tool via a `tool.toml` file
+
+The `tool.toml` file simplifies the process of:


I would suggest dropping the `.toml` extension to make the name easier to understand. We might find benefits in other config formats such as YAML or conf in the future. For example, we are currently considering switching `drs_config.toml` to YAML; if the tool config has no extension, we won't have to rename every tool configuration or, worse, ask tool authors to rename theirs. I also suggest renaming the file to something like `.freva` rather than `tool`. It is true that it is a tool's config file, but users usually say that Freva is running their program and that they need to set up Freva for it, so I think `.freva` fits better. In the home directory we would then have `~/.freva/...`, `~/.frevatool`, or something else related to Freva instead of `~/tool/...`, which might leave users wondering what this "tool" directory in their home is, since the name says nothing about Freva.

mo-dkrz · 2024-11-24T06:40:36Z

README.md

+
+```toml
+[tool.build]
+dependencies = ["rust"]  # Build-specific dependencies


Since we can define rust or any other tool prerequisite in the dependencies of `tool.run`, and since each `tool.toml` gets a single environment under the `~/tool/..` directory, do you think we need these dependencies in a separate `build` sub-table?
I'd suggest doing it differently, the way Conda does: use this section to describe where external sources are fetched from, e.g.

```toml
[tool.build]
url = "https://www.example.com/executor.tgz"
# or: local = "/somewhere/on/levante/executor.tgz"
```

For multiple URLs or local paths, I again see value in using YAML instead of TOML, because TOML treats everything as a dictionary. But I'm not sure about that part, and I think it needs further brainstorming.

mo-dkrz · 2024-11-24T07:02:03Z

README.md

+[tool.input_parameters.parameter_db]
+title = "Search Parameter"
+type = "databrowser"
+search_key = "variable"


What do you think about extending this `search_key` to `search_attrs` (or whatever name is appropriate) and defining it as below? Users would then have more control over a single parameter in the databrowser.
Also, since this is a Freva-generic type, what do you think about renaming the type to `freva` rather than `solr` or `databrowser`? All of the other parameter types introduced here are already familiar to scientists who write scripts.

```toml
[tool.input_parameters.parameter_db]
title = "advanced custom"
type = "freva"
search_attrs = [
    { variable = "something" },
    { variable = "something_else" },
    { variable_not = "avoided var" },
    { experiment_not = "historical" },
]
```

I defined separate dicts because `variable` appears twice and each key in a dictionary must be unique. Yet another disadvantage of using TOML!
Then we don't need to settle on any unnecessary `pre_defined` entries; we can point the user, admin, or whoever sets up `tool.toml` to the docs.

mo-dkrz · 2024-11-24T07:13:07Z

README.md

+"""
+```
+
+##### Range Parameters


What do you think about moving the range to the previous section?
Or, if you want to keep it here, let's at least make it more advanced:

```toml
[tool.input_parameters.parameter_range]
title = "Range Example"
type = "range"
value_type = "datetime"  # float or int
default = ["2024-01-01T00:00:00", "2024-12-31T23:59:59", "1d"]
```

Alternatively, we could simply take care of `value_type` in the backend.
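If `value_type` is handled in the backend, it could be inferred from the range start. A minimal sketch, assuming `default = [start, end, step]` and a hypothetical step syntax of an integer followed by `s`, `m`, `h` or `d` (so `"1d"` means one day); none of these helper names exist in the PR:

```python
from datetime import datetime, timedelta
from typing import List, Optional, Sequence

# Hypothetical step syntax: integer amount + unit letter, e.g. "1d" or "6h".
_STEP_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}


def parse_step(step: str) -> timedelta:
    """Turn a step string such as "1d" into a timedelta."""
    amount, unit = step[:-1], step[-1:]
    if unit not in _STEP_UNITS or not amount.isdigit():
        raise ValueError(f"Unsupported step {step!r}")
    return timedelta(**{_STEP_UNITS[unit]: int(amount)})


def infer_value_type(default: Sequence) -> str:
    """Guess value_type from the range start instead of requiring it."""
    start = default[0]
    if isinstance(start, bool):  # bool is a subclass of int, reject it first
        raise TypeError("Booleans are not valid range values.")
    if isinstance(start, int):
        return "int"
    if isinstance(start, float):
        return "float"
    datetime.fromisoformat(start)  # raises if start is not an ISO 8601 string
    return "datetime"


def expand_range(default: Sequence, value_type: Optional[str] = None) -> List:
    """Expand [start, end, step] into all values from start to end, inclusive."""
    value_type = value_type or infer_value_type(default)
    start, end, step = default
    if value_type == "datetime":
        start, end = datetime.fromisoformat(start), datetime.fromisoformat(end)
        step = parse_step(step)
    zero = timedelta(0) if value_type == "datetime" else 0
    if step <= zero:  # a non-positive step would never reach the end
        raise ValueError("The range step must be positive.")
    values = []
    current = start
    while current <= end:
        values.append(current)
        current += step
    return values
```

With the example default above, `expand_range` yields one value per day of 2024, i.e. 366 datetimes, without `value_type` being set explicitly.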

mo-dkrz · 2024-11-24T07:42:52Z

README.md

+    git push origin add-your-tool-name
+    ```
+1. Navigate to the original repository and open a pull request.
+


What do you think about adding a few lines about the `create_environment.py` script?

```
python create_environment.py --help
usage: create-conda-env [-h] [-d] [-f] [-p PREFIX] [-v] input_dir

positional arguments:
  input_dir             The path to the tool definition.

options:
  -h, --help            show this help message and exit
  -d, --dev             Use development mode for any installation.
  -f, --force           Force recreation of the environment.
  -p, --prefix PREFIX   The install prefix where the environment should be installed
  -v, --verbose
```
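For the README it may help to spell out which options are flags and which take a value. The following `argparse` sketch is reconstructed only from the help text in this comment; the real parser in `create_environment.py` may be defined differently (for instance, `-v` could count repetitions rather than act as a simple flag), and `build_parser` is a hypothetical name:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the --help output above (reconstructed, not copied)."""
    parser = argparse.ArgumentParser(prog="create-conda-env")
    parser.add_argument("input_dir", help="The path to the tool definition.")
    parser.add_argument(
        "-d", "--dev", action="store_true",
        help="Use development mode for any installation.",
    )
    parser.add_argument(
        "-f", "--force", action="store_true",
        help="Force recreation of the environment.",
    )
    # Takes a value (shown as PREFIX in the usage line).
    parser.add_argument(
        "-p", "--prefix",
        help="The install prefix where the environment should be installed",
    )
    # The help output shows no description for -v; a plain flag is assumed.
    parser.add_argument("-v", "--verbose", action="store_true")
    return parser
```

Read this way, `python create_environment.py examples/rust-build -f` would force a rebuild of the `rust-build` example, while `-p` is the only option that needs an argument.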

mo-dkrz · 2024-11-24T07:53:55Z

create_environment.py

+        / version.lower().strip("v")
+    )
+    if force is True or check_for_environment_creation(
+        input_dir, config["tool"]["run"]["dependencies"]


Should we consider the case where a tool has no dependencies?
If so, we need to handle that scenario, for example like this:

```python
if force is True or check_for_environment_creation(
    input_dir, config["tool"]["run"].get("dependencies", [])
):
```

to avoid the following error:

```
❯ python create_environment.py examples/rust-build
2024-11-24 08:48:10,220 - DEBUG - Checking arch and getting mamba url
Traceback (most recent call last):
  File "/Users/mo/dev/20241124/data-analysis-tools/create_environment.py", line 567, in <module>
    main(app.input_dir, app.prefix, force=app.force)
    ~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mo/dev/20241124/data-analysis-tools/create_environment.py", line 531, in main
    input_dir, config["tool"]["run"]["dependencies"]
               ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^
KeyError: 'dependencies'
```

Otherwise, we need to catch the error and report it in the logs.
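A small helper could cover both options at once: treat a missing key as an empty list and log that fact. A sketch, assuming `[tool.run]` itself remains mandatory; `run_dependencies` is a hypothetical name that would replace the direct `config["tool"]["run"]["dependencies"]` lookup:

```python
import logging
from typing import Any, Dict, List

logger = logging.getLogger(__name__)


def run_dependencies(config: Dict[str, Any]) -> List[str]:
    """Return the [tool.run] dependencies, treating a missing key as "none"."""
    run_table = config.get("tool", {}).get("run")
    if run_table is None:
        # Assumption: [tool.run] stays mandatory even if dependencies are optional.
        raise ValueError("The tool definition has no [tool.run] table.")
    dependencies = run_table.get("dependencies", [])
    if not isinstance(dependencies, list) or not all(
        isinstance(dep, str) for dep in dependencies
    ):
        raise ValueError("[tool.run] dependencies must be a list of strings.")
    if not dependencies:
        # Not an error, but worth surfacing in the logs.
        logger.warning("No [tool.run] dependencies defined for this tool.")
    return dependencies
```

The call site would then read `check_for_environment_creation(input_dir, run_dependencies(config))`, and malformed entries fail with a readable message instead of a bare `KeyError`.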

mo-dkrz · 2024-11-24T08:06:14Z

create_environment.py

+    if force is True or check_for_environment_creation(
+        input_dir, config["tool"]["run"]["dependencies"]
+    ):
+        with TemporaryDirectory() as temp_dir:
+            tar_path = Path(temp_dir) / "micromamba.tar.bz2"
+            download_with_progress(mamba_url, tar_path)
+            extract_micromamba(tar_path, temp_dir)
+            micromamba_path = Path(temp_dir) / "bin" / "micromamba"
+            if not micromamba_path.is_file:
+                raise ValueError(
+                    "Micromamba binary was not found after extraction."
+                )
+            create_environment(Path(temp_dir), input_dir, env_dir, config)
+            set_version(env_dir, version, new=True)
+    else:
+        set_version(env_dir, version, new=False)
+    share_dir = env_dir / "share" / "tool" / config["tool"]["name"]
+    share_dir.mkdir(exist_ok=True, parents=True)
+    build_env_file = input_dir / "build-environment.lock"
+    copy_all(input_dir, share_dir)
+    try:
+        build(
+            env_dir.parent,
+            share_dir,
+            build_env_file,
+            config["tool"].get("build", {}),
+        )
+    except Exception as error:
+        logger.error(error)
+        shutil.rmtree(env_dir)
+        raise ValueError("Failed to create environment.")
+    print("The tool was successfully deployed in:", share_dir)


What do you think about moving the micromamba download out of the `if` block, since we need it for the build stage as well?
Something like this:

```python
mamba_temp_dir = TemporaryDirectory()
try:
    tar_path = Path(mamba_temp_dir.name) / "micromamba.tar.bz2"
    download_with_progress(mamba_url, tar_path)
    extract_micromamba(tar_path, mamba_temp_dir.name)
    micromamba_path = Path(mamba_temp_dir.name) / "bin" / "micromamba"
    if not micromamba_path.is_file():
        raise ValueError(
            "Micromamba binary was not found after extraction."
        )
    if force is True or check_for_environment_creation(
        input_dir, config["tool"]["run"].get("dependencies", [])
    ):
        create_environment(Path(mamba_temp_dir.name), input_dir, env_dir, config)
        set_version(env_dir, version, new=True)
    else:
        set_version(env_dir, version, new=False)
    share_dir = env_dir / "share" / "tool" / config["tool"]["name"]
    share_dir.mkdir(exist_ok=True, parents=True)
    build_env_file = input_dir / "build-environment.lock"
    copy_all(input_dir, share_dir)
    try:
        build(
            env_dir.parent,
            share_dir,
            build_env_file,
            config["tool"].get("build", {}),
            micromamba_path=micromamba_path,
        )
    except Exception as error:
        logger.error(error)
        shutil.rmtree(env_dir)
        raise ValueError("Failed to create environment.")
    print("The tool was successfully deployed in:", share_dir)
finally:
    mamba_temp_dir.cleanup()
```

Then, in the `build` function, we need to replace `mamba` with `str(micromamba_path)`, now that its signature changes from `build(env_dir, build_dir, env_file, build_config):` to `build(env_dir, build_dir, env_file, build_config, micromamba_path):`. This also fixes `micromamba_path.is_file` in the current code: without the call parentheses the bound method is always truthy, so the "not found" check never raises.
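The `build` change could look roughly like the following sketch. The real body of `build` is not shown in this PR, so `micromamba_create_cmd` and `run_micromamba_create` are hypothetical helpers; the flags used (`create`, `-y`, `-p`, `-f`) are standard micromamba options:

```python
import subprocess
from pathlib import Path
from typing import List, Union


def micromamba_create_cmd(
    micromamba_path: Union[str, Path], prefix: Path, env_file: Path
) -> List[str]:
    """Build a `micromamba create` call that uses the downloaded binary."""
    return [
        str(micromamba_path),  # explicit binary instead of `mamba` from PATH
        "create",
        "-y",                  # do not prompt for confirmation
        "-p", str(prefix),     # install into this prefix
        "-f", str(env_file),   # environment specification file
    ]


def run_micromamba_create(
    micromamba_path: Union[str, Path], prefix: Path, env_file: Path
) -> None:
    """Run the command and raise CalledProcessError if micromamba fails."""
    subprocess.run(micromamba_create_cmd(micromamba_path, prefix, env_file), check=True)
```

Keeping the command construction separate from execution makes it easy to log or unit-test the exact call, and passing the path explicitly means the build stage no longer depends on a `mamba` executable being installed on the host.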

antarcticrainforest added 8 commits November 20, 2024 15:43

Add examples.

63c144b

Update readme.

15a22e5

Update readme.

ede577f

Update README & add code-of-conduct.

98ac328

Update README.

0943a64

Move environment.yml to environment.lock

7fc8952

Rename environment.yml -> environment.lock

5202d9e

Update README.

cdd82f0

antarcticrainforest requested review from bijanf, felio92 and mo-dkrz November 21, 2024 15:09

antarcticrainforest self-assigned this Nov 21, 2024

Bug fixing.

74250f3

mo-dkrz requested changes Nov 24, 2024


antarcticrainforest added 8 commits November 25, 2024 00:40

Check the existence of more files.

04567e5

Mamba needs yml file extensions.

2affdcb

Mamba doesn't like the name environment-lock.yml.

a2be85f

Mamba doesn't like the name environment-lock.yml.

1f21e6c

Check if created latest conda version still exists.

19d8af7

Feature: users can pass path to tool.toml file.

73a1c87

Add tool definitions that need to be fixed -> demo purpose

9014f8b

Catch toml decodeing error, check for valid versoins.

5e6a0c7
