Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add examples tools #1

Open
wants to merge 17 commits into
base: main
Choose a base branch
from
Open

Add examples tools #1

wants to merge 17 commits into from

Conversation

antarcticrainforest
Copy link
Member

This is a proposal of how we can define tools in a toml file (mainly based on @felio92 ) work. I added a simple script that tries to create versioned anaconda environemtns to make those environments as reproducible as possible.

To show the concept I've added a couple of examples.

Copy link

@mo-dkrz mo-dkrz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made some general comments on this and didn't get into the nitty gritty details, but I think we should keep this PR open for a while to brainstorm and keep meetings to find a proper configuration file that is first easier to understand for users and second is structured based on best practices. Overall, I loved the idea; thanks


## Define your tool via a `tool.toml` file

The `tool.toml` file simplifies the process of:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest removing the .toml extension to make it easier to understand and straightforward. We might find benefits in other config types like as 'yaml' or 'conf' in the future, as if we are currently thinking in drs_config.toml to switch yaml , we don't have to change the names of all tools configurations or worse, ask them to change in their tool. Also, I suggest modifying the name to something like .freva rather than tool. Yes, it's true that it is a config file of tool, but because, to be honest, the user usually says that freva is running my program and that I need to setup freva for it. so I think the configuration fits more if we call it .freva. Then on the home directory, we have ~/.freva/... or ~/.frevatool or something relevant to freva instead of ~/tool/... which might be confusing for the user what tool is in my home dir, since the name is not telling anything about freva.


```toml
[tool.build]
dependencies = ["rust"] # Build-specific dependencies
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we may define rust or any other tool prerequisite in the dependents of tool.run, and since every tool has one env file for a tool.toml under ~/tool/.. dir, do you think we need these dependencies with this structure under build sub table?
I'd say let's do it another way, like Conda is doing. I meant dedicating greater resources to getting external items.

[tool.build]
url = "https://www.example.com/executor.tgz"
#or
local = "/somewhere/on/levante/executor.tgz"

And for more urls or localities, I see the value of using yaml instead of toml again because toml treats everything as a dictionary. But I'm not sure about that part, and I think it needs further brainstorming.

[tool.input_parameters.parameter_db]
title = "Search Parameter"
type = "databrowser"
search_key = "variable"
Copy link

@mo-dkrz mo-dkrz Nov 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are your thoughts about extending this search_key to search_attributes or whatever is relevant and defining it like this? The user will then have more control over a single parameter in the databrowser.
Also, because this is a freva generic type, what do you think about changing the type name to freva rather than solr or databrowser? and also, because all of the other introduced parameters types are familiar to scientists who write script.

[tool.input_parameters.parameter_db]
title = "advanced custom"
type = "freva"
search_attrs = [
    { "variable" : "something" },
    { "variable" : "something_else" },
    { "variable_not" : "avoided var"},
    {"experiment_not":  "historical"},
]

I defined different dicts since we have variables twice and each key in a dictionary must be unique. Again another disadvantage of using toml! 
Then we don't have to decide on any pre_defined that are unnecessary. We can reference user or admin or whoever setup tool.toml to look at the docs.

"""
```

##### Range Parameters
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do u think to move the range to the previous section?
Or if you want to keep it here, at least let's make it more advance:

[tool.input_parameters.parameter_range]
title = "Range Example"
type = "range"
value_type = "datetime" # float or int
default = ["2024-01-01T00:00:00", "2024-12-31T23:59:59", "1d"]

or we can simply take care of value_type in the backend

git push origin add-your-tool-name
```
1. Navigate to the original repository and open a pull request.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think of adding some lines regarding create_environment script?

python create_environment.py --help
usage: create-conda-env [-h] [-d] [-f] [-p PREFIX] [-v] input_dir

positional arguments:
  input_dir            The path to the tool definition.

options:
  -h, --help           show this help message and exit
  -d, --dev            Use development mode for any installation.
  -f, --force          Force recreation of the environment.
  -p, --prefix PREFIX  The install prefix where the environment should be installed
  -v, --verbose

/ version.lower().strip("v")
)
if force is True or check_for_environment_creation(
input_dir, config["tool"]["run"]["dependencies"]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we consider any situation without dependency?
if yes an we need to take this scenario into account we have to define it like this

    if force is True or check_for_environment_creation(
        input_dir, config["tool"]["run"].get("dependencies", [])
    ):

to not get

❯ python create_environment.py examples/rust-build
2024-11-24 08:48:10,220 - DEBUG - Checking arch and getting mamba url
Traceback (most recent call last):
  File "/Users/mo/dev/20241124/data-analysis-tools/create_environment.py", line 567, in <module>
    main(app.input_dir, app.prefix, force=app.force)
    ~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mo/dev/20241124/data-analysis-tools/create_environment.py", line 531, in main
    input_dir, config["tool"]["run"]["dependencies"]
               ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^
KeyError: 'dependencies'

Otherwise we need to handle the error in logs

Comment on lines 530 to 561
if force is True or check_for_environment_creation(
input_dir, config["tool"]["run"]["dependencies"]
):
with TemporaryDirectory() as temp_dir:
tar_path = Path(temp_dir) / "micromamba.tar.bz2"
download_with_progress(mamba_url, tar_path)
extract_micromamba(tar_path, temp_dir)
micromamba_path = Path(temp_dir) / "bin" / "micromamba"
if not micromamba_path.is_file:
raise ValueError(
"Micromamba binary was not found after extraction."
)
create_environment(Path(temp_dir), input_dir, env_dir, config)
set_version(env_dir, version, new=True)
else:
set_version(env_dir, version, new=False)
share_dir = env_dir / "share" / "tool" / config["tool"]["name"]
share_dir.mkdir(exist_ok=True, parents=True)
build_env_file = input_dir / "build-environment.lock"
copy_all(input_dir, share_dir)
try:
build(
env_dir.parent,
share_dir,
build_env_file,
config["tool"].get("build", {}),
)
except Exception as error:
logger.error(error)
shutil.rmtree(env_dir)
raise ValueError("Failed to create environment.")
print("The tool was successfully deployed in:", share_dir)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do u think of taking downloading micromamba out of the if loop, since we need it for build stage as well?
soemthing like this:

    mamba_temp_dir = TemporaryDirectory()
    try:
        tar_path = Path(mamba_temp_dir.name) / "micromamba.tar.bz2"
        download_with_progress(mamba_url, tar_path)
        extract_micromamba(tar_path, mamba_temp_dir.name)
        micromamba_path = Path(mamba_temp_dir.name) / "bin" / "micromamba"
        
        if not micromamba_path.is_file():
            raise ValueError(
                "Micromamba binary was not found after extraction."
            )

        if force is True or check_for_environment_creation(
            input_dir, config["tool"]["run"].get("dependencies", [])
        ):
            create_environment(Path(mamba_temp_dir.name), input_dir, env_dir, config)
            set_version(env_dir, version, new=True)
        else:
            set_version(env_dir, version, new=False)

        share_dir = env_dir / "share" / "tool" / config["tool"]["name"]
        share_dir.mkdir(exist_ok=True, parents=True)
        build_env_file = input_dir / "build-environment.lock"
        copy_all(input_dir, share_dir)

        try:
            build(
                env_dir.parent,
                share_dir,
                build_env_file,
                config["tool"].get("build", {}),
                micromamba_path=micromamba_path
            )
        except Exception as error:
            logger.error(error)
            shutil.rmtree(env_dir)
            raise ValueError("Failed to create environment.")
        print("The tool was successfully deployed in:", share_dir)
    finally:
        mamba_temp_dir.cleanup()

And then in the build function we need to change the mamba with str(micromamba_path) while we already changed the build(env_dir, build_dir, env_file, build_config): to build(env_dir, build_dir, env_file, build_config, micromamba_path):

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants