-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add examples tools #1
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made some general comments on this and didn't get into the nitty gritty details, but I think we should keep this PR open for a while to brainstorm and keep meetings to find a proper configuration file that is first easier to understand for users and second is structured based on best practices. Overall, I loved the idea; thanks
|
||
## Define your tool via a `tool.toml` file | ||
|
||
The `tool.toml` file simplifies the process of: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest removing the .toml
extension to make it easier to understand and straightforward. We might find benefits in other config types like as 'yaml' or 'conf' in the future, as if we are currently thinking in drs_config.toml
to switch yaml
, we don't have to change the names of all tools configurations or worse, ask them to change in their tool. Also, I suggest modifying the name to something like .freva
rather than tool
. Yes, it's true that it is a config file of tool, but because, to be honest, the user usually says that freva is running my program and that I need to setup freva for it. so I think the configuration fits more if we call it .freva
. Then on the home directory, we have ~/.freva/...
or ~/.frevatool
or something relevant to freva instead of ~/tool/...
which might be confusing for the user what tool is in my home dir, since the name is not telling anything about freva.
|
||
```toml | ||
[tool.build] | ||
dependencies = ["rust"] # Build-specific dependencies |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we may define rust
or any other tool prerequisite in the dependents of tool.run
, and since every tool has one env file for a tool.toml
under ~/tool/..
dir, do you think we need these dependencies with this structure under build sub table?
I'd say let's do it another way, like Conda is doing. I meant dedicating greater resources to getting external items.
[tool.build]
url = "https://www.example.com/executor.tgz"
#or
local = "/somewhere/on/levante/executor.tgz"
And for more urls or localities, I see the value of using yaml instead of toml again because toml treats everything as a dictionary. But I'm not sure about that part, and I think it needs further brainstorming.
[tool.input_parameters.parameter_db] | ||
title = "Search Parameter" | ||
type = "databrowser" | ||
search_key = "variable" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are your thoughts about extending this search_key
to search_attributes
or whatever is relevant and defining it like this? The user will then have more control over a single parameter in the databrowser.
Also, because this is a freva generic type, what do you think about changing the type name to freva
rather than solr
or databrowser
? and also, because all of the other introduced parameters types are familiar to scientists who write script.
[tool.input_parameters.parameter_db]
title = "advanced custom"
type = "freva"
search_attrs = [
{ "variable" : "something" },
{ "variable" : "something_else" },
{ "variable_not" : "avoided var"},
{"experiment_not": "historical"},
]
I defined different dicts since we have variables twice and each key in a dictionary must be unique. Again another disadvantage of using toml!
Then we don't have to decide on any pre_defined that are unnecessary. We can reference user or admin or whoever setup tool.toml
to look at the docs.
""" | ||
``` | ||
|
||
##### Range Parameters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what do u think to move the range to the previous section?
Or if you want to keep it here, at least let's make it more advance:
[tool.input_parameters.parameter_range]
title = "Range Example"
type = "range"
value_type = "datetime" # float or int
default = ["2024-01-01T00:00:00", "2024-12-31T23:59:59", "1d"]
or we can simply take care of value_type in the backend
git push origin add-your-tool-name | ||
``` | ||
1. Navigate to the original repository and open a pull request. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think of adding some lines regarding create_environment script?
python create_environment.py --help
usage: create-conda-env [-h] [-d] [-f] [-p PREFIX] [-v] input_dir
positional arguments:
input_dir The path to the tool definition.
options:
-h, --help show this help message and exit
-d, --dev Use development mode for any installation.
-f, --force Force recreation of the environment.
-p, --prefix PREFIX The install prefix where the environment should be installed
-v, --verbose
create_environment.py
Outdated
/ version.lower().strip("v") | ||
) | ||
if force is True or check_for_environment_creation( | ||
input_dir, config["tool"]["run"]["dependencies"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we consider any situation without dependency?
if yes an we need to take this scenario into account we have to define it like this
if force is True or check_for_environment_creation(
input_dir, config["tool"]["run"].get("dependencies", [])
):
to not get
❯ python create_environment.py examples/rust-build
2024-11-24 08:48:10,220 - DEBUG - Checking arch and getting mamba url
Traceback (most recent call last):
File "/Users/mo/dev/20241124/data-analysis-tools/create_environment.py", line 567, in <module>
main(app.input_dir, app.prefix, force=app.force)
~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/mo/dev/20241124/data-analysis-tools/create_environment.py", line 531, in main
input_dir, config["tool"]["run"]["dependencies"]
~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^
KeyError: 'dependencies'
Otherwise we need to handle the error in logs
create_environment.py
Outdated
if force is True or check_for_environment_creation( | ||
input_dir, config["tool"]["run"]["dependencies"] | ||
): | ||
with TemporaryDirectory() as temp_dir: | ||
tar_path = Path(temp_dir) / "micromamba.tar.bz2" | ||
download_with_progress(mamba_url, tar_path) | ||
extract_micromamba(tar_path, temp_dir) | ||
micromamba_path = Path(temp_dir) / "bin" / "micromamba" | ||
if not micromamba_path.is_file: | ||
raise ValueError( | ||
"Micromamba binary was not found after extraction." | ||
) | ||
create_environment(Path(temp_dir), input_dir, env_dir, config) | ||
set_version(env_dir, version, new=True) | ||
else: | ||
set_version(env_dir, version, new=False) | ||
share_dir = env_dir / "share" / "tool" / config["tool"]["name"] | ||
share_dir.mkdir(exist_ok=True, parents=True) | ||
build_env_file = input_dir / "build-environment.lock" | ||
copy_all(input_dir, share_dir) | ||
try: | ||
build( | ||
env_dir.parent, | ||
share_dir, | ||
build_env_file, | ||
config["tool"].get("build", {}), | ||
) | ||
except Exception as error: | ||
logger.error(error) | ||
shutil.rmtree(env_dir) | ||
raise ValueError("Failed to create environment.") | ||
print("The tool was successfully deployed in:", share_dir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what do u think of taking downloading micromamba out of the if loop, since we need it for build stage as well?
soemthing like this:
mamba_temp_dir = TemporaryDirectory()
try:
tar_path = Path(mamba_temp_dir.name) / "micromamba.tar.bz2"
download_with_progress(mamba_url, tar_path)
extract_micromamba(tar_path, mamba_temp_dir.name)
micromamba_path = Path(mamba_temp_dir.name) / "bin" / "micromamba"
if not micromamba_path.is_file():
raise ValueError(
"Micromamba binary was not found after extraction."
)
if force is True or check_for_environment_creation(
input_dir, config["tool"]["run"].get("dependencies", [])
):
create_environment(Path(mamba_temp_dir.name), input_dir, env_dir, config)
set_version(env_dir, version, new=True)
else:
set_version(env_dir, version, new=False)
share_dir = env_dir / "share" / "tool" / config["tool"]["name"]
share_dir.mkdir(exist_ok=True, parents=True)
build_env_file = input_dir / "build-environment.lock"
copy_all(input_dir, share_dir)
try:
build(
env_dir.parent,
share_dir,
build_env_file,
config["tool"].get("build", {}),
micromamba_path=micromamba_path
)
except Exception as error:
logger.error(error)
shutil.rmtree(env_dir)
raise ValueError("Failed to create environment.")
print("The tool was successfully deployed in:", share_dir)
finally:
mamba_temp_dir.cleanup()
And then in the build
function we need to change the mamba
with str(micromamba_path)
while we already changed the build(env_dir, build_dir, env_file, build_config):
to build(env_dir, build_dir, env_file, build_config, micromamba_path):
This is a proposal of how we can define tools in a toml file (mainly based on @felio92 ) work. I added a simple script that tries to create versioned anaconda environemtns to make those environments as reproducible as possible.
To show the concept I've added a couple of examples.