dump computer/codes in YAML format #3521
Comments
Very good point - it does not exist and would be very useful (and easy to implement).
This would actually make it possible to provide a library of ready-to-use configurations for known HPC systems and codes. That could be interesting, as this part is actually quite tedious for end users.
This is already possible: simply set up the computers and codes and create an AiiDA export file.
Fixes aiidateam#3521. Adds a `verdi code export` command to export a code from the command line as a YAML file. This is mentioned as a usability improvement, as is having a command to export the code and computer setup. The keys of the YAML file are read from the CLI options of the corresponding code class.
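To illustrate the idea, here is a minimal sketch of what such an export could produce. The key names and values below are illustrative assumptions (loosely modelled on the options of code setup), not the actual aiida-core API; since the setup data is a flat mapping of scalars, the YAML can even be emitted without a YAML library:

```python
# Hypothetical example of the data a `verdi code export` command could emit.
# All keys and values here are illustrative, not actual aiida-core output.
code_config = {
    "label": "pw",
    "description": "Quantum ESPRESSO pw.x",
    "computer": "localhost",
    "filepath_executable": "/usr/bin/pw.x",
}

# For a flat mapping of plain scalars, YAML is just "key: value" lines.
yaml_text = "".join(f"{key}: {value}\n" for key, value in code_config.items())
print(yaml_text)
```

A file written this way could then be fed back into the interactive code setup to recreate the code in another profile.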
I committed the changes from my local fork of the PR directly here now, as I think that will make working on it more straightforward. Hope that's OK for everybody.

Based on @qiaojunfeng's original implementation, there's a recursive function that traverses the `WorkChain` hierarchy. Now, as actual file I/O is only done for the `CalcJobNode`s involved in the workchain, I factored that out into its own function `_calcjob_dump`. I also moved these implementations into `tools/dumping/processes` so that they are available via the normal Python API. In `cmd_workchain` and `cmd_calcjob`, the commands `verdi workchain dump` and `verdi calcjob dump` are only thin wrappers around the respective functions.

By default, `_calcjob_dump` dumps the `node.base.repository` and `node.outputs.retrieved` using `copy_tree`, as well as the `input_nodes` of the `CalcJobNode` if they are of type `SinglefileData` or `FolderData`. I started working on the `--use-prepare-for-submission` option, but as previously indicated by @sphuber (also see [here](https://aiida.discourse.group/t/obtain-a-calcjob-instance-from-a-corresponding-calcjobnode-or-builder/300)), it's not straightforward to make it work as intended, so I put that on hold for now and added a warning.

For each `WorkChainNode` and `CalcJobNode`, a selected set of the node's `@property`s is dumped to YAML files. `extras` and `attributes` can also be included via the CLI. I initially had a function for this, but realized that I was just passing these arguments all the way through, so I encapsulated that in the `ProcessNodeYamlDumper` class. Now, an instance is created when the respective CLI command is run, and the arguments are set as instance variables, with the instance being passed through rather than the original arguments. The `dump_yaml` method is then called at the relevant positions with the `process_node` and `output_path` arguments.
Regarding relevant positions: to get the node YAML file for every `ProcessNode` involved, it's called in `cmd_workchain` and `cmd_calcjob` for the parent node, and subsequently for all outgoing links. Maybe there's a better way to handle that? The other commands initially mentioned by @qiaojunfeng also seem very interesting and could probably easily be implemented based on his original implementation, though we should agree on the overall API/relevant namespace first.

A few more notes:

- The CLI options for `verdi workchain dump` and `verdi calcjob dump` are basically the same and the code is duplicated. We could avoid that by merging them into one command, e.g. under `verdi node dump`; however, as a user, I would find `verdi workchain dump` and `verdi calcjob dump` more intuitive. Also, `verdi node repo dump` uses a former implementation of `copy_tree`, so I'd go ahead and update that, but in a separate PR (related: do we also want to implement, or change it to, just `verdi node dump`?).
- Regarding the YAML dumping, the `ProcessNodeYamlDumper` class is quite specific and just a little helper to get the job done here. Should we generalize such functionality, e.g. to resolve [issue aiidateam#3521](aiidateam#3521) by allowing dumping of computers/codes to YAML, or keep these things separate?
- Currently, the pseudo directories are called `pseudos__<X>`, and I thought about splitting on the double underscore to just have one `pseudos` directory with a subdirectory for each element. Personally, I'd find that nicer than having a bunch of `pseudos__<X>` directories, but I'm not sure if the double underscore is based on general AiiDA name mangling, or again specific to `aiida-quantumespresso`.
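For reference, the recursive traversal described above can be sketched in a self-contained way. The stand-in class and helper below are illustrative only (the real implementation walks the outgoing links of the actual AiiDA ORM nodes and calls `_calcjob_dump` and `ProcessNodeYamlDumper.dump_yaml`); directory names follow the `<index>-<label>` convention visible in the trees below:

```python
# Self-contained sketch of the recursive process-dumping logic, with a stub
# class standing in for the aiida-core ORM. Illustrative only.
from dataclasses import dataclass, field
from pathlib import Path
from tempfile import mkdtemp


@dataclass
class StubProcessNode:
    """Stand-in for a WorkChainNode/CalcJobNode; `called` mimics outgoing links."""
    label: str
    is_calcjob: bool = False
    called: list = field(default_factory=list)


def dump_process(node: StubProcessNode, output_path: Path) -> None:
    output_path.mkdir(parents=True, exist_ok=True)
    # Stand-in for ProcessNodeYamlDumper.dump_yaml(process_node, output_path).
    (output_path / "aiida_node_metadata.yaml").write_text(f"label: {node.label}\n")
    if node.is_calcjob:
        # The real _calcjob_dump would copy raw_inputs/raw_outputs/node_inputs here.
        return
    for index, child in enumerate(node.called, start=1):
        dump_process(child, output_path / f"{index:02d}-{child.label}")


# Tiny example hierarchy: a workchain calling a sub-workchain with one calcjob.
workchain = StubProcessNode(
    "PwBandsWorkChain",
    called=[
        StubProcessNode(
            "PwBaseWorkChain",
            called=[StubProcessNode("PwCalculation", is_calcjob=True)],
        )
    ],
)
root = Path(mkdtemp()) / "dump-example"
dump_process(workchain, root)
```

Running this produces a nested `dump-example/01-PwBaseWorkChain/01-PwCalculation/` directory tree with one metadata YAML file per process node, mirroring the structure of the real command.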
Lastly, examples of the default structure obtained from `verdi workchain dump`:

```shell
dump-462
├── 01-relax-PwRelaxWorkChain
│   ├── 01-PwBaseWorkChain
│   │   ├── 01-PwCalculation
│   │   │   ├── aiida_node_metadata.yaml
│   │   │   ├── node_inputs
│   │   │   │   └── pseudos__Si
│   │   │   │       └── Si.pbesol-n-rrkjus_psl.1.0.0.UPF
│   │   │   ├── raw_inputs
│   │   │   │   ├── .aiida
│   │   │   │   │   ├── calcinfo.json
│   │   │   │   │   └── job_tmpl.json
│   │   │   │   ├── _aiidasubmit.sh
│   │   │   │   └── aiida.in
│   │   │   └── raw_outputs
│   │   │       ├── _scheduler-stderr.txt
│   │   │       ├── _scheduler-stdout.txt
│   │   │       ├── aiida.out
│   │   │       └── data-file-schema.xml
│   │   └── aiida_node_metadata.yaml
│   ├── 02-PwBaseWorkChain
│   │   ├── 01-PwCalculation
│   │   │   ├── aiida_node_metadata.yaml
│   │   │   ├── node_inputs
│   │   │   │   └── pseudos__Si
│   │   │   │       └── Si.pbesol-n-rrkjus_psl.1.0.0.UPF
│   │   │   ├── raw_inputs
│   │   │   │   ├── .aiida
│   │   │   │   │   ├── calcinfo.json
│   │   │   │   │   └── job_tmpl.json
│   │   │   │   ├── _aiidasubmit.sh
│   │   │   │   └── aiida.in
│   │   │   └── raw_outputs
│   │   │       ├── _scheduler-stderr.txt
│   │   │       ├── _scheduler-stdout.txt
│   │   │       ├── aiida.out
│   │   │       └── data-file-schema.xml
│   │   └── aiida_node_metadata.yaml
│   └── aiida_node_metadata.yaml
├── 02-scf-PwBaseWorkChain
│   ├── 01-PwCalculation
│   │   ├── aiida_node_metadata.yaml
│   │   ├── node_inputs
│   │   │   └── pseudos__Si
│   │   │       └── Si.pbesol-n-rrkjus_psl.1.0.0.UPF
│   │   ├── raw_inputs
│   │   │   ├── .aiida
│   │   │   │   ├── calcinfo.json
│   │   │   │   └── job_tmpl.json
│   │   │   ├── _aiidasubmit.sh
│   │   │   └── aiida.in
│   │   └── raw_outputs
│   │       ├── _scheduler-stderr.txt
│   │       ├── _scheduler-stdout.txt
│   │       ├── aiida.out
│   │       └── data-file-schema.xml
│   └── aiida_node_metadata.yaml
├── 03-bands-PwBaseWorkChain
│   ├── 01-PwCalculation
│   │   ├── aiida_node_metadata.yaml
│   │   ├── node_inputs
│   │   │   └── pseudos__Si
│   │   │       └── Si.pbesol-n-rrkjus_psl.1.0.0.UPF
│   │   ├── raw_inputs
│   │   │   ├── .aiida
│   │   │   │   ├── calcinfo.json
│   │   │   │   └── job_tmpl.json
│   │   │   ├── _aiidasubmit.sh
│   │   │   └── aiida.in
│   │   └── raw_outputs
│   │       ├── _scheduler-stderr.txt
│   │       ├── _scheduler-stdout.txt
│   │       ├── aiida.out
│   │       └── data-file-schema.xml
│   └── aiida_node_metadata.yaml
└── aiida_node_metadata.yaml
```

and `verdi calcjob dump`:

```shell
dump-530
├── aiida_node_metadata.yaml
├── node_inputs
│   └── pseudos__Si
│       └── Si.pbesol-n-rrkjus_psl.1.0.0.UPF
├── raw_inputs
│   ├── .aiida
│   │   ├── calcinfo.json
│   │   └── job_tmpl.json
│   ├── _aiidasubmit.sh
│   └── aiida.in
└── raw_outputs
    ├── _scheduler-stderr.txt
    ├── _scheduler-stdout.txt
    ├── aiida.out
    └── data-file-schema.xml
```

for a `PwBandsWorkChain` and one of its involved `CalcJobNode`s.
Sometimes, when using another environment or setting up AiiDA for a new user, we have to set up the same computers/codes over and over again. For this, loading from YAML is quite practical.

But I can't find a way to generate these YAML files directly from an existing installation.

The `verdi computer show` output is quite close to YAML and can be converted rather simply, but a dedicated dump command, or a `--yaml` switch for `show`, would be useful (maybe I missed something and it's already possible; apologies in that case).
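As a stopgap, the conversion mentioned above can be done with a few lines of scripting. This is only a rough sketch: the sample text and key names below are illustrative assumptions, not the exact output of the real command, which may format its fields differently:

```python
# Rough sketch: turn `verdi computer show`-style "Key: value" lines into
# YAML-like "key_name: value" lines for a setup file. The sample output and
# field names are illustrative assumptions, not real aiida-core output.
show_output = """\
Label:           localhost
Hostname:        localhost
Transport type:  core.local
Scheduler type:  core.direct
Work directory:  /scratch/{username}/aiida/
"""

config_lines = []
for line in show_output.splitlines():
    key, _, value = line.partition(":")
    # Normalise "Transport type" -> "transport_type", matching the snake_case
    # keys typically used in computer setup YAML files.
    config_lines.append(f"{key.strip().lower().replace(' ', '_')}: {value.strip()}")

yaml_text = "\n".join(config_lines) + "\n"
print(yaml_text)
```

The resulting text could then be saved and passed to the computer setup step; a built-in dump command would of course make this unnecessary.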