dump computer/codes in YAML format #3521

Closed
adegomme opened this issue Nov 7, 2019 · 4 comments

Comments

@adegomme
Contributor

adegomme commented Nov 7, 2019

Sometimes, when using another environment or setting up a new AiiDA installation for a new user, we have to set up the same computers/codes over and over again. For this, loading from YAML is quite practical.

But I can't find a way to generate these YAML files directly from an existing installation.
The output of `verdi computer show` is quite close to YAML and can be converted fairly easily, but a `dump` command, or a `--yaml` switch for `show`, would be useful (maybe I missed something and it's already possible; apologies in that case).
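
As a rough illustration of the conversion mentioned above, here is a minimal sketch. It assumes the `show` output is a two-column table whose columns are separated by at least two spaces; the `SAMPLE` text and the function name `show_output_to_yaml` are made up for illustration and are not real AiiDA output or API. The resulting keys would probably still need renaming to match what the setup command expects.

```python
import re


def show_output_to_yaml(text: str) -> str:
    """Turn 'Key   value' table lines into flat 'key: "value"' YAML lines."""
    lines = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and table rules made only of dashes and spaces.
        if not line or set(line) <= {"-", " "}:
            continue
        # Split on the first run of two or more spaces (the assumed column gap).
        parts = re.split(r"\s{2,}", line, maxsplit=1)
        key = parts[0].lower().replace(" ", "_")
        value = parts[1] if len(parts) > 1 else ""
        # Double-quote the value so characters such as ':' or '#' stay literal.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'{key}: "{escaped}"')
    return "\n".join(lines) + "\n"


# Hypothetical sample, only loosely modelled on a two-column table.
SAMPLE = """\
-----------  ---------
Label        localhost
Hostname     localhost
Transport    core.local
Work directory  /scratch/{username}/aiida
"""

print(show_output_to_yaml(SAMPLE))
```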

@ltalirz
Member

ltalirz commented Nov 7, 2019

very good point - it does not exist and would be very useful (and easy to implement)

@adegomme
Contributor Author

adegomme commented Nov 8, 2019

This would actually make it possible to provide a library of ready-to-use configurations for known HPC systems and codes. That could be interesting, as this part is quite tedious for end users.

@ltalirz
Member

ltalirz commented Nov 8, 2019

> This would actually make it possible to provide a library of ready-to-use configurations for known HPC systems and codes

This is already possible: simply set up the computers and codes and create an AiiDA export file.
This has the added advantage that computers / codes will be uniquely identified when used by multiple people.

verdi import works with URLs, so you can host your export file on a public URL and people will be able to get the configuration directly from the verdi cli.

@sphuber sphuber self-assigned this Nov 30, 2019
unkcpz added a commit to unkcpz/aiida-core that referenced this issue Jan 21, 2023
fixes aiidateam#3521

Add `verdi code export` command to export a code from the command line as a
YAML file.
This was mentioned among the usability improvements, along with having a command to export the code and computer setup.
The keys of the YAML file are read from the CLI options of the corresponding code class.
GeigerJ2 added a commit to qiaojunfeng/aiida-core that referenced this issue Feb 28, 2024
I committed the changes from my local fork of the PR directly here, as I think that will make working on it more
straightforward. Hope that's OK for everybody.

Based on @qiaojunfeng's original implementation, there's a recursive function that traverses the `WorkChain` hierarchy.
Now, as actual file I/O is only done for the `CalcJobNode`s involved in the workchain, I factored that out into its own
function `_calcjob_dump`. I also moved these implementations into `tools/dumping/processes` so that they're available via the normal
Python API. In `cmd_workchain` and `cmd_calcjob` the commands `verdi workchain dump` and `verdi calcjob dump` are only
wrappers around the respective functions.
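
The split described above (recursion over the hierarchy, file I/O only at the calculation leaves) can be sketched roughly as follows. This is not the code from the PR: `Node`, `dump_calcjob` and `dump_workchain` are plain-Python stand-ins that do not use AiiDA, and treating every node without children as a calculation is a simplifying assumption.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Node:
    """Stand-in for a process node; not an AiiDA class."""

    name: str
    files: dict = field(default_factory=dict)      # filename -> content, used by leaves
    children: list = field(default_factory=list)   # called sub-processes, in call order


def dump_calcjob(node: Node, target: Path) -> None:
    """Write the files of a leaf node; the only function that touches the disk."""
    target.mkdir(parents=True, exist_ok=True)
    for filename, content in node.files.items():
        (target / filename).write_text(content)


def dump_workchain(node: Node, target: Path) -> None:
    """Walk the hierarchy recursively and delegate file I/O to dump_calcjob."""
    if not node.children:
        # Assumption of this sketch: a node without children is a calculation.
        dump_calcjob(node, target)
        return
    for index, child in enumerate(node.children, start=1):
        # Zero-padded counter keeps the sub-directories in call order.
        dump_workchain(child, target / f"{index:02d}-{child.name}")
```

Keeping the disk access in one function means the CLI commands can stay thin wrappers: one calls the recursive function, the other calls the leaf function directly.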

The `_calcjob_dump`, by default, dumps the `node.base.repository` and `node.outputs.retrieved` using `copy_tree`, as well as
`input_nodes` of the `CalcJobNode` if they are of type `SinglefileData` or `FolderData`. I started working on the
`--use-prepare-for-submission` option, but as previously indicated by @sphuber (also see [here](https://aiida.discourse.group/t/obtain-a-calcjob-instance-from-a-corresponding-calcjobnode-or-builder/300)), it's not straightforward to make it work
as intended, so I put that on hold for now and added a warning.

For each `WorkChainNode` and `CalcJobNode` a selected set of the node `@property`s are dumped to yaml files. `extra`s
and `attribute`s can also be included via the `cli`. I initially had a function for this, but realized that I'm just
passing these arguments all the way through, so I encapsulated that in the `ProcessNodeYamlDumper` class. Now, an
instance is created when the respective `cli` command is run, and the arguments are set as instance variables, with the
instance being passed through, rather than the original arguments. The `dump_yaml` method is then called at the relevant
positions with the `process_node` and `output_path` arguments. Regarding relevant positions: To get the node yaml
file for every `ProcessNode` involved, it's called in `cmd_workchain` and `cmd_calcjob` for the parent node, and
subsequently for all outgoing links. Maybe there's a better way to handle that?
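
To make the idea concrete, here is a minimal sketch of such a helper, under stated assumptions: the class name `NodeYamlDumperSketch`, its option names, and the use of a plain dict in place of a real node are all hypothetical and are not taken from the PR. Only the file name `aiida_node_metadata.yaml` comes from the directory listings in this message. Values are written with `json.dumps`, relying on JSON values also being valid YAML, to avoid a YAML dependency in the sketch.

```python
import json
from pathlib import Path


class NodeYamlDumperSketch:
    """Holds the dump options once so they need not be passed through every call."""

    def __init__(self, include_attributes: bool = False, include_extras: bool = False):
        self.include_attributes = include_attributes
        self.include_extras = include_extras

    def dump_yaml(self, process_node: dict, output_path: Path) -> Path:
        """Write selected fields of `process_node` (a dict stand-in) to a YAML file."""
        # Hypothetical selection of properties; the real set may differ.
        selected = ("pk", "label", "process_type")
        data = {key: process_node[key] for key in selected if key in process_node}
        if self.include_attributes:
            data["attributes"] = process_node.get("attributes", {})
        if self.include_extras:
            data["extras"] = process_node.get("extras", {})
        output_path.mkdir(parents=True, exist_ok=True)
        target = output_path / "aiida_node_metadata.yaml"
        # One 'key: value' line per field, with JSON-encoded values.
        target.write_text("".join(f"{key}: {json.dumps(value)}\n" for key, value in data.items()))
        return target
```

With this shape, the CLI command builds one instance from its options and hands it to the dumping functions, which only need to call `dump_yaml(process_node, output_path)` for each node they visit.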

The other commands initially mentioned by @qiaojunfeng also seem very interesting and could probably be implemented easily
based on his original implementation, though we should agree on the overall API/relevant namespace first.

A few more notes:

- The `cli` options for `verdi workchain dump` and `verdi calcjob dump` are basically the same and the code is duplicated. We could
  avoid that by merging it into one, e.g. under `verdi node dump`, however, as a user, I would find `verdi workchain
  dump` and `verdi calcjob dump` more intuitive. Also, `verdi node repo dump` uses a former implementation of
  `copy_tree`, so I'd go ahead and update that, but in a separate PR. Related: do we also want to implement `verdi node
  dump`, or change `verdi node repo dump` to it?
- Regarding the `yaml` dumping, the `ProcessNodeYamlDumper` class is quite specific and just a little helper to get the
  job done here. Should we generalize such functionality, e.g. to resolve [issue
  aiidateam#3521](aiidateam#3521) to allow dumping of computers/codes to yaml, or keep these
  things separate?
- Currently, the pseudo directories are called `pseudos__<X>`, and I thought about splitting on the double underscore to
  just have one `pseudos` directory with a subdirectory for each element. Personally, I'd find that nicer than having a
  bunch of `pseudos__<X>`, but I'm not sure if the double underscore is based on general AiiDA name mangling, or
  again specific to `aiida-quantumespresso`.
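
The double-underscore split from the last point could look roughly like this. It is only a sketch: `nested_input_path` is a hypothetical helper, and it assumes that `__` never appears inside a single name.

```python
from pathlib import PurePosixPath


def nested_input_path(link_label: str, separator: str = "__") -> PurePosixPath:
    """Map a flat label such as 'pseudos__Si' to a nested path 'pseudos/Si'."""
    # Drop empty parts so a leading or trailing separator does not create odd segments.
    parts = [part for part in link_label.split(separator) if part]
    return PurePosixPath(*parts)


print(nested_input_path("pseudos__Si"))  # pseudos/Si
print(nested_input_path("structure"))    # structure
```

All `pseudos__<X>` inputs would then share one `pseudos` parent directory, while labels without the separator are left unchanged.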

Lastly, examples of the default structure obtained from `verdi workchain dump`:

```shell
dump-462
├── 01-relax-PwRelaxWorkChain
│  ├── 01-PwBaseWorkChain
│  │  ├── 01-PwCalculation
│  │  │  ├── aiida_node_metadata.yaml
│  │  │  ├── node_inputs
│  │  │  │  └── pseudos__Si
│  │  │  │     └── Si.pbesol-n-rrkjus_psl.1.0.0.UPF
│  │  │  ├── raw_inputs
│  │  │  │  ├── .aiida
│  │  │  │  │  ├── calcinfo.json
│  │  │  │  │  └── job_tmpl.json
│  │  │  │  ├── _aiidasubmit.sh
│  │  │  │  └── aiida.in
│  │  │  └── raw_outputs
│  │  │     ├── _scheduler-stderr.txt
│  │  │     ├── _scheduler-stdout.txt
│  │  │     ├── aiida.out
│  │  │     └── data-file-schema.xml
│  │  └── aiida_node_metadata.yaml
│  ├── 02-PwBaseWorkChain
│  │  ├── 01-PwCalculation
│  │  │  ├── aiida_node_metadata.yaml
│  │  │  ├── node_inputs
│  │  │  │  └── pseudos__Si
│  │  │  │     └── Si.pbesol-n-rrkjus_psl.1.0.0.UPF
│  │  │  ├── raw_inputs
│  │  │  │  ├── .aiida
│  │  │  │  │  ├── calcinfo.json
│  │  │  │  │  └── job_tmpl.json
│  │  │  │  ├── _aiidasubmit.sh
│  │  │  │  └── aiida.in
│  │  │  └── raw_outputs
│  │  │     ├── _scheduler-stderr.txt
│  │  │     ├── _scheduler-stdout.txt
│  │  │     ├── aiida.out
│  │  │     └── data-file-schema.xml
│  │  └── aiida_node_metadata.yaml
│  └── aiida_node_metadata.yaml
├── 02-scf-PwBaseWorkChain
│  ├── 01-PwCalculation
│  │  ├── aiida_node_metadata.yaml
│  │  ├── node_inputs
│  │  │  └── pseudos__Si
│  │  │     └── Si.pbesol-n-rrkjus_psl.1.0.0.UPF
│  │  ├── raw_inputs
│  │  │  ├── .aiida
│  │  │  │  ├── calcinfo.json
│  │  │  │  └── job_tmpl.json
│  │  │  ├── _aiidasubmit.sh
│  │  │  └── aiida.in
│  │  └── raw_outputs
│  │     ├── _scheduler-stderr.txt
│  │     ├── _scheduler-stdout.txt
│  │     ├── aiida.out
│  │     └── data-file-schema.xml
│  └── aiida_node_metadata.yaml
├── 03-bands-PwBaseWorkChain
│  ├── 01-PwCalculation
│  │  ├── aiida_node_metadata.yaml
│  │  ├── node_inputs
│  │  │  └── pseudos__Si
│  │  │     └── Si.pbesol-n-rrkjus_psl.1.0.0.UPF
│  │  ├── raw_inputs
│  │  │  ├── .aiida
│  │  │  │  ├── calcinfo.json
│  │  │  │  └── job_tmpl.json
│  │  │  ├── _aiidasubmit.sh
│  │  │  └── aiida.in
│  │  └── raw_outputs
│  │     ├── _scheduler-stderr.txt
│  │     ├── _scheduler-stdout.txt
│  │     ├── aiida.out
│  │     └── data-file-schema.xml
│  └── aiida_node_metadata.yaml
└── aiida_node_metadata.yaml
```

and `verdi calcjob dump`:

```shell
dump-530
├── aiida_node_metadata.yaml
├── node_inputs
│  └── pseudos__Si
│     └── Si.pbesol-n-rrkjus_psl.1.0.0.UPF
├── raw_inputs
│  ├── .aiida
│  │  ├── calcinfo.json
│  │  └── job_tmpl.json
│  ├── _aiidasubmit.sh
│  └── aiida.in
└── raw_outputs
   ├── _scheduler-stderr.txt
   ├── _scheduler-stdout.txt
   ├── aiida.out
   └── data-file-schema.xml
```

for a `PwBandsWorkChain` and one of its involved `CalcJobNode`s.
@sphuber
Contributor

sphuber commented Jun 1, 2024

This was addressed in #6389 and #5860

@sphuber sphuber closed this as completed Jun 1, 2024