-
Notifications
You must be signed in to change notification settings - Fork 83
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add transfer calculation for density restart #706
base: main
Are you sure you want to change the base?
Conversation
@mbercx I will now start working in adding some tests, so let me know if there is something important I should be testing for so that I don't miss it. I may also add some doc section on how to use it after that. You can also give a first pass and check if the workchain follows the general guildelines, I tried to make it as similar as possible to existing ones (the only thing missing is the yaml for the protocol which I hadn't time to investigate, but since it shouldn't affect the interface I'm hoping it is not critical right now). |
3416f83
to
8fd52c2
Compare
@mbercx erm...any idea why all tests are failing like that? I didn't modify anything that should make old tests fail... |
@ramirezfranciscof its because of a broken release of |
Hey @sphuber, thanks for the reply! You mean this one? That should have been already released in v1.6.4 (at least it is there on the commit log). EDIT: uhm, I now see that this plugin has |
The problem is that the fix was only applied to |
8fd52c2
to
fe98617
Compare
Thanks @ramirezfranciscof . Before going into a detailed review, I first have some general questions about the purpose and design of the workchain. If I understand correctly, this work chain is a utility to run the
Then I have a comment on the design of the spec and the protocol. First of all, thanks for making an effort to reproduce the style of the other workchains, I appreciate it. The generic design principle is to make the input spec as "unrestricted" as possible. That is to say, unless you really need to, I think it is best to fully expose the inputs of sub processes. You already did this for the |
Hey @sphuber thanks for the comments! Yeah, in fact this would be a helper function to automate the movement of the restart information from/to external clusters (
Yeah, this is a good point, I don't know why I didn't exposed the |
Thanks for the clarification. I think the purpose of the workchain just needs to be cleared up a bit. If I understand correctly, the optional If the intention of the user is really to prepare the source data on the target computer such that it can be restarted from, do we really need the separate |
Yes, this is correct.
Well, this is the problem. Currently this is not possible for two separate reasons:
I can reorganize this so that there is a separate "long term solution" auxiliary function that does what you say and the rest of the workchain can be deprecated later, but so far doing the NSCF after the transfer is a necessity when copying to a remote for running.
Not exactly, the output already contains all files necessary if the next process starts with an NSCF. The main reasons it was optional were:
This made sense because I was thinking that the WorkChain would be the only tool dealing with the transfer. If you want me to do the auxiliary function route, then I would move al preparations there and then the workchain can retain only the function of bundling transferring + running the NSCF together (thus eliminating most of the "optional" stuff, in the sense that the NSCF would now not be optional since just running the assisted I actually would agree I like that organization better. |
81a91a1
to
4078d77
Compare
aa79f5c
to
2ad963b
Compare
2ad963b
to
229eca1
Compare
I added some tests for the utility function. I think that before I add any tests for the actual workchain and documentation for the process I would like you guys (@sphuber @mbercx) to take a first look in case any other significant modification is required. Moreover, I was just reminded by some test failures that the PS: I'm not super sure why the CI for python 3.8 is failing, it would seem some database contamination between tests maybe? I didn't touch any other tests of the suite and python 3.6 works, so it is strange... |
Thanks for the detailed additional explanation though @ramirezfranciscof . Now that the goal is a bit clearer, I still have to wonder why this needs to be in the main repository. Since it seems all of this is not fully hashed out yet and contains workarounds for issues that are being addressed elsewhere, and mostly this is to be used for a private project. What is the benefit of adding this to the public repository if so much is still unclear and might even get deprecated soon? Why not just keep this in a personal or the private QE repo? |
Well, I'm not sure how "soon" it would be deprecated. The discussion on the On the other hand I was also incentivized to clean up and publish some of the tools I'm using for my research project by "higher powers". I was under the impression that increasing the ease for re-starting calculations from stored densities was a feature of general interest we wanted to offer more publicly. I may have misinterpreted though and perhaps the private repo is enough for this (although I feel it is a bit more awkward to keep up to date with the public repo?), maybe @giovannipizzi can chip in with some clarifications. |
Hi all, thanks for the work @ramirezfranciscof and for the review @sphuber Indeed, we've been discussing and suggesting Francisco to release his code in the public repository for public use.
So as long as the code is 1) generic enough (e.g. restarting a previous QE calculation), 2) with a clear code and clear interface, 3) has tests, and 4) is documented, I would vote to merge it here. If there are specific questions I'm happy to discuss (maybe in person is faster if many more discussions arise on the scope of the code?) |
(This is a required addition for the LUMI pilot)
This is a workflow that runs a
TransferCalcjob
that copies the specific files necessary for restarting QE calculations from asource_data
input. The exact behavior depends on the type of this input and some extra parameters:RemoteData
, it will create aFolderData
output that will store the files locally (without requiring any other input).FolderData
, it also needs to be provided with acomputer
(remote_computer
) or acode
(nscf.pw.code
). Thecomputer
just lets the workflow know in which computer to put theRemoteData
in which to copy the information from thesource_data
, but if acode
is provided instead, it will also run an NSCF calculation to regenerate all wavefunctions and other files necessary to then run normal PW calculations (at least until we implement thestartingpot
option, seePwBaseWorkChain
: revisit restarting from calculations #691).So far compatibility works in the way that all inputs are accepted even if they are redundant, as long as they are coherent (i.e. if you provide a
RemoteData
and aremote_computer
, the data node should point to that same computer; idem if you provide aremote_computer
and anscf.pw.code
). I think the only exception is ifnscf
input is provided with aRemoteData
input, in which case thenscf
will be ignored (and a warning will be issued).