Do we need yet another queuing system interface? #39
Hello @jan-janssen, thanks for opening this issue for discussion! I fully agree that it would be very beneficial to converge on a common interface for queuing systems. This is actually what we tried to start (alone, I must admit) with qtoolkit. As you mention, MP, AiiDA and pyiron (and others) all have their own approaches to "interact" with the queuing systems, but this was usually buried inside the codes (e.g. within the fireworks code base, within aiida-core, ...). We tried to build qtoolkit as a standalone queue interface with no (or at least no required) dependencies. This is, I think, a prerequisite for such a low-level tool, to make sure no "cross-dependency hell" occurs. We would definitely be happy to discuss and try to converge towards a common solution. (Note that we developed qtoolkit for use in jobflow-remote, another package that deals with the execution of jobflow workflows from the Materials Project stack.) Pinging @giovannipizzi @utf @computron @gpetretto @gmrigna for thoughts/follow-up on this.
Hi @jan-janssen, thanks for raising this issue. I definitely agree with your point of view. When we started this project we were rather tight on time, but we still spent some time checking the available packages. As David said, in most cases the actual functionalities are buried inside large packages with many dependencies. Also, most of them depend on some kind of "queue manager" that mostly fits the use case of the larger package. I believe the key point is having a package where the basic functionalities are directly and clearly exposed, not hidden in some queue manager.
In general, I expect that it will be much more difficult to have a "one size fits all" queue manager that could be used by everybody. Each workflow manager has different requirements depending on how it works, but sharing those common functionalities would already be a nice step forward. I am definitely willing to work on a shared solution for exposing the basic functionalities and reducing code duplication.
Obviously I am a bit biased, as those are exactly the challenges I tried to address with
In A generic set of the variables is shared between all queuing systems: https://pysqa.readthedocs.io/en/latest/queue.html#queuing-systems
Still the users can also use
For the different queuing systems
For this
Yes, I completely agree. I am sure that some specific requirements are going to remain, but if we can already agree on a shared library that covers part of the challenges, that is already a step in the right direction.
Hey guys, I was pointed to this thread by @jan-janssen. Just two small thoughts from my side:
Point 2 brings two important changes: (a) authentication (moving from time-unlimited SSH keys to time-limited tokens), and (b) restrictions on what users can do; in particular, these APIs won't allow the user to run arbitrary commands on the system (see e.g. https://firecrest-api.cscs.ch/ for an idea of what such an API can look like, or the Slurm REST API at https://slurm.schedmd.com/rest_api.html). It can be a valid choice to ignore these APIs for now (I guess it will take a while until a standard emerges), but if possible, try to anticipate some of these changes in the design of the package, e.g. by making it possible to implement other forms of authentication, and check whether the functionality provided by the package could be mapped to the REST endpoints in these APIs. In the end these can be small things: e.g. AiiDA assumed it could do POSIX file system operations on the remote storage, which does not map to these APIs.
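One way to follow this advice is to expose only named operations (submit, cancel, ...) behind an abstract interface that assumes neither shell access nor SSH keys, so that a command-line backend and a REST backend can both implement it. The sketch below is hypothetical throughout: the class names, the base URL and the `/jobs` paths are placeholders and are not taken from FirecREST or the Slurm REST API. Each backend only returns the action it would perform (a command line, or an unsent HTTP request with a bearer token) so the mapping can be inspected.

```python
import json
import urllib.request
from abc import ABC, abstractmethod


class QueueBackend(ABC):
    """Hypothetical interface exposing only named operations, no arbitrary shell."""

    @abstractmethod
    def submit(self, script: str): ...

    @abstractmethod
    def cancel(self, job_id: str): ...


class SlurmCommandBackend(QueueBackend):
    """Describes the SLURM command line to run and the stdin to feed it."""

    def submit(self, script: str):
        # Without a file argument, sbatch reads the job script from stdin.
        return ["sbatch"], script

    def cancel(self, job_id: str):
        return ["scancel", job_id], None


class RestBackend(QueueBackend):
    """Builds (but never sends) requests for a hypothetical REST service."""

    def __init__(self, base_url: str, token: str):
        self.base_url = base_url.rstrip("/")
        self.token = token  # time-limited token instead of an SSH key

    def _request(self, method: str, path: str, payload=None):
        data = None if payload is None else json.dumps(payload).encode()
        request = urllib.request.Request(self.base_url + path, data=data, method=method)
        request.add_header("Authorization", f"Bearer {self.token}")
        if data is not None:
            request.add_header("Content-Type", "application/json")
        return request

    def submit(self, script: str):
        # "/jobs" is a placeholder path, not an endpoint of FirecREST or slurmrestd.
        return self._request("POST", "/jobs", {"script": script})

    def cancel(self, job_id: str):
        return self._request("DELETE", f"/jobs/{job_id}")
```

Because callers only ever see `submit` and `cancel`, swapping the command-line backend for a token-based REST one would not require changes in the workflow code that uses it.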
To just comment on the
The Still I agree that if we build a minimalistic queuing system adapter which just provides the commands for different queuing systems, then such a tool might also be relevant for the developers of
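A minimalistic adapter that "just provides the commands" could be little more than a lookup table per queuing system. The following is a sketch under assumptions: only SLURM and PBS are covered, the table and function names are hypothetical, and only the most basic invocations are shown (a real adapter needs many more options). The job-ID parsing relies on `sbatch` printing `Submitted batch job <id>` and on `qsub` printing the job identifier on its own line.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchedulerCommands:
    """Argument templates per action; "{...}" fields are filled in at call time."""

    submit: tuple[str, ...]
    status: tuple[str, ...]
    cancel: tuple[str, ...]


# Hypothetical lookup table covering only the most basic invocations.
COMMANDS = {
    "slurm": SchedulerCommands(
        submit=("sbatch", "{script}"),
        status=("squeue", "--jobs", "{job_id}"),
        cancel=("scancel", "{job_id}"),
    ),
    "pbs": SchedulerCommands(
        submit=("qsub", "{script}"),
        status=("qstat", "{job_id}"),
        cancel=("qdel", "{job_id}"),
    ),
}


def build_command(scheduler: str, action: str, **values: str) -> list[str]:
    """Return the argv list for an action; executing it is left to the caller."""
    template = getattr(COMMANDS[scheduler], action)
    return [part.format(**values) for part in template]


def parse_job_id(scheduler: str, submit_output: str) -> str:
    """Extract the job ID from the submission command's stdout."""
    if scheduler == "slurm":
        # sbatch prints e.g. "Submitted batch job 12345"
        return submit_output.split()[-1]
    # qsub prints the job identifier, e.g. "12345.server", on its own line
    return submit_output.strip()
```

Keeping execution out of the adapter is what would let each workflow framework reuse it with its own connection handling.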
Hi @ltalirz, thanks for your thoughts! Regarding REST APIs, this is definitely something to keep in mind, and we actually considered it initially. We started without it, but without preventing a later move to it. Our reasoning was that, in any case, most computing centers currently still provide either only the "standard" command line or both command line and REST API.
Hi all, very interesting thread. In AiiDA, the code for scheduler management is part of the main aiida package, but by design it is decoupled in a submodule ( The only addition is that it supports tunnelling those commands via the transport interface (also part of AiiDA in So, converging on a single package to manage the scheduler would not be a huge amount of work (even if, as @gpetretto mentions, things will probably become more complex once one looks at the details). However, an ongoing project in AiiDA is to extend the scheduler part to support completely different approaches for REST-API-based schedulers, in particular FirecREST, already mentioned by @ltalirz. Therefore, I would be hesitant to invest energy in a new package unless this requirement is included in the design from the start, since otherwise it would bring little additional benefit to most of us who already have code to deal with schedulers. If we instead address this issue from the beginning, the new package becomes very powerful: in addition to providing existing functionality independently of the underlying engine, it is also future-proof. Pinging here @khsrali, who is actively working on FirecREST support in AiiDA, and also @GeigerJ2 @sphuber @mbercx for info.
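The decoupling described above, where the scheduler code decides which commands to run and a separate transport decides where they run, can be sketched roughly as follows. This is not AiiDA's actual API: `Transport`, `LocalTransport`, `SshTransport`, `DryRunTransport` and `SlurmScheduler` are hypothetical names, and the SSH variant simply shells out to the OpenSSH client (assuming keys or an agent are already configured) instead of using fabric or paramiko.

```python
import shlex
import subprocess
from typing import Protocol


class Transport(Protocol):
    """Anything that can run an argv list and return (exit code, stdout, stderr)."""

    def exec_command(self, command: list[str]) -> tuple[int, str, str]: ...


class LocalTransport:
    """Runs commands on the local machine."""

    def exec_command(self, command: list[str]) -> tuple[int, str, str]:
        proc = subprocess.run(command, capture_output=True, text=True)
        return proc.returncode, proc.stdout, proc.stderr


class SshTransport(LocalTransport):
    """Tunnels the same commands through the OpenSSH client."""

    def __init__(self, host: str):
        self.host = host

    def wrap(self, command: list[str]) -> list[str]:
        # The remote shell re-parses the string, so every argument is quoted.
        return ["ssh", self.host, shlex.join(command)]

    def exec_command(self, command: list[str]) -> tuple[int, str, str]:
        return super().exec_command(self.wrap(command))


class DryRunTransport:
    """Records commands instead of running them and returns a canned reply."""

    def __init__(self, reply: str = ""):
        self.reply = reply
        self.calls: list[list[str]] = []

    def exec_command(self, command: list[str]) -> tuple[int, str, str]:
        self.calls.append(command)
        return 0, self.reply, ""


class SlurmScheduler:
    """Knows which commands to run; the transport decides where they run."""

    def __init__(self, transport: Transport):
        self.transport = transport

    def submit(self, script_path: str) -> str:
        code, out, err = self.transport.exec_command(["sbatch", script_path])
        if code != 0:
            raise RuntimeError(f"sbatch failed: {err.strip()}")
        return out.split()[-1]  # sbatch prints "Submitted batch job <id>"

    def cancel(self, job_id: str) -> None:
        code, _, err = self.transport.exec_command(["scancel", job_id])
        if code != 0:
            raise RuntimeError(f"scancel failed: {err.strip()}")
```

A REST-based backend does not fit this pattern, since it has no arbitrary command execution to tunnel through, which is why the comment argues for designing it in from the start rather than adding it as another transport later.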
Hi there, yes, I agree with @giovannipizzi. The way AiiDA is designed makes it relatively easy to converge to other packages for scheduler and transport. In general, it is a great idea to have a standard package that can deal with REST-API-based schedulers.
Thanks all for your comments, suggestions and insights. It seems to me there is enough traction and interest to try to converge to a common package/API for both command-line-based and REST-API-based schedulers (possibly also transport, even if I would probably see that as a second or separate topic). I propose we have a first meeting to discuss needs, constraints and a broad view on this topic. Would that be okay for everyone? If so, I can send a Doodle to choose a date.
While the Materials Project, AiiDA and pyiron all started from different directions and developed different approaches to high-throughput atomistic simulation in materials science, I wonder if we could agree on a shared standard to interface with the queuing system, especially as we have all already converged on using fabric / paramiko for handling the SSH connection, two-factor authentication and so on.
In analogy to using conda-forge for package distribution and OPTIMADE as a central interface for accessing atomistic databases, I think agreeing on a joint package for queuing system submission would allow users to test different workflow frameworks more easily and reduce the workload for the developers.
For pyiron, the queuing system interface is available in the pysqa package. To simplify the configuration for the users, pysqa uses jinja2 templates to convert the submission script templates provided by the computing centres into templates which can be used by pysqa (SLURM example).

I am aware that the ExaWorks project attempted the same with the development of the psij-python package; still, I feel that if we could find a shared basis between the Materials Project, AiiDA and pyiron, that would already be a step in the right direction.
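The template step can be illustrated with a minimal sketch. It is not pysqa's implementation: it swaps jinja2 for the standard library's `string.Template` (plain placeholder substitution, without jinja2's loops and conditionals), and the placeholder names `job_name`, `cores`, `run_time_min`, `working_directory` and `command` are hypothetical stand-ins for a generic, scheduler-independent set of variables. The `#SBATCH` options themselves are standard SLURM ones.

```python
from string import Template

# Hypothetical SLURM template, as a computing centre might provide it,
# with generic placeholders in place of site-specific values.
SLURM_TEMPLATE = Template(
    """#!/bin/bash
#SBATCH --job-name=${job_name}
#SBATCH --ntasks=${cores}
#SBATCH --time=${run_time_min}
#SBATCH --chdir=${working_directory}

${command}
"""
)


def render_submission_script(**generic_values: object) -> str:
    """Fill the template; values are converted to strings by substitute()."""
    # substitute() raises KeyError if a placeholder has no value,
    # so an incomplete job definition fails before anything is submitted.
    return SLURM_TEMPLATE.substitute(generic_values)


script = render_submission_script(
    job_name="relax",
    cores=4,
    run_time_min=60,  # SLURM interprets a bare number for --time as minutes
    working_directory="/scratch/relax",
    command="python run.py",
)
```

Supporting another queuing system would then only require another template that uses the same generic placeholder names.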