
Use num_machines * num_mpiprocs_per_machine to set num_cpus #33


unkcpz
Member

@unkcpz unkcpz commented Aug 19, 2024

For backward compatibility when num_machines and num_mpiprocs_per_machine are set.

I only set a default value of 1 for num_mpiprocs_per_machine, because aiida-quantumespresso overrides the resources default with num_machines set to 1 and then gets the builder with that setting.
num_mpiprocs_per_machine can sometimes be read from the "Default #procs/machine" field of the computer setup, but if that is not set, the builder cannot be obtained without passing options to the builder generator.
In any case this is a workaround for backward compatibility, so the default is implemented even though it is quite specific to the qe plugin.
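The fallback described above can be sketched as follows. This is a minimal illustration, not the code merged in this PR: `derive_num_cpus` and its `default_mpiprocs_per_machine` parameter are hypothetical (the parameter stands in for the computer's "Default #procs/machine" setting), and the lookup order (explicit value, then computer default, then 1) is an assumption.

```python
from typing import Optional


def derive_num_cpus(resources: dict, default_mpiprocs_per_machine: Optional[int] = None) -> int:
    """Hypothetical helper: return num_cpus, deriving it from Slurm-style keys if needed."""
    if "num_cpus" in resources:
        # New-style resources: use the explicit value as-is.
        return resources["num_cpus"]
    # Old-style resources: num_machines is required (raises KeyError if missing).
    num_machines = resources["num_machines"]
    num_mpiprocs = resources.get("num_mpiprocs_per_machine")
    if num_mpiprocs is None:
        # Assumed order: the computer's "Default #procs/machine" if given, else 1.
        num_mpiprocs = default_mpiprocs_per_machine or 1
    return num_machines * num_mpiprocs


print(derive_num_cpus({"num_machines": 2, "num_mpiprocs_per_machine": 4}))  # 8
print(derive_num_cpus({"num_machines": 1}))  # 1
print(derive_num_cpus({"num_machines": 2}, default_mpiprocs_per_machine=8))  # 16
```

Defaulting to 1 keeps a resources dict containing only {"num_machines": 1} usable, which is the builder case the description mentions.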

@unkcpz unkcpz force-pushed the Compatible-with-slurm-resources-type-setting-for-backward-compatibility branch from e46d949 to faa0ca0 August 19, 2024 15:14
@unkcpz unkcpz force-pushed the Compatible-with-slurm-resources-type-setting-for-backward-compatibility branch from dd54854 to e97ac6a August 19, 2024 15:28
@unkcpz unkcpz force-pushed the Compatible-with-slurm-resources-type-setting-for-backward-compatibility branch from 77f7de1 to bdcd861 August 20, 2024 13:58
@unkcpz unkcpz force-pushed the Compatible-with-slurm-resources-type-setting-for-backward-compatibility branch from 322e412 to 4d779ee August 20, 2024 14:25
@unkcpz
Member Author

unkcpz commented Sep 11, 2024

@superstar54 can you have a look at this PR? This is needed by the qeapp to set num_cpus through the default num_procs, as we discussed.

@unkcpz unkcpz requested a review from superstar54 September 11, 2024 08:50
Member

@superstar54 superstar54 left a comment


Hi @unkcpz, thanks for the work!

# `num_mpiprocs_per_machine` can sometimes be read from "Default #procs/machine" of the computer setup, but if it is not set
# the builder cannot be obtained properly without passing `option` to the builder generator.
# In any case this is a workaround for backward compatibility, so this default is implemented even though it is quite specific to the qe plugin.
resources.num_cpus = kwargs.pop("num_machines") * kwargs.pop(
Member


I remember that this method is only used to validate the resources, and the returned resources are not used to override the actual resources, so setting num_cpus here will not help.

Member Author

@unkcpz unkcpz Sep 11, 2024


For the qeapp, yes, you are correct: it will then be overridden by the change here https://github.com/aiidalab/aiidalab-qe/pull/795/files. But this setting is not only for the qeapp. A calculation that does not set num_cpus will directly use num_machines and num_mpiprocs_per_machine (with the warning raised below) instead of failing.
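The two comments above disagree on where the derived value ends up. A minimal sketch of the distinction, under the assumption (not verified here against aiida-core) that the validator's return value is discarded when the stored options are checked but reused when the job's resource object is constructed. `validate_resources_sketch` and `JobResourceSketch` are hypothetical stand-ins, not the aiida-core or aiida-hyperqueue API.

```python
from dataclasses import dataclass


def validate_resources_sketch(**kwargs) -> dict:
    """Hypothetical validator: returns a copy of the resources with num_cpus filled in."""
    validated = dict(kwargs)
    if "num_cpus" not in validated:
        num_machines = validated.pop("num_machines")
        num_mpiprocs = validated.pop("num_mpiprocs_per_machine", 1)
        validated["num_cpus"] = num_machines * num_mpiprocs
    return validated


@dataclass
class JobResourceSketch:
    """Hypothetical stand-in for the resource object used when the job script is written."""

    num_cpus: int

    @classmethod
    def from_resources(cls, **kwargs) -> "JobResourceSketch":
        # Unlike the validation step, this path keeps the validator's output.
        return cls(num_cpus=validate_resources_sketch(**kwargs)["num_cpus"])


stored = {"num_machines": 2, "num_mpiprocs_per_machine": 4}
validate_resources_sketch(**stored)  # validation only: the returned dict is discarded
print("num_cpus" in stored)  # False: the stored options are unchanged
print(JobResourceSketch.from_resources(**stored).num_cpus)  # 8: the derived value reaches the job
```

Under this assumption both comments hold: the stored options never gain num_cpus (so the qeapp still needs its own change), while a job built from old-style resources still gets a usable CPU count.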

"Please set `num_cpus` and `memory_mb`."

message = f"{message} (this will be removed in aiida-hyperqueue v1.0)"
warnings.warn(message, AiiDAHypereQueueDeprecationWarning, stacklevel=3)
Member


If my previous comment is true, we need to raise an error here instead of a warning.

Member Author


No, the raise is in the KeyError branch. The warning here is in the else branch, which warns when num_cpus is derived from num_mpiprocs_per_machine and num_machines instead of being set directly.
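The raise-versus-warn split described in this reply maps naturally onto Python's try/except/else. This is a hypothetical sketch, not the merged code: `resolve_num_cpus` and `ResourcesDeprecationWarning` are invented names (the PR itself warns with AiiDAHypereQueueDeprecationWarning), and the messages are placeholders.

```python
import warnings


class ResourcesDeprecationWarning(DeprecationWarning):
    """Hypothetical stand-in for the plugin's own deprecation warning class."""


def resolve_num_cpus(**kwargs) -> int:
    """Hypothetical sketch: raise when keys are missing, warn when num_cpus is derived."""
    if "num_cpus" in kwargs:
        return kwargs["num_cpus"]
    try:
        num_machines = kwargs["num_machines"]
    except KeyError:
        # Neither num_cpus nor num_machines given: validation fails.
        raise ValueError("`num_cpus` must be set.") from None
    else:
        # Old-style keys given: derive num_cpus, but emit a deprecation warning.
        warnings.warn(
            "num_cpus derived from num_machines * num_mpiprocs_per_machine; set num_cpus instead.",
            ResourcesDeprecationWarning,
            stacklevel=2,
        )
        return num_machines * kwargs.get("num_mpiprocs_per_machine", 1)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    print(resolve_num_cpus(num_machines=2, num_mpiprocs_per_machine=3))  # 6
print(caught[0].category.__name__)  # ResourcesDeprecationWarning
```

Keeping the warning in the else branch means it only fires on the successful old-style path, so a missing-key error is never accompanied by a misleading deprecation message.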

@unkcpz unkcpz requested a review from superstar54 September 11, 2024 12:11
Member

@superstar54 superstar54 left a comment


Thanks for the clarification! One final comment (feel free to update or not).

Comment on lines +67 to +68
# TODO: I only set the default value of 1 for `num_mpiprocs_per_machine` because aiida-quantumespresso overrides
# the resources default with `num_machines` set to 1 and then gets the builder with that setting.
Member


aiida-hyperqueue should be independent of aiida-quantumespresso; could you rewrite this comment to make it more general?

Is the TODO something that we need to work on in the future? Could you open an issue?

Member Author


That is a good point. In fact I don't have a strong opinion; aiida-quantumespresso also relies quite heavily on the scheduler being Slurm-like.
Personally, I want to make hyperqueue an independent scheduler that does not inherit from or rely on anything assumed to be a "base scheduler" mode. So the underlying issue is essentially #8, but I don't have a good overview of how to design a solution on the aiida-core side.

@unkcpz unkcpz merged commit bd6eb68 into aiidateam:main Sep 16, 2024
3 checks passed
@unkcpz unkcpz deleted the Compatible-with-slurm-resources-type-setting-for-backward-compatibility branch September 16, 2024 07:28