unregular behaviour while using different cpus combination #5548
Comments
Hi @JTaozhang
Hi, I think different machines have different settings, so I am not sure you can reproduce my case on your machine. Maybe you can try different CPU combinations with my atomic system to check this problem. One more question: fewer tasks on a node means that each task shares the node's memory with fewer other tasks, right? The mp_num_thread setting decides how many CPUs are assigned to one task, which governs the parallel computing. Best,
Could you supply your STRU and charge density files for nscf?
OK, I have attached the file here. Best.
Thank you, but it seems that the upload failed.
I think the upload failed because of the large file size, so I will share it via Baidu Cloud. Link: https://pan.baidu.com/s/1lphoofZi1MhJh49etZ2weQ Best
Describe the bug
Hi there,
Currently, I am working on a WTe2 bilayer system that contains about 504 atoms. I am trying to calculate the band structure with a k-point mesh of 11 along the high-symmetry path. The software version is v3.8.2. With the same INPUT and KPT settings but different CPU combinations, one job works and the other behaves abnormally: it reports nothing, neither an error nor any useful information. One is mpirun -np 8 -env OMP_NUM_THREADS=28, with 224 CPUs in total (8 nodes, 56 CPUs per node); the other is mpirun -np 20 -env OMP_NUM_THREADS=28, with 560 CPUs in total (10 nodes, 56 CPUs per node).
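For readers comparing the two launch settings, the core counts quoted above can be cross-checked with a few lines of Python. This is only a sketch: the helper name `layout` and its arguments are made up for illustration and are not part of ABACUS or of any MPI implementation, and it assumes one core per OpenMP thread and MPI ranks spread evenly over the nodes.

```python
# Sketch: sanity-check a hybrid MPI x OpenMP launch layout.
# `layout` is a hypothetical helper, not an ABACUS or MPI API.

def layout(mpi_ranks, omp_threads, nodes, cpus_per_node):
    """Return core-usage figures for a hybrid MPI/OpenMP job."""
    cores_used = mpi_ranks * omp_threads        # assumes one core per OpenMP thread
    cores_available = nodes * cpus_per_node
    return {
        "cores_used": cores_used,
        "cores_available": cores_available,
        "ranks_per_node": mpi_ranks / nodes,    # assumes ranks are spread evenly
        "fully_subscribed": cores_used == cores_available,
    }

# Figures exactly as quoted in the report.
first_run = layout(mpi_ranks=8, omp_threads=28, nodes=8, cpus_per_node=56)
second_run = layout(mpi_ranks=20, omp_threads=28, nodes=10, cpus_per_node=56)

print(first_run)
print(second_run)
```

Taken at face value, the first setting uses 224 of the 448 cores that 8 nodes of 56 CPUs provide (one rank per node), while the second uses all 560 cores with two ranks per node. Whether this difference has anything to do with the silent job is not established here.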
For the abnormal job, the complete output is shown below:
I don't know what causes this abnormal behavior. Could you test the code? I think the parallel computing part may still have some stability problems. I also discussed this problem in the WeChat online group, and somebody suggested that I open an issue here, so I did.
The related file is here:
WTe2.zip
Expected behavior
The second submission setting should run faster than the first.
To Reproduce
Environment
module load cmake/cmake-3.25 gnu/12.1.0
source /share/apps/intel2022/setvars.sh
source /share/home/zhangtao/software/abacus-develop-3.8.3/toolchain/install/setup
Additional Context
no more information is needed
Task list for Issue attackers (only for developers)