conflict in compatible ASE version? #291

Closed
jungsdao opened this issue Feb 6, 2024 · 31 comments

@jungsdao
Contributor

jungsdao commented Feb 6, 2024

I think the following line of generate/optimize.py requires the latest version of ASE ('3.23.0b1'):
from ase.filters import FrechetCellFilter

But wfl seems to conflict with espresso.py in ASE '3.23.0b1', giving the error below. Because of this, I had to downgrade just espresso.py (copied from ASE 3.22.1) to make it work.
I'm not totally sure this is related to the ASE version, but after the downgrade the error no longer occurred.

Exception: Failed to construct calculator, original attempt's exception was 'No configuration of espresso'
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/u/hjung/conda-envs/mace_env/lib/python3.9/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/u/hjung/conda-envs/mace_env/lib/python3.9/site-packages/wfl/autoparallelize/pool.py", line 70, in _wrapped_autopara_wrappable
    outputs = op(*u_args, **kwargs)
  File "/u/hjung/conda-envs/mace_env/lib/python3.9/site-packages/wfl/calculators/generic.py", line 80, in _run_autopara_wrappable
    raise ValueError(f"Failed to construct calculator, original attempt's exception was '{calculator_failure_message}'")
ValueError: Failed to construct calculator, original attempt's exception was 'No configuration of espresso'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/raven/ptmp/hjung/GAP/scratch/unkownhost-_home_hjung/run_eval_dft_chunk_0_p3PO4jDEENxODL0w0lYdfsGOwcb77VNFitO-qwTh3mg=_zbpx1t0g/_expyre_script_core.py", line 9, in <module>
    results = function(*args, **kwargs)
  File "/u/hjung/conda-envs/mace_env/lib/python3.9/site-packages/wfl/autoparallelize/pool.py", line 157, in do_in_pool
    for result_group in results:
  File "/u/hjung/conda-envs/mace_env/lib/python3.9/multiprocessing/pool.py", line 870, in next
    raise value
ValueError: Failed to construct calculator, original attempt's exception was 'No configuration of espresso'
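
A minimal sketch of probing which cell filter can be imported in a given environment, for anyone hitting the FrechetCellFilter import on an older ASE. The helper name find_cell_filter is hypothetical, and treating ExpCellFilter from ase.constraints as a fallback is an assumption of this sketch, not something wfl is known to do.

```python
import importlib

# Candidate (module, class) pairs, newest first.
# The second entry is an assumed fallback for older ASE releases.
CANDIDATES = [
    ("ase.filters", "FrechetCellFilter"),
    ("ase.constraints", "ExpCellFilter"),
]


def find_cell_filter(candidates=CANDIDATES):
    """Return the first (module, class) pair that can be imported, or None."""
    for module_name, class_name in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # Module missing in this ASE version, or ASE not installed at all.
            continue
        if hasattr(module, class_name):
            return module_name, class_name
    return None


print(find_cell_filter())
```

If this prints None or the fallback pair, the installed ASE predates the ase.filters module that generate/optimize.py imports from.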
@bernstei
Contributor

bernstei commented Feb 6, 2024

They keep on changing the way DFT calculators get initialized. I was pretty sure it was working with all the different ways Espresso was initialized. How exactly did you install ASE when it wasn't working? The version number isn't sufficient, because they keep making changes without changing the version number, at least in the gitlab version.
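
Since the version string alone does not identify a git install, here is a rough sketch of how one might report where a package came from. It relies on the direct_url.json metadata file that pip records for direct-URL installs (PEP 610); whether that file is present depends on how the package was installed, and the helper name describe_install is hypothetical.

```python
import json
from importlib import metadata


def describe_install(dist_name):
    """Collect version and, when recorded, the source URL and commit of a distribution."""
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        return {"name": dist_name, "installed": False}

    info = {"name": dist_name, "installed": True, "version": dist.version}
    # direct_url.json exists only for installs from a URL or VCS reference.
    raw = dist.read_text("direct_url.json")
    if raw:
        direct = json.loads(raw)
        info["url"] = direct.get("url")
        info["commit"] = direct.get("vcs_info", {}).get("commit_id")
    return info


print(describe_install("ase"))
```

Running this on both the local and the remote machine gives something more specific to compare than '3.23.0b1'.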

@jungsdao
Contributor Author

jungsdao commented Feb 6, 2024

This is how I installed ASE when it didn't work:
pip install --upgrade git+https://gitlab.com/ase/ase.git@master

@bernstei
Contributor

bernstei commented Feb 6, 2024

Thanks. Let me see if I can reproduce the problem. I assume you're also using the latest version of wfl ?

@jungsdao
Contributor Author

jungsdao commented Feb 6, 2024

Yes, I'm also using the latest version of wfl. (v 0.2.0)

@bernstei
Contributor

bernstei commented Feb 6, 2024

I just tried with the latest ASE master branch (and the latest wfl main branch), and the Espresso-related tests passed. If you clone the wfl repo, you should be able to do (from the cloned directory)

pytest --basetemp ${HOME}/pytest_wfl -rxXs tests/calculators/test_qe.py

after setting the environment variable PYTEST_WFL_ASE_ESPRESSO_COMMAND to the command that runs a serial pw.x (I use mpirun -np 1 pw.x for example). If that fails, we need to figure out why, since it's passing for me. If it passes, but your real script fails, we should be able to figure out why.

@stenczelt
Member

> The way I installed ASE when it didn't work was : pip install --upgrade git+https://gitlab.com/ase/ase.git@master

This might be best put in the docs, or the ASE devs could finally make a release (3.22 was released in 2021), because if you install according to the wfl docs you hit the same problem even when just doing from wfl.generate.optimize import optimize.

@bernstei
Contributor

I'm confused - that git command above did work, or didn't? It looks like the command that should give the latest, which should work.

@bernstei
Contributor

@stenczelt As a person who ran into this, where do you think it should be documented so it's most likely to be noticed?

@bernstei
Contributor

bernstei commented Feb 26, 2024

Top level README.md ? Anyplace else? I guess the install command in the docs could, in principle drag in the older and incompatible ASE (although I was sort of assuming people had their own ASE already installed). I think there's a beta release number - we could require that as the minimum version, which will always fail until they actually have another release, but at least you'll know you have to do it manually.

@jungsdao
Contributor Author

Sorry for the belated reply.
I have checked again and found the conditions under which it can be reproduced.
This happens when a Quantum ESPRESSO job is submitted to a remote cluster and the ASE version installed on the cluster is 3.23.0b1. I think the pytest in current wfl passed without error probably because it does not submit a remote job and only tests locally. When I downgrade espresso.py on the cluster to an older version (like 3.22.1), I don't get this error.

@bernstei
Contributor

If you have the latest wfl (GitHub main) and ASE (GitLab master HEAD) on both local and remote machines, then it should definitely work.

@bernstei
Contributor

I also thought it should work with the older version, actually, so I'll also check why it's not.

@bernstei
Contributor

@jungsdao I just ran the wfl pytests (latest GitHub version of wfl) with the pip version of ASE (3.22.1), and they passed, and also with the latest GitLab master HEAD (3.23.0b1), and they passed too. I'm not sure why it's not working for you. Is it possible that the wfl version on the remote machine isn't the latest?

@jungsdao
Contributor Author

I have checked again after updating both ASE and wfl to the latest version, but I'm getting the same error. When I change espresso.py on the remote cluster to the ASE 3.22.1 version it works, but with ASE 3.23.0b1 it fails.

@bernstei
Contributor

I'm not sure what's going on, but I don't see any way for the remote behavior to be different from the local behavior if they're running the same versions of wfl and ase. I guess I'll test it explicitly here.

Can you find the directory where the submitted job ran and grab all the output and error files and upload them here? I'm hoping there's more info on where exactly it's having a problem.

I wonder if something is messed up with the PYTHONPATH for the remote job, and it's not loading the wfl version you intend it to.
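
One way to check this is to print, from inside the submitted job, which interpreter is running and where ase and wfl would be imported from. The following is only a sketch; the helper names module_origin and environment_report are hypothetical.

```python
import importlib.util
import os
import sys


def module_origin(module_name):
    """Return the file a top-level module would be loaded from, or None if not found."""
    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        return None
    return spec.origin if spec is not None else None


def environment_report(module_names=("ase", "wfl")):
    """Build a short text report of the interpreter, PYTHONPATH and module locations."""
    lines = [
        f"python: {sys.executable}",
        f"PYTHONPATH: {os.environ.get('PYTHONPATH', '(unset)')}",
    ]
    for name in module_names:
        lines.append(f"{name}: {module_origin(name) or '(not importable)'}")
    return "\n".join(lines)


print(environment_report())
```

Comparing the output on the login node with the output written by the queued job shows quickly whether the two are picking up different installations.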

@jungsdao
Contributor Author

These are the related files in the submitted job directory. I'm not quite sure what the source of the error is. It seems to be launching the intended version of wfl correctly.

failed.tar.gz

@bernstei
Contributor

Thanks. I might need to give you a version that can produce better error information. I'll investigate some things here first.

@bernstei
Contributor

I just added a test that runs a remote Espresso job, and it runs fine (#294). I'll look a bit more, but I think there has to be some sort of version issue with your remote jobs. It's pretty easy for the remote job to end up with different paths, PYTHONPATH, etc. Can you describe your setup in more detail? Is it really a remote job, or is it just a queued job with the main workflow running on the login node of the HPC?

Can you post the workflow script (or, ideally, a simpler script that shows the same problem) here?

@bernstei
Contributor

If you can install wfl from the espresso_remote_job_test branch (instead of main) that version should provide us with better error information for the way your code is failing.

@stenczelt
Member

> @stenczelt As a person who ran into this, where do you think it should be documented so it's most likely to be noticed?

A notice in the top-level README is a good idea. I've actually looked at the documentation this time, so maybe a paragraph or one more code block in the Installation section would be useful:
https://libatoms.github.io/workflow/#installation

@bernstei
Contributor

@stenczelt please take a look at the changes in #294 . I'm not sure there's an easy way to see the formatted docs (the README you can see by switching to that branch), but you can look at the .rst source file changes.

@bernstei
Contributor

bernstei commented Mar 1, 2024

@jungsdao Have you had a chance to test the espresso_remote_job_test branch? It should give more error information if you're still having this problem.

@jungsdao
Contributor Author

jungsdao commented Mar 1, 2024

I have tried the espresso_remote_job_test branch on the remote cluster and it gives the following error (from _expyre_job_error):

Exception: Failed to construct calculator, original attempt's exception was '(exc)
Traceback (most recent call last):
  File "/u/hjung/conda-envs/wfl_test/lib/python3.9/site-packages/wfl/calculators/generic.py", line 49, in _run_autopara_wrappable
    calculator_default = construct_calculator_picklesafe(calculator)
  File "/u/hjung/conda-envs/wfl_test/lib/python3.9/site-packages/wfl/utils/parallel.py", line 51, in construct_calculator_picklesafe
    return calculator[0](*c_args, **c_kwargs)
  File "/u/hjung/conda-envs/wfl_test/lib/python3.9/site-packages/wfl/calculators/espresso.py", line 88, in __init__
    super().__init__(keep_files=keep_files, rundir_prefix=rundir_prefix,
  File "/u/hjung/conda-envs/wfl_test/lib/python3.9/site-packages/wfl/calculators/wfl_fileio_calculator.py", line 48, in __init__
    super().__init__(**kwargs)
  File "/u/hjung/conda-envs/wfl_test/lib/python3.9/site-packages/ase/calculators/espresso.py", line 216, in __init__
    super().__init__(
  File "/u/hjung/conda-envs/wfl_test/lib/python3.9/site-packages/ase/calculators/genericfileio.py", line 336, in __init__
    raise EnvironmentError(f'No configuration of {template.name}')
ase.calculators.calculator.EnvironmentError: No configuration of espresso
'
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/u/hjung/conda-envs/wfl_test/lib/python3.9/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/u/hjung/conda-envs/wfl_test/lib/python3.9/site-packages/wfl/autoparallelize/pool.py", line 70, in _wrapped_autopara_wrappable
    outputs = op(*u_args, **kwargs)
  File "/u/hjung/conda-envs/wfl_test/lib/python3.9/site-packages/wfl/calculators/generic.py", line 86, in _run_autopara_wrappable
    raise ValueError(f"Failed to construct calculator, original attempt's exception was '{calculator_failure_message}'")
ValueError: Failed to construct calculator, original attempt's exception was '(exc)
Traceback (most recent call last):
  File "/u/hjung/conda-envs/wfl_test/lib/python3.9/site-packages/wfl/calculators/generic.py", line 49, in _run_autopara_wrappable
    calculator_default = construct_calculator_picklesafe(calculator)
  File "/u/hjung/conda-envs/wfl_test/lib/python3.9/site-packages/wfl/utils/parallel.py", line 51, in construct_calculator_picklesafe
    return calculator[0](*c_args, **c_kwargs)
  File "/u/hjung/conda-envs/wfl_test/lib/python3.9/site-packages/wfl/calculators/espresso.py", line 88, in __init__
    super().__init__(keep_files=keep_files, rundir_prefix=rundir_prefix,
  File "/u/hjung/conda-envs/wfl_test/lib/python3.9/site-packages/wfl/calculators/wfl_fileio_calculator.py", line 48, in __init__
    super().__init__(**kwargs)
  File "/u/hjung/conda-envs/wfl_test/lib/python3.9/site-packages/ase/calculators/espresso.py", line 216, in __init__
    super().__init__(
  File "/u/hjung/conda-envs/wfl_test/lib/python3.9/site-packages/ase/calculators/genericfileio.py", line 336, in __init__
    raise EnvironmentError(f'No configuration of {template.name}')
ase.calculators.calculator.EnvironmentError: No configuration of espresso
'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/raven/ptmp/hjung/GAP/scratch/unkownhost-_home_hjung/run_eval_dft_chunk_0_dfzhb4Sm89qkJVMNcICoHzKH9gGfe3KPYGE3Vecnk_8=_c5vhqbuu/_expyre_script_core.py", line 9, in <module>
    results = function(*args, **kwargs)
  File "/u/hjung/conda-envs/wfl_test/lib/python3.9/site-packages/wfl/autoparallelize/pool.py", line 157, in do_in_pool
    for result_group in results:
  File "/u/hjung/conda-envs/wfl_test/lib/python3.9/multiprocessing/pool.py", line 870, in next
    raise value
ValueError: Failed to construct calculator, original attempt's exception was '(exc)
Traceback (most recent call last):
  File "/u/hjung/conda-envs/wfl_test/lib/python3.9/site-packages/wfl/calculators/generic.py", line 49, in _run_autopara_wrappable
    calculator_default = construct_calculator_picklesafe(calculator)
  File "/u/hjung/conda-envs/wfl_test/lib/python3.9/site-packages/wfl/utils/parallel.py", line 51, in construct_calculator_picklesafe
    return calculator[0](*c_args, **c_kwargs)
  File "/u/hjung/conda-envs/wfl_test/lib/python3.9/site-packages/wfl/calculators/espresso.py", line 88, in __init__
    super().__init__(keep_files=keep_files, rundir_prefix=rundir_prefix,
  File "/u/hjung/conda-envs/wfl_test/lib/python3.9/site-packages/wfl/calculators/wfl_fileio_calculator.py", line 48, in __init__
    super().__init__(**kwargs)
  File "/u/hjung/conda-envs/wfl_test/lib/python3.9/site-packages/ase/calculators/espresso.py", line 216, in __init__
    super().__init__(
  File "/u/hjung/conda-envs/wfl_test/lib/python3.9/site-packages/ase/calculators/genericfileio.py", line 336, in __init__
    raise EnvironmentError(f'No configuration of {template.name}')
ase.calculators.calculator.EnvironmentError: No configuration of espresso
'

@bernstei
Contributor

bernstei commented Mar 1, 2024

How are you passing the pw.x command to the calculator constructor?

And can you confirm that you can manually create an Espresso calculator (outside of wfl) using the same arguments (positional or kwargs) that you pass to the calculator constructor in wfl?

[edited: the ASE Espresso calculator switched from a command keyword arg to an EspressoProfile, which the wrapper reconstructs from the calculator_exec argument. It's possible that if you're passing a command but the wrapper detects that you have a version that supports the profile, it's not handling that combination well.]
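
To make the suspected conflict concrete, here is a hedged sketch of the kind of check a wrapper could perform. It is not wfl's actual implementation: the function name resolve_espresso_args and the profile_argv key are hypothetical, and the PREFIX command template is copied from the ASE_ESPRESSO_COMMAND format used elsewhere in this thread.

```python
import shlex


def resolve_espresso_args(supports_profile, command=None, calculator_exec=None):
    """Decide how to hand the pw.x executable to the calculator, or fail clearly."""
    if command is not None and calculator_exec is not None:
        raise ValueError("Pass either 'command' or 'calculator_exec', not both")

    if supports_profile:
        # Newer ASE: the executable is configured through a profile.
        if command is not None:
            raise ValueError(
                "This ASE version configures Espresso through a profile; "
                "pass calculator_exec='<launcher> pw.x' instead of a full command"
            )
        if calculator_exec is None:
            raise ValueError("No executable given: pass calculator_exec")
        return {"profile_argv": shlex.split(calculator_exec)}

    # Older ASE: a command string with PREFIX placeholders.
    if command is not None:
        return {"command": command}
    if calculator_exec is not None:
        return {"command": f"{calculator_exec} -in PREFIX.pwi > PREFIX.pwo"}
    return {}  # leave it to the environment variable


print(resolve_espresso_args(True, calculator_exec="mpirun -np 1 pw.x"))
```

The point of the sketch is that the unsupported combination (a full command on a profile-based ASE) raises a message naming the fix, rather than surfacing later as 'No configuration of espresso'.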

@bernstei
Contributor

bernstei commented Mar 5, 2024

@jungsdao If you can answer the questions in my previous post, we can hopefully fix this. I suspect a conflict between the different ways of passing the executable to Espresso.

@jungsdao
Contributor Author

jungsdao commented Mar 5, 2024

I used to pass the pw.x command via an environment variable in the Slurm submission script:
export ASE_ESPRESSO_COMMAND='srun /u/hjung/Softwares/QE/qe-7.0/bin/pw.x -in PREFIX.pwi > PREFIX.pwo'

When I tried to run the ASE Espresso calculator outside of wfl, I got the following error complaining about the profile:

Traceback (most recent call last):
  File "/raven/u/hjung/test/test.py", line 57, in <module>
    calc = Espresso(command=command, input_data=input_data, kpts=(4, 4, 1), pseudopotentials=psp)
  File "/u/hjung/conda-envs/mace_env/lib/python3.9/site-packages/ase/calculators/espresso.py", line 201, in __init__
    raise RuntimeError(compatibility_msg)
RuntimeError: Espresso calculator is being restructured.  Please use e.g. Espresso(profile=EspressoProfile(argv=['mpiexec', 'pw.x'])) to customize command-line arguments.

As you explained, it definitely has to do with the new profile argument required by the new ASE Espresso calculator.

@bernstei
Contributor

bernstei commented Mar 5, 2024

OK. You should be able to get it to work by passing a new argument to the wfl.calculators.Espresso wrapper: calculator_exec = "srun /u/hjung/Softwares/QE/qe-7.0/bin/pw.x" (without the PREFIX stuff).

I'll also think about how to get it to work best with both the old and new syntax, if possible, but I think passing a command via the env var is more or less deprecated.
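
For converting an existing ASE_ESPRESSO_COMMAND value into the bare executable string, a small sketch along these lines may help. The helper name executable_part is hypothetical, and it assumes the input-file flag and redirections come last in the legacy command; the alternative spellings of the input flag listed here are also an assumption.

```python
import shlex

# Flags that introduce the input file in a legacy command (spellings assumed).
INPUT_FLAGS = {"-in", "-i", "-inp", "-input"}


def executable_part(legacy_command):
    """Drop the input-file flag and shell redirections from a legacy command string."""
    kept = []
    for token in shlex.split(legacy_command):
        # Stop at the input flag or at the first redirection token.
        if token in INPUT_FLAGS or token[:1] in ("<", ">"):
            break
        kept.append(token)
    return shlex.join(kept)


legacy = "srun /u/hjung/Softwares/QE/qe-7.0/bin/pw.x -in PREFIX.pwi > PREFIX.pwo"
print(executable_part(legacy))
```

Any pw.x options placed after the input flag would be dropped by this sketch, so the result is worth checking by eye.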

@jungsdao
Contributor Author

jungsdao commented Mar 5, 2024

Just confirmed that adding "calculator_exec" : "srun /u/hjung/Softwares/QE/qe-7.0/bin/pw.x" to the QE kwargs does not cause the previous error.

@bernstei
Contributor

bernstei commented Mar 5, 2024

> Just confirmed that adding "calculator_exec" : "srun /u/hjung/Softwares/QE/qe-7.0/bin/pw.x" to the QE kwargs does not cause the previous error.

OK - I'll see what I can do to make things internally consistent, and then merge the PR

@bernstei
Contributor

bernstei commented Mar 6, 2024

I think I have a solution that will at least give clearer error messages. I'll merge as soon as I push and tests pass.

@bernstei
Contributor

bernstei commented Mar 6, 2024

closed by #294
