Basic integration test for the common relax workflow #162

Open
sphuber opened this issue Mar 10, 2021 · 18 comments

Comments

@sphuber
Collaborator

sphuber commented Mar 10, 2021

It is difficult to set up integration tests that actually run the common workflows: they require not only a lot of setup in the Python environment, with the various plugins and their required data such as pseudos, but the codes themselves also need to be compiled. I don't see how we can integrate this into the GHA anytime soon, but in the meantime we should have a manual alternative to have some kind of verification.

I propose that each developer runs the common relax workflow for all three protocols for the silicon structure. To make sure that everyone runs in the same environment, and that it is as reproducible as possible, I propose to use docker. After you have installed docker on your machine, you can run the following:

# Create a docker compose file
cat > docker-compose.yml << EOF
version: '2'

services:
  quantum-mobile:
    # image: "marvelnccr/quantum-mobile:20.11.2a"  # outdated: does not contain the latest versions of all plugins or codes
    image: "marvelnccr/quantum-mobile:develop"
    container_name: quantum-mobile
    privileged: true
    volumes:
    - "/sys/fs/cgroup:/sys/fs/cgroup:ro"
    environment:
      LC_ALL: "en_US.UTF-8"
      LANG: "en_US.UTF-8"
EOF

# Start the docker container and enter it with an interactive shell
docker-compose up -d
docker exec -it --user max quantum-mobile /bin/bash

# Enable the virtual env into which AiiDA is installed
workon aiida

# We need to make sure we have the latest commit of `aiida-common-workflows` because the latest release of QM doesn't include it yet
cd /home/max/codes
git clone https://github.com/aiidateam/aiida-common-workflows.git
cd aiida-common-workflows
pip install --upgrade pip
pip install -e .
reentry scan

# Now run the steps necessary to set up your plugin that you have entered in the SI
# Make sure that you only execute these steps and nothing else.
# If you need to add more code, make sure to update the SI!

# Restart the daemon
verdi daemon stop
verdi daemon start

# Now run the relax workchain for your plugin for all three protocols for an extended system and a molecule
# If the code doesn't support extended systems, you can of course only run the molecule workflows.
for protocol in fast moderate precise; do
    aiida-common-workflows launch relax -S Si -p "$protocol" -d <PLUGIN>
    aiida-common-workflows launch relax -S NH3-planar -p "$protocol" -d <PLUGIN>
done
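
The loop above amounts to a small test matrix of protocols and structures. As a rough sketch only, the same matrix can be generated in Python, for example to log which runs were launched. The helper name `build_launch_commands` is hypothetical and not part of aiida-common-workflows; the command-line flags are copied from the script above, and `<PLUGIN>` remains a placeholder.

```python
from itertools import product

# Protocols and structures taken from the shell loop above.
PROTOCOLS = ("fast", "moderate", "precise")
STRUCTURES = ("Si", "NH3-planar")


def build_launch_commands(plugin, structures=STRUCTURES, protocols=PROTOCOLS):
    """Return one launch command (as an argument list) per protocol/structure pair.

    Hypothetical helper for illustration; it only builds the argument lists
    and does not run anything.
    """
    return [
        ["aiida-common-workflows", "launch", "relax",
         "-S", structure, "-p", protocol, "-d", plugin]
        for protocol, structure in product(protocols, structures)
    ]


if __name__ == "__main__":
    # "<PLUGIN>" is a placeholder, exactly as in the shell script.
    for command in build_launch_commands("<PLUGIN>"):
        print(" ".join(command))
```

For a code that cannot handle extended systems, passing `structures=("NH3-planar",)` restricts the matrix to the molecule runs.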
@bosonie
Collaborator

bosonie commented Mar 11, 2021

Orca and Gaussian cannot perform calculations on extended systems. Does it make sense to provide a similar script for the relaxation of a single molecule? H2 would be sufficient. The script should be the same, no, @sphuber?

@sphuber
Collaborator Author

sphuber commented Mar 11, 2021

@bosonie I have updated the script to also run the ammonia example.

@bosonie
Collaborator

bosonie commented Mar 11, 2021

Ok, I believe that H2 was easier since it is 2 atoms and not 4, but whatever

@adegomme
Collaborator

adegomme commented Mar 11, 2021

btw, the marvelnccr/quantum-mobile:develop tag is the same as docker 20.11.2a on docker hub, for now, so it uses old plugins as well.
https://hub.docker.com/r/marvelnccr/quantum-mobile/tags?page=1&ordering=last_updated

@sphuber
Collaborator Author

sphuber commented Mar 11, 2021

> Ok, I believe that H2 was easier since it is 2 atoms and not 4, but whatever

But I thought that certain plugins had problems with this example, since having to specify the electronic type and/or starting magnetization was causing problems. Happy to change if this is not the case.

@sphuber
Collaborator Author

sphuber commented Mar 11, 2021

> btw, the marvelnccr/quantum-mobile:develop tag is the same as docker 20.11.2a on docker hub, for now, so it uses old plugins as well.
> https://hub.docker.com/r/marvelnccr/quantum-mobile/tags?page=1&ordering=last_updated

That is weird. I ran the script yesterday composing from develop and I think I got the most recent versions of the plugins. Come to think of it, maybe I just checked out master of aiida-common-workflows and installed it, which caused all plugins to be updated. Yeah, that must be it.

@bosonie
Collaborator

bosonie commented Mar 11, 2021

> But I thought that certain plugins had problems with this example since they had to specify the electronic type and or starting magnetization was causing problems. Happy to change if this is not the case.

Ah ok, I was thinking of doing H2 with no spin, but you are right. Just leave it like that! Sending an email to the others now.

@sponce24
Collaborator

Hello,

I'm new to docker so this is my general feedback but maybe I missed some obvious stuff:

  1. For some reason, with the steps above, I had to use sudo:
sudo docker-compose up -d
sudo docker exec -it --user max quantum-mobile /bin/bash
  2. Once in the docker container (so in QM:develop), aiida-common-workflows is not installed by default. Is that normal?
    In contrast, aiida-aiidalab, aiida-jupyterlab and aiida-quantumespresso are pre-installed.

I had to do the following steps:
workon aiida
pip install aiida
pip install aiida-pseudo
git clone https://github.com/aiidateam/aiida-common-workflows.git
cd aiida-common-workflows
pip install -e .

  3. Then I ran this command:
    aiida-common-workflows launch relax -S Si -X 1 -- abinit

However, this does not work and returns the error:

Usage: aiida-common-workflows launch relax [OPTIONS] [--] []
Try "aiida-common-workflows launch relax -h" for help.

Error: Invalid value for "[]": invalid choice: abinit. (choose from )

Any idea ?
Thanks,
Sam

@sphuber
Collaborator Author

sphuber commented Mar 11, 2021

Thanks @sponce24, you indeed have to add some lines to the script; this is my bad. When I first did this myself, I noticed that the latest version of the QM docker image doesn't actually have the latest version of aiida-common-workflows and the associated plugins installed. When I wrote this script after having done everything, I forgot about this step. What you did is almost ok (except that the aiida package is incorrect: it is an old meta package, always use aiida-core) but can be simplified to:

git clone https://github.com/aiidateam/aiida-common-workflows.git
cd aiida-common-workflows
pip install -e .
reentry scan

Installing aiida-common-workflows should install all the plugins and their dependencies so aiida-core and aiida-pseudo should be installed automatically. The reentry scan should actually fix the problem you were seeing. The usual problem of new entry points not automatically showing up in the cache after installing a new plugin is the culprit here.

I have updated the script with what should work.
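
To check whether the relax entry points are visible to Python after the installation, something like the following sketch may help. It rests on several assumptions: Python 3.8 or newer for `importlib.metadata` (the container in this thread ships Python 3.7, where the `importlib_metadata` backport would be needed instead), and that the relax implementations are registered in an `aiida.workflows` group under names starting with `common_workflows.relax.`. Both the group and the prefix should be confirmed against the package's setup metadata. Note that this inspects the installed package metadata, not the reentry cache itself.

```python
from importlib import metadata

GROUP = "aiida.workflows"            # assumed entry point group
PREFIX = "common_workflows.relax."   # assumed naming convention


def entry_point_names(group):
    """Return the sorted names of all installed entry points in `group`."""
    eps = metadata.entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: selectable entry points object
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9: plain dict mapping group name -> entry points
        selected = eps.get(group, [])
    return sorted({ep.name for ep in selected})


def relax_engines(names, prefix=PREFIX):
    """Strip the prefix from matching entry point names to get engine labels."""
    return sorted(name[len(prefix):] for name in names if name.startswith(prefix))


if __name__ == "__main__":
    print(relax_engines(entry_point_names(GROUP)))
```

If the printed list is empty even though the plugins are installed, the assumed group or prefix is probably wrong; if it lists the engines but the CLI still shows an empty choice, the stale cache described above is the more likely cause.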

@eimrek
Member

eimrek commented Mar 12, 2021

Hi, I was able to run the NH3 relaxation for Gaussian through docker.

However:

  1. of course many commands are needed to set up the remote computer, ssh connection and codes, but this is as expected

  2. I had to do a verdi daemon restart and then the submission worked

  3. The remote computer I am running Gaussian on uses the LSF scheduler, which requires tot_num_mpiprocs to be set and num_machines cannot be set.
    So I modified the following code by replacing num_machines with tot_num_mpiprocs (bit of a hack, but everything worked)

    engines[engine] = {
        'code': code.full_label,
        'options': {
            'resources': {
                'num_machines': number_machines[index],
            },
            'max_wallclock_seconds': wallclock_seconds[index],
        }
    }

As this is not a problem of the Gaussian workflow but rather of the LSF scheduler and/or the CLI, I'll go ahead and set the Gaussian tests as green in the Google sheet. (Let me know if you disagree.)
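
The workaround described above could be sketched as a small helper that picks the resource key depending on the scheduler. This is only an illustration and not the actual CLI code: the function names are hypothetical, the scheduler label `lsf` is an assumption about how the scheduler plugin is named, and reusing the machine count as the total number of MPI processes is the same hack as in the comment, not a proper mapping.

```python
def build_resources(scheduler_type, count):
    """Return the `resources` dictionary for the given scheduler type.

    Assumption: the LSF scheduler plugin is labelled "lsf" (possibly with a
    dotted prefix). As reported above, LSF needs `tot_num_mpiprocs` and does
    not accept `num_machines`.
    """
    if scheduler_type.split(".")[-1] == "lsf":
        # Hack from the comment above: the count is reused as the MPI process total.
        return {"tot_num_mpiprocs": count}
    return {"num_machines": count}


def build_engine_inputs(code_label, scheduler_type, count, wallclock_seconds):
    """Build a dictionary with the same layout as the snippet above."""
    return {
        "code": code_label,
        "options": {
            "resources": build_resources(scheduler_type, count),
            "max_wallclock_seconds": wallclock_seconds,
        },
    }


if __name__ == "__main__":
    # "gaussian@remote" is a made-up code label for illustration.
    print(build_engine_inputs("gaussian@remote", "lsf", 1, 3600))
```

Keeping the scheduler-specific choice in one function means the rest of the launch code does not need to know which key a given scheduler expects.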

@Tseplyaev

I am trying to use docker and the set-up script. However, the pip install -e . step fails with:

 Using cached pyhull-1.4.0.tar.gz (302 kB)
    ERROR: Command errored out with exit status 1:
     command: /home/max/.virtualenvs/aiida/bin/python3.7 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-i3fp5_hw/pyhull_5449788c2e5e48b8afc187c7e9392b1b/setup.py'"'"'; __file__='"'"'/tmp/pip-install-i3fp5_hw/pyhull_5449788c2e5e48b8afc187c7e9392b1b/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-pip-egg-info-msayxsad
         cwd: /tmp/pip-install-i3fp5_hw/pyhull_5449788c2e5e48b8afc187c7e9392b1b/
    Complete output (31 lines):
    Downloading http://pypi.python.org/packages/source/d/distribute/distribute-0.6.35.tar.gz
    Traceback (most recent call last):
      File "/tmp/pip-install-i3fp5_hw/pyhull_5449788c2e5e48b8afc187c7e9392b1b/distribute_setup.py", line 150, in use_setuptools
        raise ImportError
    ImportError
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-i3fp5_hw/pyhull_5449788c2e5e48b8afc187c7e9392b1b/setup.py", line 7, in <module>
        use_setuptools()
      File "/tmp/pip-install-i3fp5_hw/pyhull_5449788c2e5e48b8afc187c7e9392b1b/distribute_setup.py", line 152, in use_setuptools
        return _do_download(version, download_base, to_dir, download_delay)
      File "/tmp/pip-install-i3fp5_hw/pyhull_5449788c2e5e48b8afc187c7e9392b1b/distribute_setup.py", line 131, in _do_download
        to_dir, download_delay)
      File "/tmp/pip-install-i3fp5_hw/pyhull_5449788c2e5e48b8afc187c7e9392b1b/distribute_setup.py", line 201, in download_setuptools
        src = urlopen(url)
      File "/usr/lib/python3.7/urllib/request.py", line 222, in urlopen
        return opener.open(url, data, timeout)
      File "/usr/lib/python3.7/urllib/request.py", line 531, in open
        response = meth(req, response)
      File "/usr/lib/python3.7/urllib/request.py", line 641, in http_response
        'http', request, response, code, msg, hdrs)
      File "/usr/lib/python3.7/urllib/request.py", line 569, in error
        return self._call_chain(*args)
      File "/usr/lib/python3.7/urllib/request.py", line 503, in _call_chain
        result = func(*args)
      File "/usr/lib/python3.7/urllib/request.py", line 649, in http_error_default
        raise HTTPError(req.full_url, code, msg, hdrs, fp)
    urllib.error.HTTPError: HTTP Error 403: SSL is required
    ----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

Has anyone faced the same problem? Isn't it weird that I have such an error and others don't? Isn't the docker container supposed to be identical for everyone?

@sphuber
Collaborator Author

sphuber commented Mar 17, 2021

Haven't seen this, but it looks like it tries to install setuptools (as a result of installing pyhull), which fails because it is trying to use HTTP when HTTPS is required. Could you try to install it manually first? So run pip install setuptools and then try again.

@Tseplyaev

setuptools was already installed within the docker image, but the problem remains.

@sphuber
Collaborator Author

sphuber commented Mar 19, 2021

Can confirm that I am now experiencing the same problem after having destroyed my container and rebuilt it from scratch. Not sure what has changed since I initially did this and it worked. Will have to find out who added the pyhull requirement.

@sphuber
Collaborator Author

sphuber commented Mar 19, 2021

It seems that upgrading pip may do the trick, it fixed the problem for me at least:

pip install --upgrade pip

@Tseplyaev

It solves the problem for me too, thanks!

@bosonie
Collaborator

bosonie commented Mar 23, 2021

I believe the problem seen by @Tseplyaev and @sphuber should be carefully monitored.
I had the same problem when reinstalling the package just now on QuantumMobile 20.11.2a. Running pip install --upgrade pip solves the problem but introduces an incompatibility with aiida-lab. The installation also seems weird, since messages like the ones below appear.

INFO: pip is looking at multiple versions of pydispatcher to determine which version is compatible with other requirements. This could take a while.
INFO: pip is looking at multiple versions of abipy to determine which version is compatible with other requirements. This could take a while.

So something changed and it is not due to the docker.

@sphuber
Collaborator Author

sphuber commented Mar 24, 2021

Those messages are most likely due to the new dependency resolver that was introduced in recent versions of pip. It is more powerful but also more complex, and resolving dependency conflicts for environments with lots of dependencies can sometimes be time consuming, which is why I imagine they added a notice that it can take some time. It should not actually pose any problems as long as it can resolve all conflicts.

@chrisjsewell that being said, maybe it is useful to add a statement that updates pip in the ansible of the QM?
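
As a quick way to tell which resolver behaviour to expect in a given environment, a version check along these lines could be used. It is a sketch under the assumption that pip 20.3 is the first release in which the new resolver is enabled by default; the function names are hypothetical.

```python
import re

# Assumption: pip 20.3 is the first release with the new resolver on by default.
NEW_RESOLVER_DEFAULT = (20, 3)


def parse_version(text):
    """Extract (major, minor) from a pip version string such as "21.0.1"."""
    match = re.match(r"(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"cannot parse version string: {text!r}")
    return int(match.group(1)), int(match.group(2))


def uses_new_resolver_by_default(pip_version):
    """Return True if this pip version is at or above the assumed threshold."""
    return parse_version(pip_version) >= NEW_RESOLVER_DEFAULT
```

Inside the virtual environment, the installed version is available as `pip.__version__`, so `uses_new_resolver_by_default(pip.__version__)` reports whether the "looking at multiple versions" messages are to be expected.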
