
REST API deployment: reconsider use of Process / Application groups #4494

Open
ltalirz opened this issue Oct 20, 2020 · 3 comments

ltalirz commented Oct 20, 2020

The current deployment scheme for AiiDA REST APIs on Materials Cloud is to use one WSGI daemon and one process group per AiiDA profile:

...
    WSGIDaemonProcess rest-myprofile \
        user=ubuntu group=ubuntu \
        threads=5 \
        python-home=/home/ubuntu/.virtualenvs/aiida \
        display-name=aiida-rest-myprofile

    # REST API will be served on <host>/myprofile/api/v4
    WSGIScriptAlias /myprofile /home/ubuntu/wsgi/myprofile-rest.wsgi
    <Location /myprofile>
        WSGIProcessGroup rest-myprofile
    </Location>
...

This results in each WSGI daemon loading the required Python modules into memory separately, giving a memory footprint of roughly 100 MB for every AiiDA profile served.

It is, in principle, possible to define application groups (via mod_wsgi's `WSGIApplicationGroup` directive), which let all WSGI applications within the same group share the same Python sub-interpreter.

I believe this won't work with AiiDA's current design, since it would require profile switching within the interpreter. However, once profile switching has been implemented, it could be an alternative to the solution suggested here, namely supporting profile switching within the REST API application itself.
The latter solution would likely involve some API change (e.g. a profile prefix in the URL path), but it would also be more flexible.
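For illustration, a configuration along these lines could place all profiles in one daemon process and one shared sub-interpreter. This is an untested sketch; the daemon name, profile names and paths are placeholders, not a tested Materials Cloud configuration:

```apache
# One daemon process for all profiles
WSGIDaemonProcess aiida-rest \
    user=ubuntu group=ubuntu \
    threads=5 \
    python-home=/home/ubuntu/.virtualenvs/aiida \
    display-name=aiida-rest

WSGIScriptAlias /profile-one /home/ubuntu/wsgi/profile-one-rest.wsgi
WSGIScriptAlias /profile-two /home/ubuntu/wsgi/profile-two-rest.wsgi

# Same WSGIApplicationGroup value => same Python sub-interpreter
<Location /profile-one>
    WSGIProcessGroup aiida-rest
    WSGIApplicationGroup aiida-rest
</Location>
<Location /profile-two>
    WSGIProcessGroup aiida-rest
    WSGIApplicationGroup aiida-rest
</Location>
```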


ltalirz commented Jul 28, 2021

also related #4374 (comment)


ltalirz commented Jul 15, 2022

Now that profile switching via REST API is implemented, this question has resurfaced.

According to benchmarks by @eimrek in #5054 (comment), the overhead of profile switching is currently several hundred milliseconds.
@sphuber mentions that instantiating the sqlalchemy connection and engine alone takes about 200 ms.

Given that establishing a new postgres connection should take <10ms (see #4374 (comment)), it would be interesting to profile this sqla initialization step and understand whether it can be made faster.
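A small helper like the following could be used for such profiling. The helper itself is generic; the commented usage against AiiDA's manager is a sketch, and the profile name is a placeholder:

```python
import cProfile
import io
import pstats


def profile_call(fn, *args, top=20):
    """Run ``fn(*args)`` under cProfile and return its result together
    with a report of the ``top`` entries sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = fn(*args)
    profiler.disable()

    stream = io.StringIO()
    pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()


# Hypothetical usage for the sqla initialization step ("myprofile" is
# a placeholder for an existing profile name):
# from aiida.manage import get_manager
# manager = get_manager()
# manager.load_profile("myprofile", allow_switch=True)
# _, report = profile_call(manager.get_profile_storage)
# print(report)
```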

I guess this would potentially also benefit the startup time of any AiiDA-related code / shell (?).


sphuber commented Jul 15, 2022

I need to add a bit more nuance to the statement that the engine connection takes about 200 ms. The 200 ms was the difference in average request time in the benchmark performed by @eimrek in #5054 . Since the main difference in the code path for requests that have to switch profiles is just unloading the current profile and loading the database connection of the new one, I presumed that the majority of those 200 ms can probably be attributed to those actions. However, I cannot be sure, as I don't know how long the unload/load profile cycle takes on the machine used for those benchmarks.

To get a sense, I ran some benchmarks on my workstation where I am running and accessing a REST API on localhost.

I ran a benchmark isolating just the part of unloading the current profile, loading the new one and then loading the storage backend, which in the case of psql_dos should instantiate a connection to the database and perform a check. This was done using the following function:

def change_profile(profile_a, profile_b):
    import time
    from aiida.manage import get_manager
    manager = get_manager()
    # Start from profile_a so that loading profile_b performs a real switch
    manager.load_profile(profile_a, allow_switch=True)
    start = time.time()
    manager.load_profile(profile_b, allow_switch=True)
    manager.get_profile_storage()
    return time.time() - start

Note that we cannot do something like

%timeit manager.load_profile(profile_a, allow_switch=True)

because after the first iteration the profile will already have been switched, and the call becomes essentially a no-op.

On my workstation, I got times of roughly 35 ms for switching a profile.
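To gather multiple samples despite the no-op issue, one can alternate the switch direction on every call. A minimal helper along these lines (where `switch_fn` stands in for the `change_profile` function above):

```python
import statistics


def benchmark_switch(switch_fn, profile_a, profile_b, n=20):
    """Collect 2*n timing samples by alternating the switch direction,
    so that every call performs a real profile switch.

    Returns (min, max, mean) of the samples in seconds.
    """
    samples = []
    for _ in range(n):
        samples.append(switch_fn(profile_a, profile_b))
        samples.append(switch_fn(profile_b, profile_a))
    return min(samples), max(samples), statistics.mean(samples)


# Hypothetical usage with the change_profile function defined earlier:
# print(benchmark_switch(change_profile, "profile_a", "profile_b"))
```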

I then ran a benchmark of firing 50 requests to the REST API and recording the request time, doing this for just a single profile (no switching) or by alternating between two profiles, forcing the server to switch profiles at each request (this corresponds to the worst case scenario). These are the timings:

|                | min (s) | max (s) | avg (s) |
|----------------|---------|---------|---------|
| No switching   | 0.013   | 0.081   | 0.020   |
| With switching | 0.058   | 0.131   | 0.065   |
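A benchmark of this kind could be scripted roughly as follows. The harness is generic; the commented usage is a sketch, with placeholder URLs and `requests` standing in for any HTTP client:

```python
import time


def benchmark(request_fn, endpoints, n=50):
    """Fire ``n`` requests, cycling through ``endpoints``, and return
    (min, max, avg) of the individual request times in seconds.

    With a single endpoint no profile switch occurs; with two endpoints
    on different profiles, every request forces a switch (worst case).
    """
    timings = []
    for i in range(n):
        start = time.time()
        request_fn(endpoints[i % len(endpoints)])
        timings.append(time.time() - start)
    return min(timings), max(timings), sum(timings) / n


# Hypothetical usage (URLs are placeholders):
# import requests
# no_switch = benchmark(requests.get, ["http://localhost:5000/a/api/v4/nodes"])
# switching = benchmark(requests.get, [
#     "http://localhost:5000/a/api/v4/nodes",
#     "http://localhost:5000/b/api/v4/nodes",
# ])
```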

We can see that the average increases by 45 ms when switching is forced at each request, and the isolated measurement above suggests that roughly 35 ms of that is the profile switch itself. I am not sure yet where the remaining 10 ms comes from, but the bulk really does come down to switching profiles.

It would be interesting to know why this slowdown is only about a fourth of the one experienced by @eimrek . A 50 ms slowdown would clearly be a lot more acceptable than 200 ms.
