
REST API deployment: reconsider use of Process / Application groups #4494

Open
ltalirz opened this issue Oct 20, 2020 · 3 comments

ltalirz commented Oct 20, 2020

The current deployment scheme for AiiDA REST APIs on Materials Cloud is to use one WSGI daemon and one process group per AiiDA profile:

...
    WSGIDaemonProcess rest-myprofile \
        user=ubuntu group=ubuntu \
        threads=5 \
        python-home=/home/ubuntu/.virtualenvs/aiida \
        display-name=aiida-rest-myprofile

    # REST API will be served on <host>/myprofile/api/v4
    WSGIScriptAlias /myprofile /home/ubuntu/wsgi/myprofile-rest.wsgi
    <Location /myprofile>
        WSGIProcessGroup rest-myprofile
    </Location>
...

This results in each WSGI daemon loading the required Python modules into memory separately, giving a memory footprint of roughly 100 MB for every AiiDA profile served.

It is, in principle, possible to define application groups (via mod_wsgi's `WSGIApplicationGroup` directive), which let all WSGI applications within the same group share the same Python sub-interpreter.

I believe this won't work with AiiDA's current design, since it would require profile switching within the interpreter. However, once profile switching has been implemented, it could be an alternative to the solution suggested here, namely supporting profile switching within the REST API application itself.
The latter solution would likely involve some API change (e.g. a profile prefix in the URL path), but it would also be more flexible.
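For illustration, a configuration along these lines could place all profiles in one daemon process and one shared sub-interpreter. This is an untested sketch; the daemon name, profile names and paths are placeholders, not a tested Materials Cloud configuration:

```apache
# One daemon process for all profiles
WSGIDaemonProcess aiida-rest \
    user=ubuntu group=ubuntu \
    threads=5 \
    python-home=/home/ubuntu/.virtualenvs/aiida \
    display-name=aiida-rest

WSGIScriptAlias /profile-one /home/ubuntu/wsgi/profile-one-rest.wsgi
WSGIScriptAlias /profile-two /home/ubuntu/wsgi/profile-two-rest.wsgi

# Same WSGIApplicationGroup value => same Python sub-interpreter
<Location /profile-one>
    WSGIProcessGroup aiida-rest
    WSGIApplicationGroup aiida-rest
</Location>
<Location /profile-two>
    WSGIProcessGroup aiida-rest
    WSGIApplicationGroup aiida-rest
</Location>
```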


ltalirz commented Jul 28, 2021

also related #4374 (comment)


ltalirz commented Jul 15, 2022

Now that profile switching via REST API is implemented, this question has resurfaced.

According to benchmarks by @eimrek in #5054 (comment), the overhead of profile switching is currently several hundred milliseconds.
@sphuber mentions that instantiating the sqlalchemy connection and engine alone takes about 200 ms.

Given that establishing a new postgres connection should take <10ms (see #4374 (comment)), it would be interesting to profile this sqla initialization step and understand whether it can be made faster.
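A small helper like the following could be used for such profiling. The helper itself is generic; the commented usage against AiiDA's manager is a sketch, and the profile name is a placeholder:

```python
import cProfile
import io
import pstats


def profile_call(fn, *args, top=20):
    """Run ``fn(*args)`` under cProfile and return its result together
    with a report of the ``top`` entries sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = fn(*args)
    profiler.disable()

    stream = io.StringIO()
    pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()


# Hypothetical usage for the sqla initialization step ("myprofile" is
# a placeholder for an existing profile name):
# from aiida.manage import get_manager
# manager = get_manager()
# manager.load_profile("myprofile", allow_switch=True)
# _, report = profile_call(manager.get_profile_storage)
# print(report)
```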

I guess this would potentially also benefit the startup time of any AiiDA-related code / shell (?).


sphuber commented Jul 15, 2022

I need to add a bit more nuance to the statement that the engine connection takes about 200 ms. The 200 ms was the difference in average request time in the benchmark performed by @eimrek in #5054 . Since the main difference in the code path for requests that have to switch profiles is just unloading the current profile and loading the database connection of the new one, I presumed that the majority of those 200 ms can probably be attributed to those actions. However, I cannot be sure, as I don't know how long the unload/load profile cycle takes on the machine used for those benchmarks.

To get a sense, I ran some benchmarks on my workstation where I am running and accessing a REST API on localhost.

I ran a benchmark isolating just the part of unloading the current profile, loading the new one and then loading the storage backend, which in the case of psql_dos should instantiate a connection to the database and perform a check. This was done using the following function:

def change_profile(profile_a, profile_b):
    import time
    from aiida.manage import get_manager
    manager = get_manager()
    # Start from profile_a so that loading profile_b performs a real switch
    manager.load_profile(profile_a, allow_switch=True)
    start = time.time()
    manager.load_profile(profile_b, allow_switch=True)
    manager.get_profile_storage()
    return time.time() - start

Note that we cannot do something like

%timeit manager.load_profile(profile_a, allow_switch=True)

because after the first iteration the profile will already have been switched, and the call becomes essentially a no-op.

On my workstation, I got times of roughly 35 ms for switching a profile.
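To gather multiple samples despite the no-op issue, one can alternate the switch direction on every call. A minimal helper along these lines (where `switch_fn` stands in for the `change_profile` function above):

```python
import statistics


def benchmark_switch(switch_fn, profile_a, profile_b, n=20):
    """Collect 2*n timing samples by alternating the switch direction,
    so that every call performs a real profile switch.

    Returns (min, max, mean) of the samples in seconds.
    """
    samples = []
    for _ in range(n):
        samples.append(switch_fn(profile_a, profile_b))
        samples.append(switch_fn(profile_b, profile_a))
    return min(samples), max(samples), statistics.mean(samples)


# Hypothetical usage with the change_profile function defined earlier:
# print(benchmark_switch(change_profile, "profile_a", "profile_b"))
```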

I then ran a benchmark of firing 50 requests to the REST API and recording the request time, doing this for just a single profile (no switching) or by alternating between two profiles, forcing the server to switch profiles at each request (this corresponds to the worst case scenario). These are the timings:

|                | min (s) | max (s) | avg (s) |
|----------------|---------|---------|---------|
| No switching   | 0.013   | 0.081   | 0.020   |
| With switching | 0.058   | 0.131   | 0.065   |
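A benchmark of this kind could be scripted roughly as follows. The harness is generic; the commented usage is a sketch, with placeholder URLs and `requests` standing in for any HTTP client:

```python
import time


def benchmark(request_fn, endpoints, n=50):
    """Fire ``n`` requests, cycling through ``endpoints``, and return
    (min, max, avg) of the individual request times in seconds.

    With a single endpoint no profile switch occurs; with two endpoints
    on different profiles, every request forces a switch (worst case).
    """
    timings = []
    for i in range(n):
        start = time.time()
        request_fn(endpoints[i % len(endpoints)])
        timings.append(time.time() - start)
    return min(timings), max(timings), sum(timings) / n


# Hypothetical usage (URLs are placeholders):
# import requests
# no_switch = benchmark(requests.get, ["http://localhost:5000/a/api/v4/nodes"])
# switching = benchmark(requests.get, [
#     "http://localhost:5000/a/api/v4/nodes",
#     "http://localhost:5000/b/api/v4/nodes",
# ])
```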

We can see that the average increases by 45 ms when switching is forced at each request, and the isolated measurement above suggests that roughly 35 ms of that is the profile switch itself. I am not sure yet where the remaining 10 ms comes from, but the bulk really does come down to switching profiles.

It would be interesting to know why this slowdown is only about a fourth of the one experienced by @eimrek . A 50 ms slowdown would clearly be a lot more acceptable than 200 ms.
