Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

close django connection in REST API #4741

Closed
wants to merge 4 commits into from

Conversation

ltalirz
Copy link
Member

@ltalirz ltalirz commented Feb 10, 2021

Note: This is a draft meant for testing e.g. on Materials Cloud.

In regular operation, AiiDA keeps several connections to the postgresql database persistently open.
One set of connections (up to 5 by default) comes from the SQLA connection pool, which is present on both sqla and django profiles (since both use the QueryBuilder).
For django profiles, there is an additional persistent connection opened by django.
See here for a short summary.

The default connection limit of postgresql is 100; see here for why it is a good idea to keep the number of connections from exploding.
As long as AiiDA users don't use more than 100/6=16 daemon runners (that number would need to include AiiDA shells + REST APIs), they will not hit that limit in regular use.

For Materials Cloud, the situation is different, however.
Currently, Materials Cloud runs one independent AiiDA REST API process per profile, all of which connect to the same postgresql database, and some profiles include further AiiDA-based services that also make use of the REST API.
We've therefore hit the limit of 100 connections (and even more) some time ago.

The Materials Cloud use case is very different from "operational" AiiDA use: on Materials Cloud, database access is needed during user-initiated queries but there are long periods of inactivity between those (there are no running jobs that would access the database with high frequency).
The benefit from keeping postgresql connections open is therefore less relevant (as shown here, opening a postgresql connection on materials cloud takes ~6-10ms).

This PR

  • Replaces the default QueuePool of sqla by a NullPool that will release a connection as soon as it is no longer needed
  • For django profiles: Closes the persistent Django connection at the end of every REST API request

I've tested that even storing ORM entities still works inside the REST
API, i.e. if needed the django connection is simply reestablished on the
fly.

The REST API has no need for the django connection, since it is only
using the QueryBuilder so far.

I've tested that even storing ORM entities still works inside the REST
API, i.e. if needed the django connection is simply reestablished on the
fly.
@ltalirz ltalirz force-pushed the connection-tests-dirty branch from 3a7cddd to 81edfcf Compare February 10, 2021 21:54
@ltalirz ltalirz force-pushed the connection-tests-dirty branch from 331bf1b to 74fb15a Compare February 10, 2021 22:50
@elsapassaro
Copy link
Contributor

@ltalirz do you think that this PR could be merged? I'm setting up the new aiida servers for Materials Cloud with aiida-core 1.6.5 and I'm getting again apache errors due to a limited number of connection slots. For now I'll cherry-pick these commits but it would be great to have these changes in the next AiiDA release

@ltalirz
Copy link
Member Author

ltalirz commented Sep 3, 2021

thanks for the note @elsapassaro - these changes were a quick hack meant for trying things out but if they help for materials cloud perhaps we can figure out a way to get them in.

I've just updated the description of the PR to explain the reasoning here in some detail.

The reason these changes should not be merged as-is is that they can affect regular AiiDA operation:

  1. Using a NullPool in sqla can slow down queries (one should check but I am confident it does - otherwise we should indeed use the nullpool)
  2. Similarly, on django profiles if someone makes a request to the REST API, the connection will be closed (it is reestablished automatically when needed, but again this can lead to slowdown)

I propose to add a verdi config option that allows users to tell AiiDA to minimize persistent connections (disabled by default).
If it is enabled, the changes made here would go into effect.

What is your perspective @sphuber ?

@sphuber
Copy link
Contributor

sphuber commented Sep 3, 2021

If this change can be made without affecting normal use of AiiDA, by making it configurable (preferably through the REST API config, instead of the global one, if this is purely for the REST API) then we could add this. It might be worth having a look at #5054 as a potential alternative though. If I understand correctly, the reason of the problems is that you are using a single REST API process per profile and so are running out of connections, but you are only doing that since it is currently not possible to switch profiles. PR #5054 adds that functionality and so would remove your original problem therefore not needing this workaround. @elsapassaro there are already other people testing that branch and if that works we will try to include it with the next release. If you could also test this with the MC that would help speed up the process.

@ltalirz
Copy link
Member Author

ltalirz commented Sep 3, 2021

making it configurable (preferably through the REST API config, instead of the global one, if this is purely for the REST API)

I guess it would in principle be possible to limit the scope to the REST API but I believe it would require some changes since the explicit creation of the postgresql connections currently does not happen inside the REST API code.
Furthermore, there are Materials Cloud discover sections that use AiiDA for querying the database via the python interface rather than through the REST API (similar to how the AiiDA lab works).
I think we want these changes to apply globally (of course in a configurable way).

PR #5054 adds that functionality and so would remove your original problem therefore not needing this workaround.

Fully agree that if #5054 turns out to be performant enough that would be great and would remove the need for this PR.

In this PR (#4741) we just talking about opening/closing a postgresql connection per request (<10ms as mentioned above); in #5054 we are talking about potentially switching profiles per request which is likely orders of magnitude slower.
The only way to figure it out is to test

@elsapassaro Since testing #5054 requires a migration to the new file repository, which will take a long time for MC, I think we should start testing performance locally.
If I find time next week, I will test it locally (maybe after #5116 is addressed) and I suggest you give it a try locally with a small database.

Furthermore, it would be super useful to have a look at the request patterns to the AiiDA REST APIs on Materials Cloud.
In essence what we want to know is: if we combine the logs from all REST API daemons into one time-ordered log:

  • how long do requests typically take?
  • what is the rate of requests?
  • what would be the rate of profile switching?

@sphuber
Copy link
Contributor

sphuber commented Jan 27, 2022

Since we are getting rid of Django for v2.0, I am closing this for now. If you feel this needs a backport fix on 1.6.x please reopen and rebase on support/1.6.x.

@sphuber sphuber closed this Jan 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants