The Problem
Currently we use plain Django signals. These have a few problems:
The signals are completely best effort. We don't record which signal receiver actually reacted to a signal or whether its action was successful, and we don't retry anything either.
Further, we block the commit from going through and the HTTP request from returning until the signals are finished. The rationale is that we want people to be sure their DNS change is already live before we report their commit as successful. The downside is that we block a Django worker for seconds at a time to execute the flush by connecting to remote hosts via SSH (our preferred method of RPC, for some reason), and that's the success case. In the failure case a hook can take a really long time, possibly until the request times out at 60 seconds. If enough requests making such changes are queued, all workers become busy. Subsequent requests then queue up until we hit net.core.somaxconn. Once that happens, the health check endpoint starts failing, and this web server gets kicked out of the load balancer.
My proposed Solution
To solve this, we should move the signal handlers out of the serveradmin Django application. Signals should have no external dependencies and only check data integrity. Serveradmin already keeps track of commits. When making a commit via the API, we should return a handle to this commit. If a client expects a certain hook to be performed, the commit object should offer an API to wait for the hook to finish.
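As a rough sketch, such a commit handle could look like the following on the client side. All names here (`CommitHandle`, `get_hook_status`, the status values) are hypothetical and not part of the current adminapi:

```python
import time


class HookTimeout(Exception):
    """Raised when a hook does not finish within the given timeout."""


class CommitHandle:
    """Handle returned by the commit API, identifying one commit."""

    def __init__(self, api, commit_id):
        self._api = api
        self.commit_id = commit_id

    def wait_for_hook(self, hook_name, timeout=300, poll_interval=2):
        """Block until the named hook reports success or failure."""
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            # The server would expose per-commit hook status, e.g.
            # {'status': 'pending' | 'running' | 'done' | 'failed', ...}
            status = self._api.get_hook_status(self.commit_id, hook_name)
            if status['status'] == 'done':
                return status
            if status['status'] == 'failed':
                raise RuntimeError(status.get('error', 'hook failed'))
            time.sleep(poll_interval)
        raise HookTimeout(f'{hook_name} did not finish within {timeout}s')
```

A client would then write something like `commit = api.commit(changes)` followed by `commit.wait_for_hook('dns_flush')`, blocking only the client rather than a serveradmin worker.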
Further, we need to extend the API to allow retrieving commits that have happened. Using this API, an external DNS flusher, reusing the code currently living in serveradmin, could report when it started and finished working on a hook for a commit, and even report errors back. This hook information would then be transmitted back to the client that made the commit and is waiting for the hook. It could also be persisted in the database to keep a record of which hooks were performed.
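A minimal sketch of such an external worker follows. The `fetch_commits_since` and `report_hook` endpoints are assumptions about the extended API, not existing calls:

```python
def run_worker(api, hook_name, handle_commit, last_seen=0):
    """Process new commits once and report hook status back to serveradmin."""
    for commit in api.fetch_commits_since(last_seen):
        api.report_hook(commit['id'], hook_name, status='running')
        try:
            handle_commit(commit)  # e.g. regenerate zones and flush DNS
        except Exception as err:
            api.report_hook(commit['id'], hook_name,
                            status='failed', error=str(err))
        else:
            api.report_hook(commit['id'], hook_name, status='done')
        last_seen = commit['id']
    return last_seen
```

The worker would call this in a loop (or on notification, see below), persisting `last_seen` so it can resume and retry after a crash, which is exactly what the current best-effort signals cannot do.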
Another upside of this design is that it allows for very long-running hooks. Even building a VM would be a perfectly plausible hook to implement this way.
TODO:
I'm deliberately vague about the API here, especially on the side working on the hooks. The simplest implementation would expose all commits made on serveradmin, but we could also be smarter. Emre already proposed[0] extending the adminapi Query API to subscribe to changes made on serveradmin after the initial retrieval of a query. That proposal would complement this one well.
Django signals only happen within one web server, but we have multiple web servers. If somebody commits a change on web server A while I'm subscribed via web server B, B wouldn't know about it. Hence we either need to constantly requery the database from every web server or inform all web servers about a commit that just happened. I suggest looking into PostgreSQL's NOTIFY/LISTEN feature for this. There's even already a Django plugin for it.
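A sketch of what the LISTEN side could look like with psycopg2 (which Django already uses under the hood). The channel name and payload format are placeholders:

```python
import json
import select


def drain_notifications(conn, callback):
    """Dispatch all pending notifications on a connection to the callback."""
    conn.poll()
    while conn.notifies:
        notify = conn.notifies.pop(0)
        callback(json.loads(notify.payload))


def listen_for_commits(dsn, callback):
    """Listen on a channel and invoke callback for every commit notification."""
    import psycopg2  # imported here so the sketch stays importable without it
    import psycopg2.extensions

    conn = psycopg2.connect(dsn)
    conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
    with conn.cursor() as cur:
        cur.execute('LISTEN serveradmin_commits;')
        while True:
            # Block for up to 5 seconds waiting for a notification.
            if select.select([conn], [], [], 5) == ([], [], []):
                continue
            drain_notifications(conn, callback)


# The committing web server would publish after its transaction commits:
#   cur.execute("SELECT pg_notify('serveradmin_commits', %s)",
#               (json.dumps({'commit_id': 123}),))
```

Because NOTIFY payloads are delivered to every listening connection, each web server (and each external hook worker) would learn about a commit regardless of which server accepted it.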
[0] https://github.com/innogames/serveradmin/pull/52