DDBB sync improvements #53
Comments
marked this issue as related to autosubmit#1179 |
mentioned in issue autosubmit#1179 |
In GitLab by @mcastril on Jan 4, 2024, 10:00 Hi @LuiggiTenorioK. This is a great initiative. I think this should be part of the DDBB re-design for AS4 (https://earth.bsc.es/gitlab/es/autosubmit/-/issues/858), which led nowhere. You could have a very active (and leading) role in this re-design. Then we would "only" have to decide how to keep backward compatibility, especially for the workers. Can you elaborate more on "Then, the DDBB can provide more recent data directly without calling the SSOT function (it is assumed that DDBB data is faster to get than calling the SSOT function)."? |
Some data have to be read from files or preprocessed to obtain the final value, so the DDBB acts as a cache of these final values. This is useful when searching through experiments: repeating that process for every experiment can be expensive if there are many, so the search uses the DDBB instead. The idea of reactive updating is that the cached values are refreshed whenever the information of a single experiment is requested, since there is no "many experiments" cost in that case. As a result, when a search does hit that cost, the data returned by the DDBB is as recent as the last time each experiment was visited (a worker could also visit experiments periodically). |
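A minimal sketch of this cache-plus-reactive-update idea, assuming a SQLite-backed DDBB (SQLite 3.24 or newer for the upsert syntax). The table layout and the names `read_user_from_files`, `get_user_live`, and `search_by_user` are hypothetical and only for illustration; they are not the actual API code:

```python
import sqlite3
from typing import Callable, List

# Hypothetical cache table; the real `details` schema may differ.
SCHEMA = "CREATE TABLE IF NOT EXISTS details (expid TEXT PRIMARY KEY, user TEXT)"


def read_user_from_files(expid: str) -> str:
    """Stand-in for the expensive SSOT function (file reads, preprocessing)."""
    return f"owner-of-{expid}"


def get_user_live(
    conn: sqlite3.Connection,
    expid: str,
    ssot: Callable[[str], str] = read_user_from_files,
) -> str:
    """Single-experiment path: call the SSOT, then refresh the cached value."""
    user = ssot(expid)
    conn.execute(
        "INSERT INTO details (expid, user) VALUES (?, ?) "
        "ON CONFLICT(expid) DO UPDATE SET user = excluded.user",
        (expid, user),
    )
    conn.commit()
    return user


def search_by_user(conn: sqlite3.Connection, user: str) -> List[str]:
    """Many-experiments path: answer from the cache only, never the SSOT."""
    rows = conn.execute(
        "SELECT expid FROM details WHERE user = ? ORDER BY expid", (user,)
    )
    return [row[0] for row in rows]
```

In this sketch, a search only sees what the last single-experiment request (or a periodic worker visit) wrote into the table, which is the staleness trade-off described above.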
In GitLab by @mcastril on Jan 5, 2024, 13:27 Thank you for the clarification, it's clearer now |
While addressing issue #49 and following up on issue #34, I found that the DDBB tables can be improved to give better results when searching experiments.
To do so, some issues have to be handled:
Avoid deletion: First, in the `populate_details` background task, the `details` table could use an `INSERT` or `UPDATE` strategy instead of deleting the whole table and populating it from scratch. This is important because, if there are a lot of experiments or the process breaks at some point, data might be unavailable for searching until the next call to the background task (4 hours). The same strategy must also be applied to the `experiment_status` table, because the status record is deleted after the experiment finishes instead of being updated.
Single Source of Truth (SSOT): Table data must have a Single Source of Truth for each data concept. A single function for getting each piece of data should be used and mapped somewhere in the documentation, if possible (having a data catalog might help). For example, there should be just one way to obtain the user of an experiment for every endpoint and, with that, update the DDBB. That information can then be obtained from the table (as a cached snapshot) or, if needed in real time, by calling the same function that was used to update it.
Reactive updating: As explained above, every time an SSOT function is called, the data should be updated in the DDBB (if it is feasible and scalable). Then, the DDBB can provide more recent data directly without calling the SSOT function (it is assumed that DDBB data is faster to get than calling the SSOT function).
Extend data available in DDBBs: As Autosubmit grows, other data concepts might be included in the `details` table (e.g., wrapper type, job status counters, etc.) or in additional one-to-many tables (e.g., metadata, as @kinow suggested). This will enrich the search while using only the data available from the DDBB, as is optimally desired.
As a first sketch, these improvements might be handled as follows:
[…] (`/v4`) need to call SSOTs or DDBBs and refactor them
@mcastril
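The "avoid deletion" point could be sketched as an upsert, assuming the DDBB is SQLite (3.24 or newer, for `ON CONFLICT ... DO UPDATE`). The column names and the `upsert_details` helper are hypothetical; the real `details` schema and the `populate_details` task may look different:

```python
import sqlite3
from typing import Iterable, Optional, Tuple

# Hypothetical schema, for illustration only.
SCHEMA = (
    "CREATE TABLE IF NOT EXISTS details ("
    "expid TEXT PRIMARY KEY, user TEXT NOT NULL, hpc TEXT)"
)

Row = Tuple[str, Optional[str], Optional[str]]


def upsert_details(conn: sqlite3.Connection, rows: Iterable[Row]) -> None:
    """Insert new experiments and update existing ones in one transaction.

    Rows already in the table stay searchable the whole time; if any row
    fails, the transaction is rolled back and the previous data is kept.
    """
    with conn:  # commits on success, rolls back on exception
        conn.executemany(
            "INSERT INTO details (expid, user, hpc) VALUES (?, ?, ?) "
            "ON CONFLICT(expid) DO UPDATE SET "
            "user = excluded.user, hpc = excluded.hpc",
            rows,
        )
```

The same pattern could apply to `experiment_status`: write the final status with an upsert when the experiment finishes rather than deleting its row.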