Database organization #34

Open
LuiggiTenorioK opened this issue Oct 24, 2023 · 10 comments
Assignees
Labels
working on Someone is working on it

Comments

@LuiggiTenorioK
Member

I realized that, in order to make the API portable, we need to map which tables are guaranteed to exist and be maintained for every supported version of autosubmit (maybe >= 3.13?).

Going back to issue #22, we have a complete picture of all the tables and databases that exist in our production environment. Unfortunately, it seems that some of them were created or modified manually by a script to make some features work. This will lead to potential bugs when deploying the API to new environments as some data may be missing or deprecated for different versions of autosubmit.

I think a good way forward is to clearly map which tables come from the autosubmit package (for the versions we are going to support), and then define the minimal set of tables that the API needs to add on top of them to work properly. In this way, we can also deprecate the as_times.db and have one central db.
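A minimal sketch of how that mapping could be checked against a real database file, assuming the `.db` files are SQLite. The `REQUIRED_TABLES` set and the `missing_tables` helper are hypothetical names used only for illustration, not part of autosubmit or the API:

```python
import sqlite3
from typing import Iterable, Set

# Hypothetical example: tables the API would expect in one database file.
REQUIRED_TABLES = {"experiment", "details"}


def missing_tables(conn: sqlite3.Connection, required: Iterable[str]) -> Set[str]:
    """Return the required table names that are absent from the database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    existing = {name for (name,) in rows}
    return set(required) - existing
```

Running a check like this at startup for each supported autosubmit version would show early which tables a new environment lacks, instead of failing later inside a request.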

This work may lead to a huge refactoring of the code (maybe making it feasible to write most of the API again), but I think it will take less time than solving the further issues generated by not doing it. Along the way, we can also solve other issues like API-side data validation.

New DDBB documentation Google Doc 📎

@kinow @dbeltrankyl @mcastril

@LuiggiTenorioK
Member Author

In GitLab by @mcastril on Oct 24, 2023, 13:29

Thank you for creating the issue @LuiggiTenorioK. I can delete it now from my to-do list.

Hopefully we won't have to write most of the API again.

As discussed today, before we had the distributed DDBB scheme (one DDBB per experiment), the Autosubmit API relied only on the central as_times.db (mostly filled in by the API workers). That DDBB started to grow too fast, so we moved to a decentralized scheme (mostly filled in by Autosubmit).

For that reason, many API operations may use as_times.db as a fallback in case the experiment DDBB doesn't exist, to keep compatibility with older Autosubmit versions.
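The fallback described above could be sketched roughly as follows, assuming the DDBBs are SQLite files. Both function names are hypothetical, and the real API code may resolve paths differently:

```python
import sqlite3
from pathlib import Path


def resolve_db_path(experiment_db: Path, central_db: Path) -> Path:
    """Prefer the per-experiment database; fall back to the central one."""
    if experiment_db.is_file():
        return experiment_db
    return central_db


def open_read_only(db_path: Path) -> sqlite3.Connection:
    """Open a SQLite file read-only so a lookup never creates an empty database."""
    uri = db_path.resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)
```

Opening in read-only mode matters here because a plain `sqlite3.connect` on a missing path would silently create a new empty file, which would hide the fact that the experiment DDBB does not exist.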

In Autosubmit4, we wanted to remove legacy tables such as experiment_backup, and maybe the as_times.db too if we can rely only on the distributed DDBBs. We would use a completely new DDBB and disk space when we deploy Autosubmit4 in production in the department, so we have flexibility here.

Related issue: https://earth.bsc.es/gitlab/es/autosubmit/-/issues/858

However, it's important to keep compatibility with AS3 for now.

@LuiggiTenorioK
Member Author

In GitLab by @mcastril on Oct 24, 2023, 13:32

The DDBBs documentation, which hasn't made it into the wiki yet because I think it still needs more information, could give you some hints and could also benefit a lot from your analysis. We can use it as a baseline for complete documentation. As you say, we must know, for every DDBB, who the writers and consumers are, and if possible from which version (but I understand the difficulty of this).

It's also important to assess the status of the current DDBBs, for example which experiments have corrupted instances. The fix can come from the dbfix command that Autosubmit runs itself.
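One possible way to survey which DDBBs are corrupted, assuming they are SQLite files, is SQLite's built-in `PRAGMA integrity_check`. The `is_healthy` helper below is a hypothetical sketch and is not how the dbfix command works:

```python
import sqlite3
from pathlib import Path


def is_healthy(db_path: Path) -> bool:
    """Return True when SQLite's integrity check reports no problems."""
    try:
        uri = db_path.resolve().as_uri() + "?mode=ro"
        conn = sqlite3.connect(uri, uri=True)
        try:
            rows = conn.execute("PRAGMA integrity_check").fetchall()
        finally:
            conn.close()
    except sqlite3.DatabaseError:
        # Raised, for example, when the file is missing or is not a SQLite database.
        return False
    return rows == [("ok",)]
```

Looping over the per-experiment files with a check like this would give a list of candidates to repair, without modifying any of them.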

Related issue: https://earth.bsc.es/gitlab/es/autosubmit/-/issues/1018

@LuiggiTenorioK
Member Author

I also hope that we don't have to write the API again. But I'll keep in mind which tables the API is responsible for, and design a code architecture that separates the API logic from the database structure in layers. Probably not by changing everything at once, but mostly for new features and modifications. That's why we at least need to divide these data responsibilities clearly among our projects.
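As a rough illustration of that layering, here is a minimal repository-style sketch. All class and function names are hypothetical, and the `name` and `status` columns and the `"RUNNING"` value are assumptions about the experiment_status table, not a confirmed schema:

```python
import sqlite3
from dataclasses import dataclass
from typing import List, Protocol


@dataclass(frozen=True)
class ExperimentStatus:
    name: str
    status: str


class ExperimentStatusRepository(Protocol):
    """Interface the API logic depends on; it says nothing about storage."""

    def list_active(self) -> List[ExperimentStatus]: ...


class SqliteExperimentStatusRepository:
    """Only this class knows table and column names (assumed schema)."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def list_active(self) -> List[ExperimentStatus]:
        rows = self._conn.execute(
            "SELECT name, status FROM experiment_status WHERE status = ?",
            ("RUNNING",),
        ).fetchall()
        return [ExperimentStatus(name, status) for name, status in rows]


def active_experiment_names(repo: ExperimentStatusRepository) -> List[str]:
    """API-level logic: works with any repository, not with SQL directly."""
    return sorted(item.name for item in repo.list_active())
```

With this split, a change in where experiment_status lives (central db or per-experiment db) would only touch the repository class, and new features could adopt the pattern one at a time.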

@LuiggiTenorioK
Member Author

Some notes about the tables:

  • [autosubmit.db].experiment: Created and updated by autosubmit
  • [autosubmit.db].details: Created and updated by the API
  • [as_times.db].experiment_status: Created and populated (inserts) by autosubmit. Rows are updated and deleted by the API.
  • [as_times.db].experiment_times: Created and updated by the API
  • [as_times.db].job_times: Created and updated by the API

experiment_status is still used by autosubmit: autosubmit run <expid> inserts the launched experiment there, so that the API shows it immediately in the active experiments list. That means that the as_times.db is not fully deprecated.
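The notes above could also be kept as data next to the code, so that shared tables are easy to spot. This is only a sketch: the `TableOwnership` type and the `shared_tables` helper are hypothetical, while the entries mirror the list above:

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class TableOwnership:
    database: str
    table: str
    created_by: str
    written_by: FrozenSet[str]


# Entries transcribed from the notes in this comment.
OWNERSHIP = [
    TableOwnership("autosubmit.db", "experiment", "autosubmit", frozenset({"autosubmit"})),
    TableOwnership("autosubmit.db", "details", "api", frozenset({"api"})),
    TableOwnership("as_times.db", "experiment_status", "autosubmit", frozenset({"autosubmit", "api"})),
    TableOwnership("as_times.db", "experiment_times", "api", frozenset({"api"})),
    TableOwnership("as_times.db", "job_times", "api", frozenset({"api"})),
]


def shared_tables(entries: Iterable[TableOwnership]) -> List[str]:
    """Return tables written by more than one project."""
    return [f"{e.database}.{e.table}" for e in entries if len(e.written_by) > 1]
```

Here `shared_tables(OWNERSHIP)` returns only `as_times.db.experiment_status`, which is the table that blocks deprecating as_times.db.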

@LuiggiTenorioK
Member Author

In GitLab by @mcastril on Jan 4, 2024, 10:02

@LuiggiTenorioK it would be quite helpful if you updated the DDBB documentation GDoc while you take your notes (with track changes), so the team gets a better understanding of these details

@LuiggiTenorioK
Member Author

Yes, I will look into how we can document it effectively, because I'm not sure if a GDoc is the best place to do it. Also because the developer guide GDoc was deleted recently (but I made a backup before it was removed).

@LuiggiTenorioK
Member Author

In GitLab by @mcastril on Jan 5, 2024, 12:52

> Yes, I will look into how we can document it effectively, because I'm not sure if a GDoc is the best place to do it.

The GDoc was a medium for drafting purposes, as it enables commenting, tracking changes and online collaboration. The final purpose was to upload it somewhere else. A summary got into the final Autosubmit documentation, and we haven't decided yet where to publish the full doc.

In your case, for making amendments and additions, I think GDoc is the best place (using track changes) because it will help us learn about your findings and discuss them.

> Also because the developer guide GDoc was deleted recently (but I made a backup before it was removed).

Thanks for saving it Luiggi. We should also publish it somewhere (maybe rephrasing some sections). I could ask Cristian anyway because I am in contact with him.

@LuiggiTenorioK
Member Author

@mcastril @kinow @dbeltrankyl I made a copy of the DDBB GDoc and will update it with information about the databases, to give a better view of the data available there (and so we don't lose the document like the previous one). Here is the new document: Autosubmit DDBBs

Also, here is the developer guide that I backed up: Autosubmit GUI/API Developers Guideline

@LuiggiTenorioK
Member Author

mentioned in issue autosubmit#1210

@LuiggiTenorioK LuiggiTenorioK self-assigned this Nov 12, 2024