Only get the latest 100,000 scenarios from db-bot #1359
Good that you identified this issue, @noracato. My first preference would be the light dump option. This, however, depends on how we use the db-bot dump: if we do not actually need all those scenarios, it makes no sense to download them. Perhaps there are other filters we can apply to trim the download? Since we're having a discussion about
My main question, @noracato, is what the dumps from db-bot are used for. Who uses them and why?
Hope you don't mind me chiming in: I think the use cases are diverse and numerous. For example, I used the etengine production dump for benchmark testing, and it was very useful to have. When you download such a dump from the production database, you usually have a specific use case in mind: inspecting how the database performs in various ways, but also seeing how it is used (how many inputs/sliders people set on average, how many (custom) curves, etc.). As a developer it is much easier to inspect and test such things on a local database. You don't want to do that on the live production database, because it puts unnecessary extra load on the server and, even more importantly, it can endanger data integrity and server security. Personally, I would be in favor of the 'keep current dump and add light dump' option.
Thanks for the explanation @thomas-qah. In that case I think my preference would be to set the default to 1 month old scenarios, but to allow users (meaning ourselves) to specify a different time limit.
Does this seem feasible for you?
Sure. That means we would have to save several different dumps each night (0, 1, 2 and 3), right @thomas-qah? That increases the time the server is very busy and will increase our Amazon bill a bit. If we attract more users from different time zones, this could become a problem in the future, as they would be using the model while the server is busy creating the backups. Not sure how much of a problem it actually is, but just putting it down here!
Yes, what @noracato writes is correct, but we could of course create a schedule for when each dump gets created. For example:
I think this would decrease the server load significantly, also compared to now :)
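As a hypothetical illustration of staggering, the sketch below rotates through the dump variants so that each night only one of them is generated instead of all four. The variant labels 0 to 3 are taken from the comment above. The rotation itself and the `variant_for`/`schedule` names are assumptions made for this example, not a schedule anyone in the thread proposed.

```python
from datetime import date, timedelta
from typing import List, Tuple

# Hypothetical dump variants, labelled after the "0, 1, 2 and 3" options above.
DUMP_VARIANTS = [0, 1, 2, 3]


def variant_for(day: date) -> int:
    """Return the single dump variant to build on the given night."""
    # toordinal() is a running day count, so consecutive nights
    # cycle through the variants in order.
    return DUMP_VARIANTS[day.toordinal() % len(DUMP_VARIANTS)]


def schedule(start: date, nights: int) -> List[Tuple[date, int]]:
    """List (night, variant) pairs for `nights` consecutive nights."""
    return [
        (start + timedelta(days=i), variant_for(start + timedelta(days=i)))
        for i in range(nights)
    ]


for night, variant in schedule(date(2024, 1, 1), 4):
    print(night.isoformat(), "-> dump variant", variant)
```

The trade-off is freshness: with four variants, each one is rebuilt only every fourth night.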
This issue has had no activity for 60 days and will be closed in 7 days. Removing the "Stale" label or posting a comment will prevent it from being closed automatically. You can also add the "Pinned" label to ensure it isn't marked as stale in the future.
When downloading and importing a copy of the anonymized database dumps from db-bot, you can easily spend up to 10 minutes just waiting.
The waiting time has grown because the engine's database has gotten bigger and bigger over the years. As we only use these database dumps for development and for answering help questions from users, we could dump not the full database but only the last hundred thousand scenarios or so (including special scenarios like II3050). That would make both the download (currently 3.3GB) and the import much quicker.
Right now the dump is limited to scenarios younger than 3 months plus scenarios marked with `keep_compatible`. But that does not seem to be enough. We could also keep the current dump and add the light dump as a separate download option in db-bot.
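The selection rule described above can be sketched as follows. This is a minimal, hypothetical illustration, assuming each scenario has an `id`, a `created_at` timestamp and a `keep_compatible` flag. The `Scenario` class and `select_for_light_dump` function are made up for this example and are not db-bot's or ETEngine's actual code; in practice the filter would more likely be a database query in the dump script. `months` defaults to the current 3-month limit (months approximated as 30 days), and `max_recent` caps the recent scenarios at the 100,000 from the issue title.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Optional


@dataclass
class Scenario:
    # Hypothetical stand-in for a scenario row; only the fields the filter needs.
    id: int
    created_at: datetime
    keep_compatible: bool = False


def select_for_light_dump(
    scenarios: Iterable[Scenario],
    now: datetime,
    months: int = 3,
    max_recent: Optional[int] = 100_000,
) -> List[Scenario]:
    """Keep all keep_compatible scenarios plus the newest recent ones."""
    all_rows = list(scenarios)  # materialize so we can iterate twice
    cutoff = now - timedelta(days=30 * months)  # months approximated as 30 days

    # Scenarios flagged keep_compatible are kept regardless of age.
    pinned = [s for s in all_rows if s.keep_compatible]

    # Recent, unflagged scenarios, newest first.
    recent = sorted(
        (s for s in all_rows if not s.keep_compatible and s.created_at >= cutoff),
        key=lambda s: s.created_at,
        reverse=True,
    )
    if max_recent is not None:
        recent = recent[:max_recent]  # cap at the newest `max_recent`

    return sorted(pinned + recent, key=lambda s: s.id)


now = datetime(2024, 1, 1)
rows = [
    Scenario(1, datetime(2020, 5, 1), keep_compatible=True),  # old but flagged
    Scenario(2, datetime(2023, 12, 20)),
    Scenario(3, datetime(2023, 6, 1)),  # older than the cutoff
    Scenario(4, datetime(2023, 12, 30)),
]
print([s.id for s in select_for_light_dump(rows, now, months=1)])  # prints [1, 2, 4]
print([s.id for s in select_for_light_dump(rows, now, months=1, max_recent=1)])  # prints [1, 4]
```

Passing `months=1` corresponds to the 1-month default suggested in the thread, while callers can still request a different time limit. Flagged scenarios bypass both the age limit and the cap, so they survive even in the lightest dump.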
What do you think @mabijkerk and @thomas-qah?