
EPIC-2: Upgrades and Scale Handling #67

OmerMajNition opened this issue Oct 17, 2024 · 0 comments

After setting up a topology, users may deploy it as a single service or as a group of federates. Once deployed as production code, users should retain control over the deployed reactor components. If a user wants to reset a reactor component or change a reactor component's local state variables, there should be a mechanism for doing so, giving users full control of their deployment. With this control, users could also handle upgrades of reactor components and scale the topology on the go: adding levels of caches inside a server, or increasing the number of servers inside a POP. One very common case would be adding a new POP (at a new geo-location) to an already running production deployment.

Note: This would help users steer, upgrade, and scale a production system without disrupting the services. Users could make decisions based on their traffic changes and business dynamics.

The following user stories explain, with examples, the need for upgrades and scale handling.

User Stories:

Upgrades and Scale Handling - Brainstorming

Right time for Upgrades and Topological changes
Once we have figured out how to apply topology updates (graph changes and reaction index recalculation), we could switch to a newer topology at any moment. But when should an update be triggered?
The first natural checkpoint is when the current tag is done, meaning all outputs from reactions have been handled.
What should be done with actions already in the queue: should we process them or flush them?
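The "end of current tag" checkpoint can be sketched as a scheduler that defers a pending upgrade until every event belonging to the current tag has been processed. This is a minimal Python sketch under assumed semantics (integer tags, a single event queue); all names here are illustrative, not an existing API.

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Event:
    tag: int                              # logical time tag
    payload: object = field(compare=False)

class Scheduler:
    """Toy scheduler that applies a requested topology upgrade only at a
    tag boundary, i.e. after all reaction outputs for the current tag
    have been handled."""

    def __init__(self):
        self.queue = []            # event queue, ordered by tag
        self.pending_upgrade = None

    def schedule(self, event):
        heapq.heappush(self.queue, event)

    def request_upgrade(self, upgrade_fn):
        # Deferred: never applied in the middle of a tag.
        self.pending_upgrade = upgrade_fn

    def run_tag(self):
        if not self.queue:
            return
        current = self.queue[0].tag
        # Process everything belonging to the current tag.
        while self.queue and self.queue[0].tag == current:
            event = heapq.heappop(self.queue)
            # ... invoke the reactions triggered by `event` here ...
        # Natural checkpoint: the current tag is fully handled.
        if self.pending_upgrade:
            self.pending_upgrade(self)
            self.pending_upgrade = None
```

Events for later tags stay queued across the upgrade, which is exactly where the "process or flush?" question below comes in.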

Implementation Considerations:
We could implement different algorithms with different behavior:

  1. Let the user supply a function that mutates events on the event-queue
  2. Delay the mutation until all events on the event-queue have been processed
  3. Delete all events on the event-queue and upgrade immediately
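The three algorithms above differ only in what happens to the event queue at upgrade time. A hedged sketch, with the queue modeled as a plain list and all names illustrative:

```python
def apply_upgrade(queue, upgrade, policy, mutate_fn=None):
    """Apply `upgrade` under one of three queue policies and return the
    resulting queue. Policy names mirror the three options above:
      - "mutate": a user-supplied mutate_fn rewrites each queued event
      - "drain":  process all queued events first, then upgrade
      - "flush":  delete all queued events and upgrade immediately
    """
    if policy == "mutate":
        queue = [mutate_fn(e) for e in queue]
        upgrade()
    elif policy == "drain":
        for e in queue:
            pass  # ... process e with the old topology ...
        queue = []
        upgrade()
    elif policy == "flush":
        queue = []
        upgrade()
    else:
        raise ValueError(f"unknown policy: {policy}")
    return queue
```

"mutate" is the most flexible but puts correctness in the user's hands; "drain" is the safest but delays the upgrade arbitrarily; "flush" is immediate but loses in-flight work.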

Upgrades and Topological changes, Handling Mechanism

  1. A special “admin” reaction with a reserved level (Level 0 is reserved for Startup; Level 1 could be reserved for Updates) could be added to the execution queue whenever a reactor needs an update. A topology change would then result in a graph change and a recalculation of reaction indexes; the updated reactions are inserted into the execution queue, and the system runs and updates itself. Users may want to add an admin controller that manages topology updates and reactor states through these “admin” reactions.
  2. Another concept is to let the admin reactor receive the LF graph at its nesting level and the nesting level below as a data structure. The admin reactor can then use any statistics it has gathered and change the graph through a provided API. This would, for example, allow easy arity changes for scaling by giving the admin reactor a ‘handle’ on the reactors whose arity needs to change. This needs further brainstorming.
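The first mechanism (a reserved-level "admin" reaction) relies on the execution queue running reactions in ascending level order, so an admin reaction at a reserved low level runs before ordinary reactions at that tag. A minimal Python sketch, assuming integer levels and a level-ordered queue; the level constants and function names are hypothetical, not part of any existing runtime:

```python
import heapq

STARTUP_LEVEL = 0  # reserved for Startup (per the proposal above)
ADMIN_LEVEL = 1    # assumed reservation for topology/state updates

class ExecutionQueue:
    """Executes reactions in ascending level order, so a reaction at a
    reserved low level preempts ordinary reactions at higher levels."""

    def __init__(self):
        self._heap = []
        self._seq = 0  # insertion order breaks ties within a level

    def push(self, level, reaction):
        heapq.heappush(self._heap, (level, self._seq, reaction))
        self._seq += 1

    def run(self):
        while self._heap:
            _, _, reaction = heapq.heappop(self._heap)
            reaction()

def request_topology_update(queue, recompute):
    # Insert an "admin" reaction at the reserved level; when it runs it
    # would rebuild the graph and reinsert reactions with fresh indexes.
    queue.push(ADMIN_LEVEL, recompute)
```

Because the admin reaction sits at Level 1, it executes before any ordinary reaction queued at a higher level, which is what lets the system "run and update itself" rather than requiring an out-of-band stop-the-world step.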