
EPIC-2: Upgrades and Scale Handling #67

OmerMajNition opened this issue Oct 17, 2024 · 0 comments

After setting up a topology, users may deploy it as a single service or as a group of federates. Once deployed as production code, users should retain control over the deployed reactor components. If a user wants to reset a reactor component or change a reactor component's local state variables, there should be a mechanism for doing so, giving users full control of their deployment. With this control, users could also handle upgrades of reactor components and scale the topology on the go: adding levels of caches inside a server, or increasing the number of servers inside a POP. One very common case would be adding a new POP (at a new geo-location) to an already running production deployment.

Note: This would help users steer, upgrade, and scale a production system without disrupting the services. Users could make decisions based on their traffic changes and business dynamics.

The following user stories explain, with examples, the need for upgrades and scale handling.

User Stories:

Upgrades and Scale Handling - Brainstorming

Right time for Upgrades and Topological changes
Once we have figured out how to apply topology updates (graph changes and reaction index recalculation), we could switch to a newer topology at any moment. But when should an update be triggered?
The first natural checkpoint is when the current tag is done, meaning all outputs from reactions have been handled.
What should be done with actions already in the queue: should we process them or flush them?
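The "end of current tag" checkpoint can be sketched as a scheduler that defers a pending upgrade until every event belonging to the current tag has been processed. This is a minimal Python sketch under assumed semantics (integer tags, a single event queue); all names here are illustrative, not an existing API.

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Event:
    tag: int                              # logical time tag
    payload: object = field(compare=False)

class Scheduler:
    """Toy scheduler that applies a requested topology upgrade only at a
    tag boundary, i.e. after all reaction outputs for the current tag
    have been handled."""

    def __init__(self):
        self.queue = []            # event queue, ordered by tag
        self.pending_upgrade = None

    def schedule(self, event):
        heapq.heappush(self.queue, event)

    def request_upgrade(self, upgrade_fn):
        # Deferred: never applied in the middle of a tag.
        self.pending_upgrade = upgrade_fn

    def run_tag(self):
        if not self.queue:
            return
        current = self.queue[0].tag
        # Process everything belonging to the current tag.
        while self.queue and self.queue[0].tag == current:
            event = heapq.heappop(self.queue)
            # ... invoke the reactions triggered by `event` here ...
        # Natural checkpoint: the current tag is fully handled.
        if self.pending_upgrade:
            self.pending_upgrade(self)
            self.pending_upgrade = None
```

Events for later tags stay queued across the upgrade, which is exactly where the "process or flush?" question below comes in.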

Implementation Considerations:
We could implement different algorithms with different behavior:

  1. Let the user supply a function that mutates events on the event-queue
  2. Delay the mutation until all events on the event-queue have been processed
  3. Delete all events on the event-queue and upgrade immediately
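The three algorithms above differ only in what happens to the event queue at upgrade time. A hedged sketch, with the queue modeled as a plain list and all names illustrative:

```python
def apply_upgrade(queue, upgrade, policy, mutate_fn=None):
    """Apply `upgrade` under one of three queue policies and return the
    resulting queue. Policy names mirror the three options above:
      - "mutate": a user-supplied mutate_fn rewrites each queued event
      - "drain":  process all queued events first, then upgrade
      - "flush":  delete all queued events and upgrade immediately
    """
    if policy == "mutate":
        queue = [mutate_fn(e) for e in queue]
        upgrade()
    elif policy == "drain":
        for e in queue:
            pass  # ... process e with the old topology ...
        queue = []
        upgrade()
    elif policy == "flush":
        queue = []
        upgrade()
    else:
        raise ValueError(f"unknown policy: {policy}")
    return queue
```

"mutate" is the most flexible but puts correctness in the user's hands; "drain" is the safest but delays the upgrade arbitrarily; "flush" is immediate but loses in-flight work.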

Upgrades and Topological changes, Handling Mechanism

  1. A special “admin” reaction with a reserved level (Level 0 is reserved for Startup; Level 1 could be reserved for Updates) could be added to the execution queue whenever a reactor needs an update. A topology change would then result in a graph change and a recalculation of reaction indexes; the updated reactions are inserted into the execution queue, and the system runs and updates itself. Users may want to add an admin controller that manages topology updates and reactor states through these “admin” reactions.
  2. Another concept is to let the admin reactor receive the LF graph at its nesting level and the nesting level below as a data structure. The admin reactor can then use any statistics it has gathered and change the graph through a provided API. This would, for example, allow easy arity changes for scaling by giving the admin reactor a ‘handle’ on the reactors whose arity needs to change. This needs further brainstorming.
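The first mechanism (a reserved-level "admin" reaction) relies on the execution queue running reactions in ascending level order, so an admin reaction at a reserved low level runs before ordinary reactions at that tag. A minimal Python sketch, assuming integer levels and a level-ordered queue; the level constants and function names are hypothetical, not part of any existing runtime:

```python
import heapq

STARTUP_LEVEL = 0  # reserved for Startup (per the proposal above)
ADMIN_LEVEL = 1    # assumed reservation for topology/state updates

class ExecutionQueue:
    """Executes reactions in ascending level order, so a reaction at a
    reserved low level preempts ordinary reactions at higher levels."""

    def __init__(self):
        self._heap = []
        self._seq = 0  # insertion order breaks ties within a level

    def push(self, level, reaction):
        heapq.heappush(self._heap, (level, self._seq, reaction))
        self._seq += 1

    def run(self):
        while self._heap:
            _, _, reaction = heapq.heappop(self._heap)
            reaction()

def request_topology_update(queue, recompute):
    # Insert an "admin" reaction at the reserved level; when it runs it
    # would rebuild the graph and reinsert reactions with fresh indexes.
    queue.push(ADMIN_LEVEL, recompute)
```

Because the admin reaction sits at Level 1, it executes before any ordinary reaction queued at a higher level, which is what lets the system "run and update itself" rather than requiring an out-of-band stop-the-world step.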