Runner id collisions causes runner kill loop with LocalScaling deterministic runner IDs #1036

matt2e · 2024-03-07T03:14:38Z

Steps to repro

ftl serve --recreate --log-level=DEBUG
ftl deploy examples/go
ftl ps --verbose
ftl kill ___ <-- choose the deployment with a lower runner id, which is the time one
At this point there will be a remaining runner with an id like R00000000000000000000004000
ftl deploy examples/go/time
LocalScaling then creates a new runner, but chooses the id R00000000000000000000004000 (the id is generated from the current number of runners)
This leads to the existing runner with that id being killed so the new one can start up. That in turn causes another runner to be kicked off to redeploy the echo deployment, and the cycle repeats
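
For anyone trying to picture the collision, here is a minimal sketch of the repro steps. It is not the LocalScaling code: `nextID` and the integer ids are hypothetical stand-ins, and it assumes only what is stated above, that a new id is derived from the current number of runners.

```go
package main

import "fmt"

// nextID is a hypothetical stand-in for a deterministic id scheme:
// the id depends only on how many runners are currently alive.
func nextID(live map[int]string) int {
	return len(live) + 1
}

func main() {
	live := map[int]string{} // runner id -> deployment name

	// "ftl deploy examples/go" brings up two runners (time, then echo).
	for _, deployment := range []string{"time", "echo"} {
		live[nextID(live)] = deployment
	}

	// "ftl kill" the deployment with the lower runner id (time).
	delete(live, 1)

	// "ftl deploy examples/go/time": one runner is left, so the next id
	// is computed from a count of 1 and lands on the survivor's id.
	id := nextID(live)
	if existing, ok := live[id]; ok {
		fmt.Printf("collision: id %d is already used by the %s runner\n", id, existing)
	}
}
```

Any scheme that derives the id purely from the live runner count will reuse an id as soon as a runner other than the highest-numbered one goes away.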


Adds `--idle-runners` arg to define how large the idle pool should be.

Fixes #1036
Fixes #1030

### Previous notes

Currently a draft because this PR makes #1036 more likely to be hit.

Before this change, killing all deployments would mean there are 0 runners, leading to no runner id collisions when you bring up more deployments.

After this change, killing all deployments means that there will still be runners, which will cause collisions if the idle runner ids aren't the lowest possible [`R00000000000000000000002000`, `R00000000000000000000004000` ... ]

I've been testing with a hacky fix replacing line `backend/controller/scaling/local_scaling.go:96` with:

```
binary.BigEndian.PutUint32(ulid[10:], rand.Uint32())
```
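
A self-contained sketch of what that one-line hack does, for reference. It assumes `rand` is `math/rand` and that the id is a 16-byte, ULID-sized array; `newRunnerIDBytes` is a hypothetical helper, not a function from the repo. Random bits make a collision unlikely rather than impossible.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math/rand"
)

// newRunnerIDBytes is a hypothetical helper: it returns a 16-byte id whose
// bytes 10..13 hold a random uint32, with every other byte left at zero.
func newRunnerIDBytes() [16]byte {
	var id [16]byte
	binary.BigEndian.PutUint32(id[10:], rand.Uint32())
	return id
}

func main() {
	seen := map[[16]byte]bool{}
	for i := 0; i < 4; i++ {
		id := newRunnerIDBytes()
		// The id no longer depends on how many runners exist, so a reuse
		// here would be a random coincidence rather than a certainty.
		fmt.Printf("%x reused=%v\n", id, seen[id])
		seen[id] = true
	}
}
```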

github-actions bot added the triage Issue needs triaging label Mar 7, 2024

alecthomas mentioned this issue Mar 7, 2024

Dashboard #728

Open

alecthomas changed the title ~~Runner id collisions causes runner kill loop~~ Runner id collisions causes runner kill loop with LocalScaling deterministic runner IDs Mar 7, 2024

matt2e mentioned this issue Mar 7, 2024

feat: maintain idle pool of runners #1038

Merged

matt2e self-assigned this Mar 7, 2024

github-actions bot removed the triage Issue needs triaging label Mar 7, 2024

matt2e closed this as completed in #1038 Mar 7, 2024
