chore: Improve autopilot liveness check #2090
Labels: E:3.1 Driver Colocation (see https://github.com/cowprotocol/pm/issues/14 for details), good first issue
Background
We currently consider the autopilot to be alive as long as it is able to produce sufficiently recent auctions in the solvable orders cache (see services/crates/autopilot/src/run.rs, lines 70 to 76 at commit 79c7aac).
However, this approach doesn't work in shadow mode (where we don't populate a solvable orders cache), and it may miss the autopilot getting stuck inside a single run loop iteration. This is because the cache is updated on a separate thread, which may keep working fine while the system is no longer propagating any auctions to solvers.
Details
The run loop should populate a thread-safe "last auction processed" timestamp that is shared with the Liveness struct. The autopilot should then be considered alive only if this timestamp is sufficiently recent (e.g. no older than 3 minutes).
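A minimal sketch of the proposed shared timestamp, assuming only the Rust standard library. The names LastAuctionProcessed, record, record_at, is_alive_at and max_auction_age are hypothetical and not taken from the autopilot crate, and this Liveness is a simplified stand-in: the real struct implements the crate's own liveness-checking interface, which is not reproduced here. The run loop would hold one clone of the handle and call record() after each auction, while Liveness holds another clone and compares the timestamp's age against the threshold.

```rust
use std::sync::{Arc, Mutex};
use std::time::{Duration, Instant};

/// Hypothetical thread-safe "last auction processed" timestamp. Cloning the
/// handle shares the same underlying value between the run loop and Liveness.
#[derive(Clone)]
pub struct LastAuctionProcessed(Arc<Mutex<Instant>>);

impl LastAuctionProcessed {
    /// Starts at "now" so the service gets a grace period before its first auction.
    pub fn new() -> Self {
        Self(Arc::new(Mutex::new(Instant::now())))
    }

    /// Called by the run loop after it has processed an auction.
    pub fn record(&self) {
        self.record_at(Instant::now());
    }

    /// Stores an explicit timestamp (useful for tests).
    pub fn record_at(&self, at: Instant) {
        *self.0.lock().unwrap() = at;
    }

    pub fn get(&self) -> Instant {
        *self.0.lock().unwrap()
    }
}

/// Simplified liveness check based on the age of the last processed auction.
pub struct Liveness {
    last_auction: LastAuctionProcessed,
    max_auction_age: Duration,
}

impl Liveness {
    pub fn new(last_auction: LastAuctionProcessed, max_auction_age: Duration) -> Self {
        Self { last_auction, max_auction_age }
    }

    /// Alive if the last auction was processed at most `max_auction_age` before `now`.
    pub fn is_alive_at(&self, now: Instant) -> bool {
        // Saturating: a timestamp later than `now` counts as age zero.
        now.saturating_duration_since(self.last_auction.get()) <= self.max_auction_age
    }

    pub fn is_alive(&self) -> bool {
        self.is_alive_at(Instant::now())
    }
}
```

Seeding the timestamp with Instant::now() avoids reporting the service as dead before the first auction has run. Instant is monotonic, so wall-clock adjustments cannot make the check flap. A Mutex is the simplest choice here; an atomic integer holding elapsed milliseconds would avoid the lock if contention ever mattered.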
Note that this behaviour subsumes the current check, as we only kick off a new auction if the solvable orders cache has been updated (cf. code).
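To illustrate why the new check subsumes the old one, here is a simplified, hypothetical model of a run loop iteration. The types Cache and RunLoop and the method single_run are illustrative stand-ins, not the crate's actual code. Because the timestamp only advances when an auction is kicked off, and an auction is only kicked off after a cache update the loop has not used yet, a stalled cache also freezes the timestamp, so the new liveness check fails whenever the old one would.

```rust
use std::time::Instant;

/// Hypothetical stand-in for the solvable orders cache: it only records
/// when its background task last refreshed it.
pub struct Cache {
    pub last_update: Option<Instant>,
}

/// Hypothetical, heavily simplified run loop.
pub struct RunLoop {
    /// Cache refresh that the previous auction was built from.
    last_seen_update: Option<Instant>,
    /// "Last auction processed" timestamp (not shared here, for brevity).
    pub last_auction_processed: Option<Instant>,
}

impl RunLoop {
    pub fn new() -> Self {
        Self { last_seen_update: None, last_auction_processed: None }
    }

    /// One iteration of the loop; returns whether an auction was kicked off.
    pub fn single_run(&mut self, cache: &Cache, now: Instant) -> bool {
        // Only start an auction if the cache holds a refresh we have not used yet.
        let update = match cache.last_update {
            Some(update) if Some(update) != self.last_seen_update => update,
            // Cache empty or unchanged: no auction, so the timestamp is not advanced.
            _ => return false,
        };
        self.last_seen_update = Some(update);
        // ... build the auction, send it to solvers, wait for settlement ...
        self.last_auction_processed = Some(now);
        true
    }
}
```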
Acceptance criteria