chore: Improve autopilot liveness check #2090

Closed
2 tasks
fleupold opened this issue Nov 28, 2023 · 0 comments · Fixed by #2236
Labels: E:3.1 Driver Colocation (see https://github.com/cowprotocol/pm/issues/14 for details), good first issue (good for newcomers)


Background

We currently consider the autopilot to be alive as long as it's able to produce sufficiently recent auctions in the solvable orders cache:

#[async_trait::async_trait]
impl LivenessChecking for Liveness {
    async fn is_alive(&self) -> bool {
        let age = self.solvable_orders_cache.last_update_time().elapsed();
        age <= self.max_auction_age
    }
}

However, this approach doesn't work in shadow mode (where we don't populate a solvable orders cache) and may miss the autopilot getting stuck inside a single run loop iteration. The cache is updated on a separate thread, which may keep running fine even while the system is no longer propagating any auctions to solvers.

Details

The Runloop should populate a thread-safe "last auction processed" timestamp that can be shared with the Liveness struct. The autopilot should then be considered alive if the timestamp is within a reasonable range of now (e.g. 3 minutes).

Note that this behaviour subsumes the current check, as we only kick off a new auction if the solvable orders cache has been updated (cf. code).
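A minimal sketch of the proposed design, under stated assumptions: the field name `last_auction_time`, the method name `auction`, the helper `spawn_run_loop`, and the choice of `RwLock` are all hypothetical and not taken from the codebase. The check is written as a plain synchronous method here to keep the example free of external crates, whereas the existing trait is async.

```rust
use std::sync::{Arc, RwLock};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical shared liveness state (names are assumptions, not the real code).
pub struct Liveness {
    max_auction_age: Duration,
    /// Time at which the run loop last finished processing an auction.
    last_auction_time: RwLock<Instant>,
}

impl Liveness {
    /// Assumption: the timestamp starts at construction time, so a freshly
    /// started process counts as alive for one `max_auction_age` window.
    pub fn new(max_auction_age: Duration) -> Self {
        Self {
            max_auction_age,
            last_auction_time: RwLock::new(Instant::now()),
        }
    }

    /// Called by the run loop after an auction has been processed.
    pub fn auction(&self) {
        *self.last_auction_time.write().expect("lock poisoned") = Instant::now();
    }

    /// Alive if the last processed auction is recent enough.
    pub fn is_alive(&self) -> bool {
        let last = *self.last_auction_time.read().expect("lock poisoned");
        last.elapsed() <= self.max_auction_age
    }
}

/// Stand-in for the run loop: records a timestamp after each iteration.
pub fn spawn_run_loop(liveness: Arc<Liveness>, iterations: usize) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        for _ in 0..iterations {
            // ... build the auction and send it to the solvers here ...
            liveness.auction();
        }
    })
}
```

Because the timestamp is written by the run loop itself rather than by the cache-updating thread, a run loop that hangs stops refreshing it and the check eventually fails. The same struct can be handed to both the regular and the shadow run loop through an `Arc`.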

Acceptance criteria

  • Liveness checks are based on the last timestamp an auction runloop has successfully completed.
  • The same liveness implementation is used across shadow and regular autopilot mode.
@fleupold added the "good first issue" (good for newcomers) and "E:3.1 Driver Colocation" (see https://github.com/cowprotocol/pm/issues/14 for details) labels on Nov 28, 2023
@squadgazzz self-assigned this on Dec 26, 2023
MartinquaXD pushed a commit that referenced this issue on Jan 15, 2024
# Description
This PR creates a shared liveness implementation between shadow and
regular autopilot mode. Both now populate a thread-safe last auction
timestamp whenever an auction has been processed. The liveness check compares
the elapsed time since that recorded timestamp with the maximum auction
age.
# Changes
- [x] Liveness checks are based on the last timestamp an auction runloop
has successfully completed.
- [x] The same liveness implementation is used across shadow and regular
autopilot mode. Regular autopilot no longer uses the last update time in
the solvable orders cache.

## How to test
This can be tested manually by running the autopilot locally and
checking http://localhost:9589/liveness. It responds with 200 if the
autopilot is considered alive, 503 otherwise. Max auction age can also
be tweaked using the --max-auction-age argument when running the
autopilot.
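
The manual check described above could also be scripted. The following is only a sketch using the Rust standard library: `parse_status` and `probe` are hypothetical helper names, and the only facts taken from the description are the `/liveness` path, port 9589, and the 200/503 status codes.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;

/// Extracts the numeric status code from the first line of an HTTP response,
/// e.g. "HTTP/1.1 200 OK". Returns `None` if the status line is malformed.
fn parse_status(response: &str) -> Option<u16> {
    response.lines().next()?.split_whitespace().nth(1)?.parse().ok()
}

/// Sends a plain GET /liveness request to `addr` (e.g. "localhost:9589")
/// and returns the status code of the response, if one could be parsed.
fn probe(addr: &str) -> std::io::Result<Option<u16>> {
    let mut stream = TcpStream::connect(addr)?;
    stream.write_all(
        b"GET /liveness HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n",
    )?;
    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    Ok(parse_status(&response))
}
```

With the autopilot running locally, `probe("localhost:9589")` would be expected to yield `Some(200)` while auctions are being processed and `Some(503)` once the last processed auction is older than the configured maximum auction age.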

## Related Issues

- Fixes #2090