feat: latency checker #131

Merged
polsar88 merged 19 commits into polsar/config-parsing-enhancements from polsar/latency-checker
Sep 25, 2024

polsar88
Contributor

@polsar88 polsar88 commented Aug 30, 2024

Design document: https://www.notion.so/alchemotion/Error-and-latency-based-routing-in-the-Satsuma-node-gateway-3a60d896c5844f82bf90f68570b00608

This PR adds a new module containing the new `ErrorLatencyCheck` struct. The module provides the core functionality for tracking error and latency state for each upstream, and for deciding whether an upstream is healthy based on its request history for each RPC method.

Type of change

  • 🐛 Bug fix (non-breaking change which fixes an issue)
  • 😎 New feature (non-breaking change which adds functionality)
  • ⁉️ Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • ⚒️ Refactor (no functional changes)
  • 📖 Documentation (updating or adding docs)

@polsar88 polsar88 requested a review from brianluong August 30, 2024 23:49
brianluong

This comment was marked as resolved.

@polsar88 polsar88 requested a review from brianluong September 5, 2024 02:44
brianluong

This comment was marked as resolved.

@polsar88 polsar88 requested a review from pavelm September 16, 2024 16:45
polsar88 and others added 8 commits September 21, 2024 20:15
* feat: latency checker wiring

* fix: remove ShouldRun check in IsLatencyAcceptable.Apply method

* feat: support `alwaysRoute` option (#133)

* feat: support `alwaysRoute` option

* feat: add logging

* fix: add TODOs

* fix: updated comments

* fix: remove confusing "TODOs"

* fix: better phrasing

* fix: lint

* fix: add panics

* fix: receiver cannot be nil

* feat: check if error/latency routing control is enabled

* fix: improve readability

* fix: improve readability

* fix: remove a TODO

* fix: improve code readability

* fix: improve code readability

* fix: add a comment for `GlobalConfig::setDefaults` method

* fix: rename IsEnhancedRoutingControlEnabled method

* fix: rename HasEnhancedRoutingControlDefined method
@polsar88 polsar88 merged commit 177c529 into polsar/config-parsing-enhancements Sep 25, 2024
5 checks passed
@polsar88 polsar88 deleted the polsar/latency-checker branch September 25, 2024 21:54
polsar88 added a commit that referenced this pull request Sep 25, 2024
* feat: config parsing enhancements

* fix: the default threshold is now set in LatencyConfig if not specified

* fix: getLatencyThresholdForMethod

* fix: rename constant

* fix: add TODOs

* feat: expand a test

* fix: chain routing config is no longer initialized if global routing config is not specified

* feat: latency checker (#131)

* feat: latency checker

* fix: rename struct field

* fix: ShouldRunPassiveHealthChecks field is now set correctly

* feat: add method as a Prometheus label

* fix: remove extra RecordRequest call

* fix: rename method

* fix: check if we should run passive check before initializing the client

* fix: rename LatencyCheck to ErrorLatencyCheck

* fix: rename `latency` module to `errorlatency`

* fix: passive health check is no longer initialized if it is disabled

* fix: improve a metric's help text

* fix: rename metric variable names

* fix: typo

* feat: add tests

* feat: latency checker wiring (#132)
polsar88 added a commit that referenced this pull request Sep 25, 2024
* feat: add config parameters for fine-grained routing control (#126)

* Added instructions to `README.md`

* feat: update dependencies

* feat: updated dependencies

* feat: new mockery version

* feat: add routing config params

* feat: upgrade Go version

* fix: suppress legacy linter errors

* fix: field alignments in structs

* fix: field linter errors

* fix: linter errors

* feat: add routing config params

* feat: validate routing error rate

* feat: refactoring

* feat: add a TODO

* feat: `ErrorsConfig` and `LatencyConfig` are now pointers

* feat: add a field to routing config test

* feat: set defaults for `DetectionWindow` and `BanWindow`

* feat: move RoutingConfig functions closer to the struct

* feat: changed `or` to `and`

* feat: split function into validation and setting defaults components

* feat: settings the config defaults now mirrors the validation workflow

* feat: remove redundant call

* feat: upgrade Go version to 1.22

* feat: config parsing enhancements (#130)
3 participants