-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: node gateway error handling improvements #124
Conversation
@@ -94,7 +94,7 @@ var ( | |||
"Batches are deconstructed to single JSON RPC requests for this metric.", | |||
}, | |||
// jsonrpc_method is "batch" for batch requests | |||
[]string{"chain_name", "client", "upstream_id", "url", "jsonrpc_method", "response_code", "jsonrpc_error_code"}, | |||
[]string{"chain_name", "client", "upstream_id", "url", "jsonrpc_method", "jsonrpc_error_code"}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't feel HTTP code on a JSON RPC metric was necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed!
internal/route/request_executor.go
Outdated
) | ||
|
||
originFunc := func() (*jsonrpc.SingleResponseBody, error) { | ||
var err error | ||
|
||
// Any errors will result in respBody and resp being nil. | ||
respBody, resp, err = r.getResponseBody(httpReq, &requestBody, configToRoute) //nolint:bodyclose // linter bug | ||
jsonRPCRespBody, httpResp, err = r.getResponseBody(httpReq, &requestBody, configToRoute) //nolint:bodyclose // linter bug |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A lil bit of renaming to jsonRPCRespBody and httpResp in a few places. Thought this naming would be more clear.
internal/route/request_executor.go
Outdated
func (r *RequestExecutor) routeToConfig( | ||
ctx context.Context, | ||
requestBody jsonrpc.RequestBody, | ||
configToRoute *config.UpstreamConfig, | ||
) (jsonrpc.ResponseBody, *http.Response, bool, error) { | ||
) (jsonrpc.ResponseBody, *HttpResponse, bool, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't think it was necessary to plumb the full *http.Response
object all the way back to the front. We can close
it and pass back some specific pieces of metadata we're interested in (e.g. httpStatusCode).
The impetus for this change was that I was getting confused when I was first looking at the code (I haven't looked at it in months!).
respBodyString := util.ReadAndCopyBackResponseBody(resp) | ||
httpResp := &HttpResponse{resp.StatusCode} | ||
|
||
if resp.StatusCode >= http.StatusBadRequest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We weren't doing a good job of handling HTTP responses w/ 4xx and 5xxs in them.
We were trying to deserialize their bodies and failing (most of the time, unless body was empty).
This caused our error metrics to have the wrong HTTP response code attached to them. Ex: I saw a 429 response in logs but it was a 500 in metrics.
HTTP responses with non 2xx//3xx codes should count as errors IMO.
|
||
if err != nil { | ||
r.logger.Warn("Could not deserialize response.", zap.Any("request", requestBody), | ||
zap.String("upstreamID", configToRoute.ID), zap.String("response", fmt.Sprintf("%v", resp)), zap.Error(err)) | ||
return nil, nil, &OriginError{err} | ||
return nil, httpResp, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we see DecodeError
from responses, we should return the response to the customer even though we can't deserialize it.
Passing the DecodeError
back all the way back to the front so that we can do that.
As far as I'm aware, this isn't affecting anything right now. Just thought it was something nice to fix while I'm meddling around here.
internal/route/router.go
Outdated
@@ -155,7 +142,7 @@ func (r *SimpleRouter) Route( | |||
strconv.FormatBool(cached), | |||
).Inc() | |||
|
|||
if err != nil { | |||
if err != nil || httpResponse.StatusCode > http.StatusBadRequest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice find, do we want 400 as well? aka >=
vs >
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch!
6ecb206
to
e5325f9
Compare
e5325f9
to
6c05817
Compare
6c05817
to
3104450
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good overall, just some small questions
@@ -94,7 +94,7 @@ var ( | |||
"Batches are deconstructed to single JSON RPC requests for this metric.", | |||
}, | |||
// jsonrpc_method is "batch" for batch requests | |||
[]string{"chain_name", "client", "upstream_id", "url", "jsonrpc_method", "response_code", "jsonrpc_error_code"}, | |||
[]string{"chain_name", "client", "upstream_id", "url", "jsonrpc_method", "jsonrpc_error_code"}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed!
internal/route/router.go
Outdated
@@ -155,7 +142,7 @@ func (r *SimpleRouter) Route( | |||
strconv.FormatBool(cached), | |||
).Inc() | |||
|
|||
if err != nil { | |||
if err != nil || httpResponse.StatusCode > http.StatusBadRequest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice find, do we want 400 as well? aka >=
vs >
?
type HTTPResponse struct { | ||
StatusCode int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just curious, what was the reason behind using this type instead of http.Response
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh sorry, i missed your comment : #124 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My intuition says that the response body, aka HTTP.Response.Body
, should be read and discarded (body.close()
) as soon as possible. Otherwise, the connection is held until body.close()
is called.
And since http.Response.body
is a field on http.Response
, it seems to appropriate to apply the "read and discard ASAP" model to the entire object. If we need some data from the object, it's best to read it out into an user-defined structure.
Besides that, the body
is a read-once object. How will other functions know if it has been read yet?
// Still pass the response to client if we're not able to decode response from upstream. | ||
case *jsonrpc.DecodeError: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i wonder how this changes the behavior on the indexers. i'm assuming current decode errors get swallowed up into a generic 5XX or something, and the indexer will retry on the request.
once we add this change the indexer will potentially receive a poorly formatted JSON response. hopefully it gracefully handles this scenario and retries. then again this case is probably pretty rare.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'm not totally sure of the consequences atm, but you're right, it's probably rare.
da8fcaa
to
15bb16d
Compare
|
||
var body *SingleRequestBody | ||
|
||
func DecodeRequestBody(requestBodyRawBytes []byte) (RequestBody, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moving the io.ReadAll
call out of this function because errors reading data out of the stream were being conflated with real decoding errors.
Example from logs: "error":"decode error: http2: server sent GOAWAY and closed the connection; LastStreamID=19999, ErrCode=NO_ERROR, debug=\"\",
This is an error from io.ReadAll
. It's an error on the connection level, not with the contents of the response.
respBodyBytes, err := util.ReadAndCopyBackResponseBody(resp) | ||
respBodyString := string(respBodyBytes) | ||
|
||
if err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we're handling connection-level errors like
"error":"decode error: http2: server sent GOAWAY and closed the connection; LastStreamID=19999, ErrCode=NO_ERROR, debug=\"\",
df53485
to
46451c0
Compare
46451c0
to
b3d6480
Compare
Description
This task to essentially fix the
response_code
tag onnode_gateway_router_upstream_rpc_request_errors.count
ended up just a tad bit larger. I ended up making some improvements to how we handle errors, and some small misc things.Type of change
How Has This Been Tested?
Testing local and staging before prod.