-
Notifications
You must be signed in to change notification settings - Fork 653
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
docs: add FAQ/troubleshooting and operation timing guide
- Loading branch information
Showing
3 changed files
with
346 additions
and
6 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,86 @@ | ||
--- | ||
title: "Frequently Asked Questions" | ||
linkTitle: "FAQ / Troubleshooting" | ||
description: "Answers to some commonly-asked questions about the {{% alias sdk-go %}}" | ||
weight: 9 | ||
--- | ||
|
||
### How do I configure my SDK's HTTP client? Are there any guidelines or best practices? | ||
|
||
We are unable to provide guidance to customers on how to configure their HTTP | ||
workflow in a manner that is most effective for their particular workload. The | ||
answer to this is the product of a multivariate equation, with input factors | ||
including but not limited to: | ||
|
||
* the network footprint of the application (TPS, throughput, etc.) | ||
* the services being used | ||
* the compute characteristics of the deployment | ||
* the geographical nature of the deployment | ||
* the desired application behavior or needs of the application itself (SLAs, | ||
timings, etc.) | ||
|
||
### How should I configure operation timeouts? | ||
|
||
Much like the previous question, it depends. **The answer to this question, | ||
however, should almost NEVER be based on pure empirical observation** - e.g. "I | ||
made 1000 calls to this operation, it took at most 5 seconds so I will set the | ||
timeout based on that with a safety factor of 2x to 10 seconds". Environment | ||
conditions can change and these types of assumptions can become wrong without | ||
warning, leaving you open to the possibility of self-induced production pain. | ||
|
||
### Requests made by the SDK are timing out or taking too long, how do I fix this? | ||
|
||
We are unable to assist with extended or timed-out operation calls due to | ||
extended HTTP round-trip times. An HTTP round-trip is defined as the time | ||
between the first byte of the request being sent and the last byte of the | ||
response being received. Note that in certain SDKs this can differ from "HTTP | ||
client call time", which is generally the sum of round-trip time and other | ||
bootstrapping operations such as TCP connection management. | ||
|
||
If you are experiencing issues due to operation latency or timeouts, your first | ||
course of action should be to obtain telemetry of the SDK operation lifecycle | ||
to determine the timing breakdown between the HTTP round-trip and the | ||
surrounding overhead of the operation. See the guide on | ||
[timing SDK operations]({{< ref "/docs/faq/timing-operations.md" >}}), | ||
which contains a reusable code snippet that can achieve this. | ||
|
||
### How do I fix a `read: connection reset` error? | ||
|
||
The SDK retries any errors matching the `connection reset` pattern by default. | ||
However, this error can still occur in a context **outside** of the retry loop: | ||
certain service operations directly forward the API's HTTP response body to the | ||
caller to be consumed from the wire directly via `io.ReadCloser` (e.g. | ||
`GetObject`'s object payload). You may encounter this error when performing a | ||
`Read` on the response body. | ||
|
||
This error indicates that your host, the service or any intermediary party | ||
(e.g. NAT gateways, proxies, load balancers) closed the connection while | ||
attempting to read the response. | ||
|
||
This can occur for several reasons: | ||
* You did not consume the response body for time after the response itself was | ||
received (after the service operation was called). **We recommend you consume | ||
the HTTP response body as soon as possible for these types of operations.** | ||
* You did not close a previously-received response body. This can cause | ||
connection resets on certain platforms. **You MUST close any `io.ReadCloser` | ||
instances provided in an operation's response, regardless of whether you | ||
consume its contents.** | ||
|
||
Beyond that, try running a tcpdump for an affected connection at the edge of | ||
your network (e.g. after any proxies that you control). If you see that the AWS | ||
endpoint seems to be sending a TCP RST, you should use the AWS support console | ||
to open a case against the offending service. Be prepared to provide request | ||
IDs and specific timestamps of when the issue occured. | ||
|
||
### Why am I getting "invalid signature" errors when using an HTTP proxy with the SDK? | ||
|
||
The signature algorithm for AWS services (generally sigv4) is tied to the | ||
serialized request's headers, more specifically most headers prefixed with | ||
`X-`. Proxies are prone to modifying the outgoing request by adding additional | ||
forwarding information (often via an `X-Forwarded-For` header) which | ||
effectively breaks the signature that the SDK calculated. | ||
|
||
If you're using an HTTP proxy and experiencing signature errors, you should | ||
work to capture the request **as it appears outgoing from the proxy** and | ||
determine whether it is different. | ||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,247 @@ | ||
--- | ||
title: "Timing SDK operations" | ||
linkTitle: "Timing Operations" | ||
description: "How to perform basic instrumentation in the {{% alias sdk-go %}} to time SDK operations" | ||
weight: 1 | ||
--- | ||
|
||
When debugging timeout / latency issues in the SDK, it is critical to identify | ||
the components of the operation lifecycle which are taking more time to execute | ||
than expected. As a starting point, you will generally need to inspect the | ||
timing breakdown between the overall operation call and the HTTP call itself. | ||
|
||
The following sample program implements a basic instrumentation probe in terms | ||
of `smithy-go` middleware for SQS clients and demonstrates how it is used. The | ||
probe emits the following information for each operation call: | ||
|
||
* AWS request ID | ||
* service ID | ||
* operation name | ||
* operation invocation time | ||
* http call time | ||
|
||
Each emitted message is prefixed with a unique (to a single operation) | ||
"invocation ID" which is set at the beginning of the handler stack. | ||
|
||
The entry point for instrumentation is exposed as `WithOperationTiming`, which | ||
is parameterized to accept a message handling function which will receive | ||
instrumentation "events" as formatted strings. `PrintfMSGHandler` is provided | ||
as a convenience which will simply dump messages to stdout. | ||
|
||
The service used here is interchangeable - ALL service client options accept | ||
`APIOptions` and an `HTTPClient` as configuration. For example, | ||
`WithOperationTiming` could instead be declared as: | ||
|
||
```go | ||
func WithOperationTiming(msgHandler func(string)) func(*s3.Options) | ||
func WithOperationTiming(msgHandler func(string)) func(*dynamodb.Options) | ||
// etc. | ||
``` | ||
|
||
If you change it, be sure to change the signature of the function it returns as | ||
well. | ||
|
||
``` | ||
import ( | ||
"context" | ||
"fmt" | ||
"log" | ||
"net/http" | ||
"sync" | ||
"time" | ||
awsmiddleware "github.com/aws/aws-sdk-go-v2/aws/middleware" | ||
awshttp "github.com/aws/aws-sdk-go-v2/aws/transport/http" | ||
"github.com/aws/aws-sdk-go-v2/config" | ||
"github.com/aws/aws-sdk-go-v2/service/sqs" | ||
"github.com/aws/smithy-go/middleware" | ||
smithyrand "github.com/aws/smithy-go/rand" | ||
) | ||
// WithOperationTiming instruments an SQS client to dump timing information for | ||
// the following spans: | ||
// - overall operation time | ||
// - HTTPClient call time | ||
// | ||
// This instrumentation will also emit the request ID, service name, and | ||
// operation name for each invocation. | ||
// | ||
// Accepts a message "handler" which is invoked with formatted messages to be | ||
// handled externally, you can use the declared PrintfMSGHandler to simply dump | ||
// these values to stdout. | ||
func WithOperationTiming(msgHandler func(string)) func(*sqs.Options) { | ||
return func(o *sqs.Options) { | ||
o.APIOptions = append(o.APIOptions, addTimingMiddlewares(msgHandler)) | ||
o.HTTPClient = &timedHTTPClient{ | ||
client: awshttp.NewBuildableClient(), | ||
msgHandler: msgHandler, | ||
} | ||
} | ||
} | ||
// PrintfMSGHandler writes messages to stdout. | ||
func PrintfMSGHandler(msg string) { | ||
fmt.Printf("%s\n", msg) | ||
} | ||
type invokeIDKey struct{} | ||
func setInvokeID(ctx context.Context, id string) context.Context { | ||
return middleware.WithStackValue(ctx, invokeIDKey{}, id) | ||
} | ||
func getInvokeID(ctx context.Context) string { | ||
id, _ := middleware.GetStackValue(ctx, invokeIDKey{}).(string) | ||
return id | ||
} | ||
func timeSpan(ctx context.Context, name string, consumer func(string)) func() { | ||
start := time.Now() | ||
return func() { | ||
elapsed := time.Now().Sub(start) | ||
consumer(fmt.Sprintf("[%s] %s: %s", getInvokeID(ctx), name, elapsed)) | ||
} | ||
} | ||
type timedHTTPClient struct { | ||
client *awshttp.BuildableClient | ||
msgHandler func(string) | ||
} | ||
func (c *timedHTTPClient) Do(r *http.Request) (*http.Response, error) { | ||
defer timeSpan(r.Context(), "http", c.msgHandler)() | ||
resp, err := c.client.Do(r) | ||
if err != nil { | ||
return nil, fmt.Errorf("inner client do: %v", err) | ||
} | ||
return resp, nil | ||
} | ||
type addInvokeIDMiddleware struct { | ||
msgHandler func(string) | ||
} | ||
func (*addInvokeIDMiddleware) ID() string { return "addInvokeID" } | ||
func (*addInvokeIDMiddleware) HandleInitialize(ctx context.Context, in middleware.InitializeInput, next middleware.InitializeHandler) ( | ||
out middleware.InitializeOutput, md middleware.Metadata, err error, | ||
) { | ||
id, err := smithyrand.NewUUID(smithyrand.Reader).GetUUID() | ||
if err != nil { | ||
return out, md, fmt.Errorf("new uuid: %v", err) | ||
} | ||
return next.HandleInitialize(setInvokeID(ctx, id), in) | ||
} | ||
type timeOperationMiddleware struct { | ||
msgHandler func(string) | ||
} | ||
func (*timeOperationMiddleware) ID() string { return "timeOperation" } | ||
func (m *timeOperationMiddleware) HandleInitialize(ctx context.Context, in middleware.InitializeInput, next middleware.InitializeHandler) ( | ||
middleware.InitializeOutput, middleware.Metadata, error, | ||
) { | ||
defer timeSpan(ctx, "operation", m.msgHandler)() | ||
return next.HandleInitialize(ctx, in) | ||
} | ||
type emitMetadataMiddleware struct { | ||
msgHandler func(string) | ||
} | ||
func (*emitMetadataMiddleware) ID() string { return "emitMetadata" } | ||
func (m *emitMetadataMiddleware) HandleInitialize(ctx context.Context, in middleware.InitializeInput, next middleware.InitializeHandler) ( | ||
middleware.InitializeOutput, middleware.Metadata, error, | ||
) { | ||
out, md, err := next.HandleInitialize(ctx, in) | ||
invokeID := getInvokeID(ctx) | ||
requestID, _ := awsmiddleware.GetRequestIDMetadata(md) | ||
service := awsmiddleware.GetServiceID(ctx) | ||
operation := awsmiddleware.GetOperationName(ctx) | ||
m.msgHandler(fmt.Sprintf(`[%s] requestID = "%s"`, invokeID, requestID)) | ||
m.msgHandler(fmt.Sprintf(`[%s] service = "%s"`, invokeID, service)) | ||
m.msgHandler(fmt.Sprintf(`[%s] operation = "%s"`, invokeID, operation)) | ||
return out, md, err | ||
} | ||
func addTimingMiddlewares(mh func(string)) func(*middleware.Stack) error { | ||
return func(s *middleware.Stack) error { | ||
if err := s.Initialize.Add(&timeOperationMiddleware{msgHandler: mh}, middleware.Before); err != nil { | ||
return fmt.Errorf("add time operation middleware: %v", err) | ||
} | ||
if err := s.Initialize.Add(&addInvokeIDMiddleware{msgHandler: mh}, middleware.Before); err != nil { | ||
return fmt.Errorf("add invoke id middleware: %v", err) | ||
} | ||
if err := s.Initialize.Insert(&emitMetadataMiddleware{msgHandler: mh}, "RegisterServiceMetadata", middleware.After); err != nil { | ||
return fmt.Errorf("add emit metadata middleware: %v", err) | ||
} | ||
return nil | ||
} | ||
} | ||
func main() { | ||
cfg, err := config.LoadDefaultConfig(context.Background()) | ||
if err != nil { | ||
log.Fatal(fmt.Errorf("load default config: %v", err)) | ||
} | ||
svc := sqs.NewFromConfig(cfg, WithOperationTiming(PrintfMSGHandler)) | ||
var wg sync.WaitGroup | ||
for i := 0; i < 6; i++ { | ||
wg.Add(1) | ||
go func() { | ||
defer wg.Done() | ||
_, err = svc.ListQueues(context.Background(), nil) | ||
if err != nil { | ||
fmt.Println(fmt.Errorf("list queues: %v", err)) | ||
} | ||
}() | ||
} | ||
wg.Wait() | ||
} | ||
``` | ||
|
||
A sample output of this program: | ||
|
||
``` | ||
[e9a801bb-c51d-45c8-8e9f-a202e263fde8] http: 192.24067ms | ||
[e9a801bb-c51d-45c8-8e9f-a202e263fde8] requestID = "dbee3082-96a3-5b23-adca-6d005696fa94" | ||
[e9a801bb-c51d-45c8-8e9f-a202e263fde8] service = "SQS" | ||
[e9a801bb-c51d-45c8-8e9f-a202e263fde8] operation = "ListQueues" | ||
[e9a801bb-c51d-45c8-8e9f-a202e263fde8] operation: 193.098393ms | ||
[0740f0e0-953e-4328-94fc-830a5052e763] http: 195.185732ms | ||
[0740f0e0-953e-4328-94fc-830a5052e763] requestID = "48b301fa-fc9f-5f1f-9007-5c783caa9322" | ||
[0740f0e0-953e-4328-94fc-830a5052e763] service = "SQS" | ||
[0740f0e0-953e-4328-94fc-830a5052e763] operation = "ListQueues" | ||
[0740f0e0-953e-4328-94fc-830a5052e763] operation: 195.725491ms | ||
[c0589832-f351-4cc7-84f1-c656eb79dbd7] http: 200.52383ms | ||
[444030d0-6743-4de5-bd91-bc40b2b94c55] http: 200.525919ms | ||
[c0589832-f351-4cc7-84f1-c656eb79dbd7] requestID = "4a73cc82-b47b-56e1-b327-9100744e1b1f" | ||
[c0589832-f351-4cc7-84f1-c656eb79dbd7] service = "SQS" | ||
[c0589832-f351-4cc7-84f1-c656eb79dbd7] operation = "ListQueues" | ||
[c0589832-f351-4cc7-84f1-c656eb79dbd7] operation: 201.214365ms | ||
[444030d0-6743-4de5-bd91-bc40b2b94c55] requestID = "ca1523ed-1879-5610-bf5d-7e6fd84cabee" | ||
[444030d0-6743-4de5-bd91-bc40b2b94c55] service = "SQS" | ||
[444030d0-6743-4de5-bd91-bc40b2b94c55] operation = "ListQueues" | ||
[444030d0-6743-4de5-bd91-bc40b2b94c55] operation: 201.197071ms | ||
[079e8dbd-bb93-43ab-89e5-a7bb392b86a5] http: 206.449568ms | ||
[12b2b39d-df86-4648-a436-ff0482d13340] http: 206.526603ms | ||
[079e8dbd-bb93-43ab-89e5-a7bb392b86a5] requestID = "64229710-b552-56ed-8f96-ca927567ec7b" | ||
[079e8dbd-bb93-43ab-89e5-a7bb392b86a5] service = "SQS" | ||
[079e8dbd-bb93-43ab-89e5-a7bb392b86a5] operation = "ListQueues" | ||
[079e8dbd-bb93-43ab-89e5-a7bb392b86a5] operation: 207.252357ms | ||
[12b2b39d-df86-4648-a436-ff0482d13340] requestID = "76d9cbc0-07aa-58aa-98b7-9642c79f9851" | ||
[12b2b39d-df86-4648-a436-ff0482d13340] service = "SQS" | ||
[12b2b39d-df86-4648-a436-ff0482d13340] operation = "ListQueues" | ||
[12b2b39d-df86-4648-a436-ff0482d13340] operation: 207.360621ms | ||
``` |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters