-
Notifications
You must be signed in to change notification settings - Fork 52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement batch submission for traces #469
Implement batch submission for traces #469
Conversation
… when lagging behind
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dropped some comments
import java.util.logging.Logger; | ||
import java.util.zip.GZIPOutputStream; | ||
|
||
public class CompressedBatchSender<T> implements JsonPayloadSender<T> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to we have unit tests for this class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, added unit tests
@@ -176,7 +174,7 @@ private static boolean postLogs(HttpClient httpClient, String logIntakeUrl, Secr | |||
|
|||
byte[] body = payload.getBytes(StandardCharsets.UTF_8); | |||
try { | |||
httpClient.postAsynchronously(logIntakeUrl, headers, "application/json", body); | |||
httpClient.post(logIntakeUrl, headers, "application/json", body, Function.identity()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is postLogs
private static method only used in validateLogIntakeConnection
method?
If so, should we add all the logic to the validateLogIntakeConnection
method? It was unexpected to me read postLogs
and see that's only invoked in the validation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, I think it was used by some other logic in the past, now it's only used for validation. Inlined the method.
src/main/java/org/datadog/jenkins/plugins/datadog/clients/CompressedBatchSender.java
Show resolved
Hide resolved
|
||
@Override | ||
public int order() { | ||
return 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be great to have in a static variable the order of all FlareContributor
. Currently, I don't know if the number is used in other places without checking all the impls for FlareContributor
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not that much of a problem if the order is the same, this only affects sorting in UI. Still a valid point, added some named constants for the orders.
private static final int DEFAULT_QUEUE_CAPACITY = 5_000; | ||
private static final int DEFAULT_SUBMIT_TIMEOUT_SECONDS = 5; | ||
private static final int DEFAULT_QUEUE_CAPACITY = 10_000; | ||
private static final int DEFAULT_SUBMIT_TIMEOUT_SECONDS = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the impact of no timeout, infinite timeout??
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The opposite actually, with 0 timeout any trace/log that does not fit into the queue will be dropped immediately.
The idea is to avoid blocking the client's code and the Jenkins core. If the queue fills up, it means traces/logs are being produced faster than they're being dispatched, and a timeout will not resolve this problem, just postpone it.
private final Timer submit; | ||
private final Meter submitDropped; | ||
private final Timer dispatch; | ||
private final Gauge<Integer> queueSize; | ||
private final Histogram batchSize; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is for internal usage right? We're not reporting these metrics to Datadog.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, this only shows up in diagnostic flares for now. It'd be nice to integrate this with actual telemetry, but this is too much of an effort for now.
Requirements for Contributing to this repository
What does this PR do?
Adds batch submission of traces for Agentless and Agentfull modes
Traces are submitted in batches instead of each event being submitted in a separate HTTP request.
For now batching is disabled by default, and can be enabled with the
DD_JENKINS_ENABLE_TRACES_BATCHING
environment variable set totrue
.Extends diagnostic flare with health stats for async submitters
Various metrics are added, such as:
Increases the default batch size for logs and traces
Batch size is increased from 100 to 500, as stress testing showed that larger batch sizes increase throughput.
Updates the default behaviour of the async submission of logs and traces to not block
When a log or a trace element is submitted, it is added to a queue that is processed asynchronously by a separate thread.
The queue has a limited size, so it is possible that it's full when a submission takes place.
The old logic blocked the submitting thread in such cases (5s for logs, 30s for traces).
Blocking the thread has a disadvantage: it slows down the execution of pipelines.
The new logic will drop submitted log/trace when the queue is full.
The queue is there to account for short bursts of logs/traces.
If it gets full than it is likely that the consumer is consistently slower than the producer, which will lead to severe performance degradation in case of blocking on submission.
Description of the Change
Alternate Designs
Possible Drawbacks
Verification Process
Additional Notes
Release Notes
Review checklist (to be filled by reviewers)
changelog/
label attached. If applicable it should have thebackward-incompatible
label attached.do-not-merge/
label attached.kind/
andseverity/
labels attached at least.