docs: document reverse proxy config for job streams
npepinpe committed Jul 4, 2024
1 parent 3bd768e commit 51a2f7b
Showing 4 changed files with 38 additions and 0 deletions.
6 changes: 6 additions & 0 deletions docs/apis-tools/go-client/job-worker.md
@@ -106,6 +106,12 @@ To avoid your workers being overloaded with too many jobs, e.g. running out of m

**If streaming is enabled, back pressure applies to both pushing and polling**. You can then use `MaxJobsActive` and `Concurrency` as a way to soft-bound the memory usage of your worker. For example, given a maximum variable payload for a job of 1MB, `MaxJobsActive = 32`, and `Concurrency = 10`, then a single worker could use up to 42MB of memory. You can estimate a worst case scenario using the configured maximum message size, as no job payload will ever exceed this.

#### Proxying

If you're using a reverse proxy or a load balancer between your worker and your gateway, you may need to configure additional parameters to ensure the worker is not killed unexpectedly. If you observe regular 504 timeouts, consider reading [this guide](../../../self-managed/zeebe-deployment/zeebe-gateway/job-streaming).

Note that by default, the Go job workers have a stream timeout of 1 hour.

## Additional resources

- [Job worker reference](/components/concepts/job-workers.md)
6 changes: 6 additions & 0 deletions docs/apis-tools/java-client/job-worker.md
@@ -185,6 +185,12 @@ To avoid your workers being overloaded with too many jobs, e.g. running out of m
If the worker blocks longer than the job's deadline, the job will **not** be passed to the worker, but will be dropped. As it will time out on the broker side, it will be pushed again.
:::

#### Proxying

If you're using a reverse proxy or a load balancer between your worker and your gateway, you may need to configure additional parameters to ensure the worker is not killed unexpectedly. If you observe regular 504 timeouts, consider reading [this guide](../../../self-managed/zeebe-deployment/zeebe-gateway/job-streaming).

Note that by default, the Java job workers have a stream timeout of 1 hour.

## Multi-tenancy

You can configure a job worker to pick up jobs belonging to one or more tenants. When using the builder, you can configure
4 changes: 4 additions & 0 deletions docs/components/concepts/job-workers.md
@@ -169,6 +169,10 @@ If you're using Prometheus, you can use the following query to estimate the queu

On the server side (e.g. if you're running a self-managed cluster), you can measure the rate of jobs which are not pushed due to clients which are not ready via the metric `zeebe_broker_jobs_push_fail_try_count_total{code="BLOCKED"}`. If the rate of this metric is high for a sustained amount of time, it may be a good indicator that you need to scale your workers. Unfortunately, on the server side we don't differentiate between clients, so this metric doesn't tell you which worker deployment needs to be scaled. We thus recommend using client metrics whenever possible.

#### Proxying

If you're using a reverse proxy or a load balancer between your worker and your gateway, you may need to configure additional parameters to ensure the worker is not killed unexpectedly. If you observe regular 504 timeouts, consider reading [this guide](../../../self-managed/zeebe-deployment/zeebe-gateway/job-streaming).

### Troubleshooting

Since this feature requires a good amount of coordination between various components over the network, we've built in some tools to help monitor the health of the job streams.
22 changes: 22 additions & 0 deletions docs/self-managed/zeebe-deployment/zeebe-gateway/job-streaming.md
@@ -0,0 +1,22 @@
---
id: job-streaming
title: "Job Streaming"
sidebar_label: "Job Streaming"
---

Streaming job workers are expected to be long-lived in order to cut down on the latency overhead of (re)creating a stream and propagating it throughout the cluster. This may require special configuration, especially if you're using a reverse proxy in front of your gateway. The typical symptom is an HTTP 504 (Gateway Timeout) returned to your job streaming worker at regular intervals.

:::note
This configuration is _only_ required for reverse proxies which do not support forwarding HTTP/2 keepalive (on either side). See, for example, [this nginx ticket](https://trac.nginx.org/nginx/ticket/1887).

If your proxy supports it, then you don't need to do anything.
:::

The general recommendation is to apply the following configuration:

- On your client, set an explicit stream timeout, e.g. 1h.
- On your reverse proxy, ensure the read response timeout is slightly higher than your client's stream timeout, e.g. 1h10m.

## NGINX

As mentioned above, nginx is a known proxy which does not support forwarding HTTP/2 pings from either side as a form of keepalive. You should configure an appropriate `grpc_send_timeout` such that it is _higher_ than your job worker stream timeout configuration.
