Commit
* add an agent to promql-to-dd just because im here
* promql-to-scrape add basic example for scraping metrics out of the Temporal Cloud observability endpoint and exposing a /metrics endpoint
* add Dockerfile, examples, and a README
* PR feedback
* add container image to example deployment
1 parent 156add8, commit 1ff2f62
Showing 18 changed files with 938 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
FROM golang:1.21-alpine | ||
|
||
WORKDIR /usr/src/app | ||
|
||
COPY go.mod go.sum ./ | ||
RUN go mod download && go mod verify | ||
|
||
COPY . . | ||
RUN go build -v -o /usr/local/bin/promql-to-scrape ./cmd/promql-to-scrape/main.go | ||
|
||
ENTRYPOINT ["/usr/local/bin/promql-to-scrape"] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,47 @@ | ||
# promql-to-scrape | ||
|
||
This basic application is an example of how to use the Temporal Cloud Observability endpoint to expose a typical Prometheus `/metrics` endpoint. | ||
|
||
**This example is provided as-is, without support. It is intended as reference material only.** | ||
|
||
## How to Use | ||
|
||
Place your client cert and key at `client.crt` and `tls.key`. You will also need the Temporal Cloud account number of an account that has the observability endpoint enabled. | ||
|
||
``` | ||
go mod tidy | ||
go build -o promql-to-scrape cmd/promql-to-scrape/main.go | ||
./promql-to-scrape -client-cert client.crt -client-key tls.key -prom-endpoint https://<account>.tmprl.cloud/prometheus --config-file examples/config.yaml --debug | ||
~~~ | ||
time=2023-11-16T17:43:20.260-06:00 level=DEBUG msg="successful metric retrieval" time=3.529039083s | ||
``` | ||
|
||
This means you can now hit http://localhost:9001/metrics on your machine and see your metrics. | ||
|
||
### Important Usability Information | ||
|
||
**Important:** When you go to scrape this, you should do so with a **60s** scrape interval, unless you are meaningfully modifying this code. The example queries all assume a 1 minute rate and you'll want these to be equal. | ||
|
||
**Very Important:** The data you will see here is approximately 1 minute delayed (should you conform to the guidance above). Due to the aggregation that happens before metrics are presented to you, it's necessary for us to send the queries from this application to look 60 seconds in the past. Otherwise data aggregation would not be complete, and there would be no results for each query. | ||
|
||
## Deployment | ||
|
||
Some example Kubernetes manifests are provided in the `/examples` directory. Filling in your certificates and account should get you going pretty quickly. | ||
|
||
## Generating Config | ||
|
||
There is a second binary you can build that can help you build a default configuration of queries to scrape and export. | ||
|
||
``` | ||
go build -o genconfig cmd/genconfig/main.go | ||
./genconfig -client-cert client.crt -client-key tls.key -prom-endpoint https://<account>.tmprl.cloud/prometheus | ||
... | ||
``` | ||
|
||
This will generate an example config at `config.yaml` that you may use. It looks for all the existing metrics and generates a reasonable query for you to export. | ||
- For counters, a `rate(counter[1m])` | ||
- For gauges, it simply queries for `gauge` | ||
- For histograms, it does a p99 aggregated by `temporal_namespace` and `operation`: `histogram_quantile(0.99, sum(rate(metric[1m])) by (le, operation, temporal_namespace))` | ||
|
||
Modify at your own risk. You may find you'd like to add a global latency across all namespaces for instance. You can add those queries to your config file. |
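As a sketch of the kind of extra query mentioned above, the helper below builds the same p99 expression that genconfig emits, but with a configurable label set. Dropping `temporal_namespace` from the `by` clause gives a latency aggregated across all namespaces. The helper name `p99Query` and the `_all_namespaces` metric name suffix are assumptions made for this example and are not part of the commit.

```go
package main

import (
	"fmt"
	"strings"
)

// p99Query builds a p99 histogram_quantile query over a 1m rate, grouped by
// `le` plus any extra labels. Fewer extra labels means a wider aggregation.
func p99Query(bucketMetric string, extraLabels ...string) string {
	labels := append([]string{"le"}, extraLabels...)
	return fmt.Sprintf("histogram_quantile(0.99, sum(rate(%s[1m])) by (%s))",
		bucketMetric, strings.Join(labels, ", "))
}

func main() {
	const bucket = "temporal_cloud_v0_service_latency_bucket"

	// Per-namespace, per-operation form, matching the generated config.
	fmt.Println(p99Query(bucket, "operation", "temporal_namespace"))

	// Across all namespaces, still split by operation. The metric name
	// suffix below is hypothetical; pick any name that suits your setup.
	fmt.Printf("- metric_name: %s:histogram_quantile_p99_1m_all_namespaces\n  query: %s\n",
		bucket, p99Query(bucket, "operation"))
}
```

The second line of output is shaped like an entry in `config.yaml`, so it can be pasted under `metrics:` after review.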
84 changes: 84 additions & 0 deletions
cloud/observability/promql-to-scrape/cmd/genconfig/main.go
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,84 @@ | ||
package main | ||
|
||
import ( | ||
"flag" | ||
"fmt" | ||
"log" | ||
"os" | ||
"sort" | ||
|
||
"github.com/temporalio/samples-server/cloud/observability/promql-to-scrape/internal" | ||
|
||
"gopkg.in/yaml.v3" | ||
) | ||
|
||
func main() { | ||
set := flag.NewFlagSet("app", flag.ExitOnError) | ||
promURL := set.String("prom-endpoint", "", "Required Prometheus API endpoint for the server eg. https://<account>.tmprl.cloud/prometheus") | ||
serverRootCACert := set.String("server-root-ca-cert", "", "Optional path to root server CA cert") | ||
clientCert := set.String("client-cert", "", "Required path to client cert") | ||
clientKey := set.String("client-key", "", "Required path to client key") | ||
serverName := set.String("server-name", "", "Optional server name to use for verifying the server's certificate") | ||
insecureSkipVerify := set.Bool("insecure-skip-verify", false, "Skip verification of the server's certificate and host name") | ||
|
||
if err := set.Parse(os.Args[1:]); err != nil { | ||
log.Fatalf("failed parsing args: %s", err) | ||
} else if *clientCert == "" || *clientKey == "" || *promURL == "" { | ||
log.Fatalf("-client-cert, -client-key, and -prom-endpoint are required") | ||
} | ||
|
||
client, err := internal.NewAPIClient( | ||
internal.APIConfig{ | ||
TargetHost: *promURL, | ||
ServerRootCACert: *serverRootCACert, | ||
ClientCert: *clientCert, | ||
ClientKey: *clientKey, | ||
ServerName: *serverName, | ||
InsecureSkipVerify: *insecureSkipVerify, | ||
}, | ||
) | ||
if err != nil { | ||
log.Fatalf("Failed to create Prometheus client: %s", err) | ||
} | ||
|
||
counters, gauges, histograms, err := client.ListMetrics("temporal_cloud_v0") | ||
if err != nil { | ||
log.Fatalf("Failed to pull metric names: %s", err) | ||
} | ||
fmt.Println(counters) | ||
fmt.Println(gauges) | ||
fmt.Println(histograms) | ||
|
||
conf := internal.Config{} | ||
|
||
for _, counter := range counters { | ||
conf.Metrics = append(conf.Metrics, internal.Metric{ | ||
MetricName: fmt.Sprintf("%s:rate1m", counter), | ||
Query: fmt.Sprintf("rate(%s[1m])", counter), | ||
}) | ||
} | ||
for _, gauge := range gauges { | ||
conf.Metrics = append(conf.Metrics, internal.Metric{ | ||
MetricName: gauge, | ||
Query: gauge, | ||
}) | ||
} | ||
for _, histogram := range histograms { | ||
conf.Metrics = append(conf.Metrics, internal.Metric{ | ||
MetricName: fmt.Sprintf("%s:histogram_quantile_p99_1m", histogram), | ||
Query: fmt.Sprintf("histogram_quantile(0.99, sum(rate(%s[1m])) by (le, operation, temporal_namespace))", histogram), | ||
}) | ||
} | ||
|
||
sort.Sort(internal.ByMetricName(conf.Metrics)) | ||
|
||
yamlData, err := yaml.Marshal(&conf) | ||
if err != nil { | ||
log.Fatalf("error marshalling yaml: %v", err) | ||
} | ||
|
||
err = os.WriteFile("config.yaml", yamlData, 0644) | ||
if err != nil { | ||
log.Fatalf("error writing config.yaml: %v", err) | ||
} | ||
} |
59 changes: 59 additions & 0 deletions
cloud/observability/promql-to-scrape/cmd/promql-to-scrape/main.go
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,59 @@ | ||
package main | ||
|
||
import ( | ||
"flag" | ||
"log" | ||
"os" | ||
|
||
"github.com/temporalio/samples-server/cloud/observability/promql-to-scrape/internal" | ||
|
||
"golang.org/x/exp/slog" | ||
) | ||
|
||
func main() { | ||
set := flag.NewFlagSet("promql-to-scrape", flag.ExitOnError) | ||
promURL := set.String("prom-endpoint", "", "Required Prometheus API endpoint for the server eg. https://<account>.tmprl.cloud/prometheus") | ||
configFile := set.String("config-file", "", "Config file for promql-to-scrape") | ||
serverRootCACert := set.String("server-root-ca-cert", "", "Optional path to root server CA cert") | ||
clientCert := set.String("client-cert", "", "Required path to client cert") | ||
clientKey := set.String("client-key", "", "Required path to client key") | ||
serverName := set.String("server-name", "", "Optional server name to use for verifying the server's certificate") | ||
insecureSkipVerify := set.Bool("insecure-skip-verify", false, "Skip verification of the server's certificate and host name") | ||
serverAddr := set.String("bind", "0.0.0.0:9001", "address:port to expose the metrics server on") | ||
debugLogging := set.Bool("debug", false, "Toggle debug logging") | ||
|
||
if err := set.Parse(os.Args[1:]); err != nil { | ||
log.Fatalf("failed parsing args: %v", err) | ||
} else if *clientCert == "" || *clientKey == "" || *configFile == "" || *promURL == "" { | ||
log.Fatalf("-client-cert, -client-key, -config-file, -prom-endpoint are required") | ||
} | ||
|
||
logLevel := slog.LevelInfo | ||
if *debugLogging { | ||
logLevel = slog.LevelDebug | ||
} | ||
h := slog.NewTextHandler(os.Stderr, &slog.HandlerOptions{Level: logLevel}) | ||
slog.SetDefault(slog.New(h)) | ||
|
||
client, err := internal.NewAPIClient( | ||
internal.APIConfig{ | ||
TargetHost: *promURL, | ||
ServerRootCACert: *serverRootCACert, | ||
ClientCert: *clientCert, | ||
ClientKey: *clientKey, | ||
ServerName: *serverName, | ||
InsecureSkipVerify: *insecureSkipVerify, | ||
}, | ||
) | ||
if err != nil { | ||
log.Fatalf("failed to create Prometheus client: %v", err) | ||
} | ||
|
||
conf, err := internal.LoadConfig(*configFile) | ||
if err != nil { | ||
log.Fatalf("failed to load config file: %v", err) | ||
} | ||
|
||
s := internal.NewPromToScrapeServer(client, conf, *serverAddr) | ||
s.Start() | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,43 @@ | ||
metrics: | ||
- metric_name: temporal_cloud_v0_frontend_service_error_count:rate1m | ||
query: rate(temporal_cloud_v0_frontend_service_error_count[1m]) | ||
- metric_name: temporal_cloud_v0_frontend_service_pending_requests | ||
query: temporal_cloud_v0_frontend_service_pending_requests | ||
- metric_name: temporal_cloud_v0_frontend_service_request_count:rate1m | ||
query: rate(temporal_cloud_v0_frontend_service_request_count[1m]) | ||
- metric_name: temporal_cloud_v0_poll_success_count:rate1m | ||
query: rate(temporal_cloud_v0_poll_success_count[1m]) | ||
- metric_name: temporal_cloud_v0_poll_success_sync_count:rate1m | ||
query: rate(temporal_cloud_v0_poll_success_sync_count[1m]) | ||
- metric_name: temporal_cloud_v0_poll_timeout_count:rate1m | ||
query: rate(temporal_cloud_v0_poll_timeout_count[1m]) | ||
- metric_name: temporal_cloud_v0_resource_exhausted_error_count:rate1m | ||
query: rate(temporal_cloud_v0_resource_exhausted_error_count[1m]) | ||
- metric_name: temporal_cloud_v0_schedule_action_success_count:rate1m | ||
query: rate(temporal_cloud_v0_schedule_action_success_count[1m]) | ||
- metric_name: temporal_cloud_v0_schedule_buffer_overruns_count:rate1m | ||
query: rate(temporal_cloud_v0_schedule_buffer_overruns_count[1m]) | ||
- metric_name: temporal_cloud_v0_schedule_missed_catchup_window_count:rate1m | ||
query: rate(temporal_cloud_v0_schedule_missed_catchup_window_count[1m]) | ||
- metric_name: temporal_cloud_v0_service_latency_bucket:histogram_quantile_p99_1m | ||
query: histogram_quantile(0.99, sum(rate(temporal_cloud_v0_service_latency_bucket[1m])) by (le, operation, temporal_namespace)) | ||
- metric_name: temporal_cloud_v0_service_latency_count:rate1m | ||
query: rate(temporal_cloud_v0_service_latency_count[1m]) | ||
- metric_name: temporal_cloud_v0_service_latency_sum:rate1m | ||
query: rate(temporal_cloud_v0_service_latency_sum[1m]) | ||
- metric_name: temporal_cloud_v0_state_transition_count:rate1m | ||
query: rate(temporal_cloud_v0_state_transition_count[1m]) | ||
- metric_name: temporal_cloud_v0_total_action_count:rate1m | ||
query: rate(temporal_cloud_v0_total_action_count[1m]) | ||
- metric_name: temporal_cloud_v0_workflow_cancel_count:rate1m | ||
query: rate(temporal_cloud_v0_workflow_cancel_count[1m]) | ||
- metric_name: temporal_cloud_v0_workflow_continued_as_new_count:rate1m | ||
query: rate(temporal_cloud_v0_workflow_continued_as_new_count[1m]) | ||
- metric_name: temporal_cloud_v0_workflow_failed_count:rate1m | ||
query: rate(temporal_cloud_v0_workflow_failed_count[1m]) | ||
- metric_name: temporal_cloud_v0_workflow_success_count:rate1m | ||
query: rate(temporal_cloud_v0_workflow_success_count[1m]) | ||
- metric_name: temporal_cloud_v0_workflow_terminate_count:rate1m | ||
query: rate(temporal_cloud_v0_workflow_terminate_count[1m]) | ||
- metric_name: temporal_cloud_v0_workflow_timeout_count:rate1m | ||
query: rate(temporal_cloud_v0_workflow_timeout_count[1m]) |
49 changes: 49 additions & 0 deletions
cloud/observability/promql-to-scrape/examples/configmap.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,49 @@ | ||
apiVersion: v1 | ||
kind: ConfigMap | ||
metadata: | ||
name: promql-to-scrape-config | ||
data: | ||
config.yaml: | | ||
metrics: | ||
- metric_name: temporal_cloud_v0_frontend_service_error_count:rate1m | ||
query: rate(temporal_cloud_v0_frontend_service_error_count[1m]) | ||
- metric_name: temporal_cloud_v0_frontend_service_pending_requests | ||
query: temporal_cloud_v0_frontend_service_pending_requests | ||
- metric_name: temporal_cloud_v0_frontend_service_request_count:rate1m | ||
query: rate(temporal_cloud_v0_frontend_service_request_count[1m]) | ||
- metric_name: temporal_cloud_v0_poll_success_count:rate1m | ||
query: rate(temporal_cloud_v0_poll_success_count[1m]) | ||
- metric_name: temporal_cloud_v0_poll_success_sync_count:rate1m | ||
query: rate(temporal_cloud_v0_poll_success_sync_count[1m]) | ||
- metric_name: temporal_cloud_v0_poll_timeout_count:rate1m | ||
query: rate(temporal_cloud_v0_poll_timeout_count[1m]) | ||
- metric_name: temporal_cloud_v0_resource_exhausted_error_count:rate1m | ||
query: rate(temporal_cloud_v0_resource_exhausted_error_count[1m]) | ||
- metric_name: temporal_cloud_v0_schedule_action_success_count:rate1m | ||
query: rate(temporal_cloud_v0_schedule_action_success_count[1m]) | ||
- metric_name: temporal_cloud_v0_schedule_buffer_overruns_count:rate1m | ||
query: rate(temporal_cloud_v0_schedule_buffer_overruns_count[1m]) | ||
- metric_name: temporal_cloud_v0_schedule_missed_catchup_window_count:rate1m | ||
query: rate(temporal_cloud_v0_schedule_missed_catchup_window_count[1m]) | ||
- metric_name: temporal_cloud_v0_service_latency_bucket:histogram_quantile_p99_1m | ||
query: histogram_quantile(0.99, sum(rate(temporal_cloud_v0_service_latency_bucket[1m])) by (le, operation, temporal_namespace)) | ||
- metric_name: temporal_cloud_v0_service_latency_count:rate1m | ||
query: rate(temporal_cloud_v0_service_latency_count[1m]) | ||
- metric_name: temporal_cloud_v0_service_latency_sum:rate1m | ||
query: rate(temporal_cloud_v0_service_latency_sum[1m]) | ||
- metric_name: temporal_cloud_v0_state_transition_count:rate1m | ||
query: rate(temporal_cloud_v0_state_transition_count[1m]) | ||
- metric_name: temporal_cloud_v0_total_action_count:rate1m | ||
query: rate(temporal_cloud_v0_total_action_count[1m]) | ||
- metric_name: temporal_cloud_v0_workflow_cancel_count:rate1m | ||
query: rate(temporal_cloud_v0_workflow_cancel_count[1m]) | ||
- metric_name: temporal_cloud_v0_workflow_continued_as_new_count:rate1m | ||
query: rate(temporal_cloud_v0_workflow_continued_as_new_count[1m]) | ||
- metric_name: temporal_cloud_v0_workflow_failed_count:rate1m | ||
query: rate(temporal_cloud_v0_workflow_failed_count[1m]) | ||
- metric_name: temporal_cloud_v0_workflow_success_count:rate1m | ||
query: rate(temporal_cloud_v0_workflow_success_count[1m]) | ||
- metric_name: temporal_cloud_v0_workflow_terminate_count:rate1m | ||
query: rate(temporal_cloud_v0_workflow_terminate_count[1m]) | ||
- metric_name: temporal_cloud_v0_workflow_timeout_count:rate1m | ||
query: rate(temporal_cloud_v0_workflow_timeout_count[1m]) |
47 changes: 47 additions & 0 deletions
cloud/observability/promql-to-scrape/examples/deployment.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,47 @@ | ||
apiVersion: apps/v1 | ||
kind: Deployment | ||
metadata: | ||
name: promql-to-scrape | ||
labels: | ||
app: promql-to-scrape | ||
spec: | ||
replicas: 1 | ||
selector: | ||
matchLabels: | ||
app: promql-to-scrape | ||
template: | ||
metadata: | ||
labels: | ||
app: promql-to-scrape | ||
spec: | ||
containers: | ||
- name: promql-to-scrape | ||
image: ghcr.io/temporalio/promql-to-scrape:7c0e91a | ||
args: | ||
- --client-cert=/var/run/secrets/ca_crt | ||
- --client-key=/var/run/secrets/ca_key | ||
- --prom-endpoint=https://<account>.tmprl.cloud/prometheus | ||
- --config-file=/etc/promql-to-scrape/config.yaml | ||
- --debug | ||
ports: | ||
- containerPort: 9001 | ||
volumeMounts: | ||
- name: secrets | ||
mountPath: /var/run/secrets | ||
readOnly: true | ||
- name: config-volume | ||
mountPath: /etc/promql-to-scrape | ||
resources: | ||
limits: | ||
cpu: "100m" | ||
memory: "256Mi" | ||
volumes: | ||
- name: secrets | ||
secret: | ||
secretName: promql-to-scrape-secrets | ||
- name: config-volume | ||
configMap: | ||
name: promql-to-scrape-config | ||
items: | ||
- key: config.yaml | ||
path: config.yaml |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
apiVersion: v1 | ||
kind: Secret | ||
type: Opaque | ||
metadata: | ||
name: promql-to-scrape-secrets | ||
labels: | ||
app: promql-to-scrape | ||
data: | ||
ca_crt: "<cert | base64>" | ||
ca_key: "<key | base64>" |
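The `<cert | base64>` and `<key | base64>` placeholders stand for the base64-encoded contents of the client certificate and key, since values under a Secret's `data` field must be base64 encoded. A small sketch of producing such a value is shown below. The helper name `secretValue` and the placeholder certificate bytes are assumptions for the example.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// secretValue encodes raw file contents the way a Secret's `data` map expects.
func secretValue(raw []byte) string {
	return base64.StdEncoding.EncodeToString(raw)
}

func main() {
	// Placeholder bytes. In practice these would come from os.ReadFile on
	// the client certificate (and likewise for the key).
	cert := []byte("-----BEGIN CERTIFICATE-----\n...\n-----END CERTIFICATE-----\n")
	fmt.Printf("ca_crt: %q\n", secretValue(cert))
}
```

Despite the `ca_crt` and `ca_key` key names, the deployment passes these files to `--client-cert` and `--client-key`, so they should hold the client certificate and its private key.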
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
module github.com/temporalio/samples-server/cloud/observability/promql-to-scrape | ||
|
||
go 1.21 | ||
|
||
require ( | ||
github.com/prometheus/client_golang v1.17.0 | ||
github.com/prometheus/common v0.45.0 | ||
golang.org/x/exp v0.0.0-20231110203233-9a3e6036ecaa | ||
gopkg.in/yaml.v3 v3.0.1 | ||
) | ||
|
||
require ( | ||
github.com/json-iterator/go v1.1.12 // indirect | ||
github.com/kr/text v0.2.0 // indirect | ||
github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd // indirect | ||
github.com/modern-go/reflect2 v1.0.2 // indirect | ||
) |