Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add telemetry support #158

Merged
merged 13 commits into from
Apr 16, 2024
Merged

Add telemetry support #158

merged 13 commits into from
Apr 16, 2024

Conversation

ryanartecona
Copy link
Contributor

@ryanartecona ryanartecona commented Mar 7, 2024

This introduces a telemetry deployment and supporting resources which, when enabled, will collect metrics and logs from the other pods in the chart. When the telemetry deployment is enabled, those metrics and logs will be sent to Retool, for use cases such as proactively finding retool bugs which manifest in self-hosted deployment logs and having an easier path for Retool staff to diagnose deployment issues in a self-hosted deployment.

@ryanartecona ryanartecona force-pushed the ra/telemetry branch 4 times, most recently from 667c11a to dd06dfe Compare March 8, 2024 00:48
*/}}
{{- define "retool.workflowBackend.selectorLabels" -}}
retoolService: {{ template "retool.workflowBackend.name" . }}
{{- end }}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and all the retool.*.selectorLabels partials below don't actually change the selector labels that were already used, they're just refactoring the template code.

*/}}
{{- define "retool.workflowWorker.name" -}}
{{ template "retool.fullname" . }}-workflow-worker
{{- end -}}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These result in the exact same names as before, just a template code refactor for reuse.

@@ -54,7 +55,7 @@ spec:
{{- end }}
{{- end }}
containers:
- name: {{ .Chart.Name }}
- name: main-backend
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

container names are mutable, since the pods just get recreated as normal. using semantic container names helps log collection and metric collection, since deployment name isn't always available and pod name is dependent on nameOverride and such.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 for the change, let's finalize the name of this service. esp. as we update docs here: https://github.com/tryretool/docs/pull/647

Candidates

  • main-backend
  • api-backend
  • api
  • backend

main-backed is nice because it corresponds to MAIN_BACKEND in SERVICE_TYPE at least.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's go with "main-backend" even tho this has services bundled into it (including db connector)

"api" is not as discoverable in docs.

@@ -244,6 +248,7 @@ spec:
port: {{ .Values.service.internalPort }}
initialDelaySeconds: {{ .Values.livenessProbe.initialDelaySeconds }}
timeoutSeconds: {{ .Values.livenessProbe.timeoutSeconds }}
periodSeconds: {{ .Values.livenessProbe.periodSeconds }}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was present in values.yaml but not in the template, so I assume just an oversight.

annotations:
{{- if or .Values.telemetry.customVectorConfig .Values.telemetry.customGrafanaAgentConfig }}
checksum/custom-config: {{ include (print $.Template.BasePath "/telemetry_configmap.yaml") . | sha256sum | quote }}
{{- end }}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this makes it so that custom config changes will still roll the telemetry pod even if the pod itself doesn't change

Copy link
Contributor

@avimoondra avimoondra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚢

@@ -54,7 +55,7 @@ spec:
{{- end }}
{{- end }}
containers:
- name: {{ .Chart.Name }}
- name: main-backend
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 for the change, let's finalize the name of this service. esp. as we update docs here: https://github.com/tryretool/docs/pull/647

Candidates

  • main-backend
  • api-backend
  • api
  • backend

main-backed is nice because it corresponds to MAIN_BACKEND in SERVICE_TYPE at least.

deployment recreation and incur downtime, so should be avoided.
*/}}
{{- define "retool.codeExecutor.selectorLabels" -}}
retoolService: {{ template "retool.codeExecutor.name" . }}
Copy link
Contributor

@avimoondra avimoondra Mar 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
retoolService: {{ template "retool.codeExecutor.name" . }}
retoolService: {{ include "retool.codeExecutor.name" . }}

same effect, just for consistency.

deployment recreation and incur downtime, so should be avoided.
*/}}
{{- define "retool.workflowWorker.selectorLabels" -}}
retoolService: {{ template "retool.workflowWorker.name" . }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
retoolService: {{ template "retool.workflowWorker.name" . }}
retoolService: {{ include "retool.workflowWorker.name" . }}

deployment recreation and incur downtime, so should be avoided.
*/}}
{{- define "retool.workflowBackend.selectorLabels" -}}
retoolService: {{ template "retool.workflowBackend.name" . }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
retoolService: {{ template "retool.workflowBackend.name" . }}
retoolService: {{ include "retool.workflowBackend.name" . }}

same effect, just for consistency.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

deployment_telemetry.yaml perhaps for consistency

metadata:
name: {{ include "retool.fullname" . }}-telemetry
labels:
{{- include "retool.telemetry.labels" . | nindent 4 }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
{{- include "retool.telemetry.labels" . | nindent 4 }}
{{- include "retool.telemetry.labels" . | nindent 4 }}
{{- include "retool.telemetry.selectorLabels" . | nindent 4 }}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if needed, but for consistency as well

metadata:
name: {{ include "retool.telemetry.fullname" . }}
labels:
{{- include "retool.telemetry.labels" . | nindent 4 }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
{{- include "retool.telemetry.labels" . | nindent 4 }}
{{- include "retool.telemetry.labels" . | nindent 4 }}
{{- include "retool.telemetry.selectorLabels" . | nindent 4 }}

Comment on lines 58 to 61
- name: RTEL_DEPLOYMENT_MODE
value: 'kubernetes-helm'
- name: RTEL_HELM_RELEASE_NAME
value: {{ .Release.Name | quote }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I will need to keep the env vars RTEL_DEPLOYMENT_MODE and DEPLOYMENT_TEMPLATE_TYPE separate after all, because I think they will diverge for other deployment templates. I'll still at least unify the values so e.g. k8s-helm is used by both, and also add DEPLOYMENT_TEMPLATE_TYPE as tags to metrics/logs when present.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack ok 👍

@@ -44,21 +44,74 @@ Common labels
*/}}
{{- define "retool.labels" -}}
helm.sh/chart: {{ include "retool.chart" . }}
{{ include "retool.selectorLabels" . }}
{{- if .Chart.AppVersion }}
app.kubernetes.io/version: {{ .Chart.AppVersion | quote }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is AppVersion vs. .Chart.Version ? (only diff from retool.telemetry.labels)

I see it's used by retool-temporal-services-helm only 👀

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pretty sure .Chart.Version is the chart version, and .Chart.AppVersion would be the retool version. except .Chart.AppVersion would come from appVersion: in Chart.yaml, which is optional and absent for our chart. so I think .Chart.AppVersion is effectively always null, and we don't use it to specify retool version anyway. docs: https://helm.sh/docs/topics/charts/

I didn't realize all this when I first wrote this here though, so I'm just going to take it out so it doesn't confuse us further.

@ryanartecona ryanartecona force-pushed the ra/telemetry branch 2 times, most recently from 593bb59 to b29363a Compare March 19, 2024 23:29
@ryanartecona ryanartecona merged commit bc3f63a into main Apr 16, 2024
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants