-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add telemetry support #158
Conversation
667c11a
to
dd06dfe
Compare
*/}} | ||
{{- define "retool.workflowBackend.selectorLabels" -}} | ||
retoolService: {{ template "retool.workflowBackend.name" . }} | ||
{{- end }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This and all the retool.*.selectorLabels
partials below don't actually change the selector labels that were already used, they're just refactoring the template code.
*/}} | ||
{{- define "retool.workflowWorker.name" -}} | ||
{{ template "retool.fullname" . }}-workflow-worker | ||
{{- end -}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These result in the exact same names as before, just a template code refactor for reuse.
@@ -54,7 +55,7 @@ spec: | |||
{{- end }} | |||
{{- end }} | |||
containers: | |||
- name: {{ .Chart.Name }} | |||
- name: main-backend |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
container names are mutable, since the pods just get recreated as normal. using semantic container names helps log collection and metric collection, since deployment name isn't always available and pod name is dependent on nameOverride and such.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 for the change, let's finalize the name of this service. esp. as we update docs here: https://github.com/tryretool/docs/pull/647
Candidates
- main-backend
- api-backend
- api
- backend
main-backed is nice because it corresponds to MAIN_BACKEND in SERVICE_TYPE at least.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's go with "main-backend" even tho this has services bundled into it (including db connector)
"api" is not as discoverable in docs.
@@ -244,6 +248,7 @@ spec: | |||
port: {{ .Values.service.internalPort }} | |||
initialDelaySeconds: {{ .Values.livenessProbe.initialDelaySeconds }} | |||
timeoutSeconds: {{ .Values.livenessProbe.timeoutSeconds }} | |||
periodSeconds: {{ .Values.livenessProbe.periodSeconds }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was present in values.yaml but not in the template, so I assume just an oversight.
annotations: | ||
{{- if or .Values.telemetry.customVectorConfig .Values.telemetry.customGrafanaAgentConfig }} | ||
checksum/custom-config: {{ include (print $.Template.BasePath "/telemetry_configmap.yaml") . | sha256sum | quote }} | ||
{{- end }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this makes it so that custom config changes will still roll the telemetry pod even if the pod itself doesn't change
dd06dfe
to
0eab868
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚢
@@ -54,7 +55,7 @@ spec: | |||
{{- end }} | |||
{{- end }} | |||
containers: | |||
- name: {{ .Chart.Name }} | |||
- name: main-backend |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 for the change, let's finalize the name of this service. esp. as we update docs here: https://github.com/tryretool/docs/pull/647
Candidates
- main-backend
- api-backend
- api
- backend
main-backed is nice because it corresponds to MAIN_BACKEND in SERVICE_TYPE at least.
charts/retool/templates/_helpers.tpl
Outdated
deployment recreation and incur downtime, so should be avoided. | ||
*/}} | ||
{{- define "retool.codeExecutor.selectorLabels" -}} | ||
retoolService: {{ template "retool.codeExecutor.name" . }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
retoolService: {{ template "retool.codeExecutor.name" . }} | |
retoolService: {{ include "retool.codeExecutor.name" . }} |
same effect, just for consistency.
charts/retool/templates/_helpers.tpl
Outdated
deployment recreation and incur downtime, so should be avoided. | ||
*/}} | ||
{{- define "retool.workflowWorker.selectorLabels" -}} | ||
retoolService: {{ template "retool.workflowWorker.name" . }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
retoolService: {{ template "retool.workflowWorker.name" . }} | |
retoolService: {{ include "retool.workflowWorker.name" . }} |
charts/retool/templates/_helpers.tpl
Outdated
deployment recreation and incur downtime, so should be avoided. | ||
*/}} | ||
{{- define "retool.workflowBackend.selectorLabels" -}} | ||
retoolService: {{ template "retool.workflowBackend.name" . }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
retoolService: {{ template "retool.workflowBackend.name" . }} | |
retoolService: {{ include "retool.workflowBackend.name" . }} |
same effect, just for consistency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
deployment_telemetry.yaml
perhaps for consistency
metadata: | ||
name: {{ include "retool.fullname" . }}-telemetry | ||
labels: | ||
{{- include "retool.telemetry.labels" . | nindent 4 }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{{- include "retool.telemetry.labels" . | nindent 4 }} | |
{{- include "retool.telemetry.labels" . | nindent 4 }} | |
{{- include "retool.telemetry.selectorLabels" . | nindent 4 }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure if needed, but for consistency as well
metadata: | ||
name: {{ include "retool.telemetry.fullname" . }} | ||
labels: | ||
{{- include "retool.telemetry.labels" . | nindent 4 }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{{- include "retool.telemetry.labels" . | nindent 4 }} | |
{{- include "retool.telemetry.labels" . | nindent 4 }} | |
{{- include "retool.telemetry.selectorLabels" . | nindent 4 }} |
- name: RTEL_DEPLOYMENT_MODE | ||
value: 'kubernetes-helm' | ||
- name: RTEL_HELM_RELEASE_NAME | ||
value: {{ .Release.Name | quote }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe: align this to https://github.com/tryretool/retool_development/blob/dev/backend/src/server/modules/jobsRunner/startupTelemetry.ts#L49-L62
DEPLOYMENT_TEMPLATE_TYPE
DEPLOYMENT_TEMPLATE_VERSION
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I will need to keep the env vars RTEL_DEPLOYMENT_MODE
and DEPLOYMENT_TEMPLATE_TYPE
separate after all, because I think they will diverge for other deployment templates. I'll still at least unify the values so e.g. k8s-helm
is used by both, and also add DEPLOYMENT_TEMPLATE_TYPE
as tags to metrics/logs when present.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack ok 👍
charts/retool/templates/_helpers.tpl
Outdated
@@ -44,21 +44,74 @@ Common labels | |||
*/}} | |||
{{- define "retool.labels" -}} | |||
helm.sh/chart: {{ include "retool.chart" . }} | |||
{{ include "retool.selectorLabels" . }} | |||
{{- if .Chart.AppVersion }} | |||
app.kubernetes.io/version: {{ .Chart.AppVersion | quote }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is AppVersion vs. .Chart.Version ? (only diff from retool.telemetry.labels)
I see it's used by retool-temporal-services-helm only 👀
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pretty sure .Chart.Version
is the chart version, and .Chart.AppVersion
would be the retool version. except .Chart.AppVersion
would come from appVersion:
in Chart.yaml, which is optional and absent for our chart. so I think .Chart.AppVersion
is effectively always null, and we don't use it to specify retool version anyway. docs: https://helm.sh/docs/topics/charts/
I didn't realize all this when I first wrote this here though, so I'm just going to take it out so it doesn't confuse us further.
593bb59
to
b29363a
Compare
8ea6630
to
ba0982f
Compare
This introduces a
telemetry
deployment and supporting resources which, when enabled, will collect metrics and logs from the other pods in the chart. When the telemetry deployment is enabled, those metrics and logs will be sent to Retool, for use cases such as proactively finding retool bugs which manifest in self-hosted deployment logs and having an easier path for Retool staff to diagnose deployment issues in a self-hosted deployment.