Runner can't connect to external prometheus and alertmanager #1097

Open
smoug25 opened this issue Sep 23, 2023 · 17 comments
Labels
bug Something isn't working

Comments

@smoug25

smoug25 commented Sep 23, 2023

Describe the bug
I have a multicluster setup with a separate monitoring cluster. For metrics querying I use Thanos Query, and in-cluster it works fine: the Robusta runner can connect through the Thanos Query service and to Alertmanager through the Alertmanager service. I expose hosts for Thanos Query and Alertmanager with JWT auth through Ambassador Edge Stack. I am able to query Thanos and Alertmanager from my machine successfully, but the Robusta runner returns errors: a 401 code for Thanos Query and a 400 code for Alertmanager.

To Reproduce
Steps to reproduce the behavior:

  1. Set up two clusters, cluster A and cluster B
  2. Expose Prometheus and Alertmanager on cluster A with JWT authorization
  3. Install Robusta on both clusters, and in Robusta on cluster B configure the URLs of Prometheus and Alertmanager in cluster A
  4. See the errors in Robusta on cluster B

Expected behavior
No errors in the Robusta logs on the external cluster, and app metrics available in the Robusta UI.

Robusta runner logs

2023-09-23 06:02:39.386 ERROR Failed to connect to prometheus. Couldn't connect to Prometheus found under https://thanos-query.areon.io
Caused by HTTPError: 401 Client Error: Unauthorized for url: https://my-prometheus.url/api/v1/query?cluster=dev)
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/prometrix/connect/custom_connect.py", line 101, in check_prometheus_connection
response.raise_for_status()
File "/usr/local/lib/python3.9/site-packages/requests/models.py", line 943, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://my-prometheus.url/api/v1/query?cluster=dev

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/app/src/robusta/core/sinks/robusta/prometheus_health_checker.py", line 61, in prometheus_connection_checks
prometheus_connection.check_prometheus_connection(params={})
File "/usr/local/lib/python3.9/site-packages/prometrix/connect/custom_connect.py", line 103, in check_prometheus_connection
raise PrometheusNotFound(
prometrix.exceptions.PrometheusNotFound: Couldn't connect to Prometheus found under https://my-prometheus.url
Caused by HTTPError: 401 Client Error: Unauthorized for url: https://my-prometheus.url/api/v1/query?cluster=dev)

Caused by HTTPError: 400 Client Error: Bad Request for url: https://my-alertmanager.url/api/v2/silences)
Traceback (most recent call last):
File "/app/src/robusta/utils/silence_utils.py", line 113, in get_alertmanager_silences_connection
response.raise_for_status()
File "/usr/local/lib/python3.9/site-packages/requests/models.py", line 943, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://my-alertmanager.url/api/v2/silences

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/app/src/robusta/core/sinks/robusta/prometheus_health_checker.py", line 97, in alertmanager_connection_checks
get_alertmanager_silences_connection(params=base_silence_params)
File "/app/src/robusta/utils/silence_utils.py", line 116, in get_alertmanager_silences_connection
raise AlertsManagerNotFound(
robusta.core.exceptions.AlertsManagerNotFound: Could not connect to the alert manager [https://alertmanager.areon.io]
Caused by HTTPError: 400 Client Error: Bad Request for url: https://my-alertmanager.url/api/v2/silences)

@github-actions

Hi 👋, thanks for opening an issue! Please note, it may take some time for us to respond, but we'll get back to you as soon as we can!

  • 💬 Slack Community: Join Robusta team and other contributors on Slack here.
  • 📖 Docs: Find our documentation here.
  • 🎥 YouTube Channel: Watch our videos here.

@Avi-Robusta
Contributor

Hi @smoug25,
I don't think we currently support JWT authorization for Prometheus, but we do support adding custom Prometheus authorization headers in Robusta.

https://docs.robusta.dev/master/configuration/alertmanager-integration/outofcluster-prometheus.html#authentication-headers

Does something like this help?
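
For reference, a minimal sketch of that configuration (key names follow the linked docs; the values are placeholders, not your real credentials):

globalConfig:
  prometheus_url: "https://my-prometheus.url"
  alertmanager_url: "https://my-alertmanager.url"
  # full header values, placeholders only
  prometheus_auth: Bearer <YOUR TOKEN>
  alertmanager_auth: Basic <USER:PASSWORD base64-encoded>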

@smoug25
Author

smoug25 commented Sep 24, 2023

Hi @Avi-Robusta
Thanks for the reply.
No, unfortunately it doesn't help; I already use the JWT as a bearer token.
This is my Robusta Helm values file:

robusta:
  clusterName: dev
  enablePrometheusStack: false
  disableCloudRouting: false
  globalConfig:
    alertmanager_url: "https://(my-alertmanager.url)"
    grafana_url: ""
    prometheus_url: "https://(my-prometheus.url)"
    chat_gpt_token: "{{ env.CHAT_GPT_TOKEN }}"

    prometheus_additional_labels:
      cluster: dev
    
    signing_key: "{{ env.ROBUSTA_GLOBAL_SIGNING_KEY }}"
    account_id: "{{ env.ROBUSTA_GLOBAL_ACCOUNT_ID }}"

    prometheus_auth: "Bearer {{ env.JWT_TOKEN }}"
    alertmanager_auth: "Bearer {{ env.JWT_TOKEN }}"
    prometheus_url_query_string: "cluster=dev"
  sinksConfig:
    - discord_sink:
        name: areon_discord_sink
        url: "{{ env.DISCORD_WEBHOOK }}"
    - robusta_sink:
        name: robusta_ui_sink
        token: "{{ env.ROBUSTA_TOKEN }}"
  enablePlatformPlaybooks: true
  runner:
    additional_env_vars:
    - name: GRAFANA_KEY
      valueFrom:
        secretKeyRef:
          name: robusta-secrets
          key: grafana_key
    - name: DISCORD_WEBHOOK
      valueFrom:
        secretKeyRef:
          name: robusta-secrets
          key: discord_webhook
    - name: ROBUSTA_TOKEN
      valueFrom:
        secretKeyRef:
          name: robusta-secrets
          key: robusta_token
    - name: ROBUSTA_GLOBAL_SIGNING_KEY
      valueFrom:
        secretKeyRef:
          name: robusta-secrets
          key: robusta_global_signing_key
    - name: ROBUSTA_GLOBAL_ACCOUNT_ID
      valueFrom:
        secretKeyRef:
          name: robusta-secrets
          key: robusta_global_account_id
    - name: CHAT_GPT_TOKEN
      valueFrom:
        secretKeyRef:
          name: robusta-secrets
          key: chat_gpt_token
    - name: JWT_TOKEN
      valueFrom:
        secretKeyRef:
          name: robusta-secrets
          key: jwt_token
    - name: PROMETHEUS_SSL_ENABLED
      value: "true"                                                           
    sendAdditionalTelemetry: false
  rsa:
    private:  -- secret --
    public: -- secret --
  playbookRepos:
    chatgpt_robusta_actions:
      url: "https://github.com/robusta-dev/kubernetes-chatgpt-bot.git"

  customPlaybooks:
  - triggers:
    - on_prometheus_alert: {}
    actions:
    - chat_gpt_enricher: {}

@Avi-Robusta
Contributor

Avi-Robusta commented Sep 24, 2023

Hi @smoug25
Can you try running this with your URL and token to see how Thanos responds?

curl --location 'https://MY-PROMETHEUS.URL/api/v1/query?query=up' \
--header 'Authorization: Bearer JWT_TOKEN'

Some users have had issues with Thanos because they needed to either specify a port or make the URL http instead of https, so if the curl doesn't work, try either or both of those.
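
If one of those variants turns out to be what works, the corresponding change in the Helm values is just to the URL itself; a sketch (the host and port below are placeholders):

globalConfig:
  # explicit port, or plain http instead of https, depending on the setup
  prometheus_url: "http://my-prometheus.url:9090"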

@smoug25
Author

smoug25 commented Sep 24, 2023

@Avi-Robusta after sending your request to my Thanos, I got a valid response with metrics. Let me narrow down one point: I use Thanos without auth, but it sits behind a proxy that authenticates with a JWT expected in the header 'Authorization: Bearer JWT_TOKEN'. My Alertmanager sits behind the same proxy, but I get a 400 code in response.

@smoug25
Author

smoug25 commented Oct 2, 2023

Hello, @Avi-Robusta. Do you have any updates on this issue?

@Avi-Robusta
Contributor

Hi @smoug25 ,
I wasn't able to replicate the issue. Would you like to jump on a call so I can debug this with you?
You can pick a time from my Calendly.

@smoug25
Author

smoug25 commented Oct 17, 2023

Hi @Avi-Robusta,
Do you have any ideas about what we could do to better understand this issue?

@Sheeproid
Contributor

Hi @smoug25. Avi is currently not available. It will be easier to discuss this in the Slack community, in the #support channel.

@Sheeproid added the bug label on Oct 31, 2023
@aantn
Collaborator

aantn commented Feb 22, 2024

@smoug25 can you confirm if this is still happening or if it was fixed?

@smoug25
Author

smoug25 commented Feb 22, 2024

@aantn I've updated to 0.10.29 and the problem is still relevant.

@aantn
Collaborator

aantn commented Feb 22, 2024

Weird. If you run the curl command from the robusta-runner pod, does it work? I am trying to figure out what is different about the way the runner connects.

@smoug25
Author

smoug25 commented Feb 23, 2024

If I make a curl request from the robusta-runner pod, it works fine and I receive a status code of 200 (OK).

@smoug25
Author

smoug25 commented Sep 26, 2024

I've found the cause of the problem. Something is wrong with the templating.
I used this, and it did not work:

prometheus_auth: "Bearer {{ env.JWT_TOKEN }}"
alertmanager_auth: "Bearer {{ env.JWT_TOKEN }}"

After I added "Bearer " to the Kubernetes secret and left only the env reference in the template, I got:

prometheus_auth: "{{ env.JWT_TOKEN }}"
alertmanager_auth: "{{ env.JWT_TOKEN }}"

With this template everything works like a charm.
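
For anyone hitting the same thing, a minimal sketch of the working setup (the secret name and key match the values file above; the token value is a placeholder):

apiVersion: v1
kind: Secret
metadata:
  name: robusta-secrets
type: Opaque
stringData:
  # the whole header value, including the "Bearer " prefix, goes into the secret
  jwt_token: "Bearer <YOUR JWT>"

and in the Helm values only the env var is referenced:

globalConfig:
  prometheus_auth: "{{ env.JWT_TOKEN }}"
  alertmanager_auth: "{{ env.JWT_TOKEN }}"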

@aantn
Collaborator

aantn commented Sep 29, 2024

Thanks. If I understand correctly, we need to update the auth section on this page regarding Thanos. Is that correct?

@pavangudiwada
Contributor

@smoug25

globalConfig:
  prometheus_auth: Bearer <YOUR TOKEN> # Replace <YOUR TOKEN> with your actual token or use any other auth header as needed
  alertmanager_auth: Basic <USER:PASSWORD base64-encoded> # Replace <USER:PASSWORD base64-encoded> with your actual credentials, base64-encoded, or use any other auth header as needed

This ⬆️ is the current config, and below is your suggestion:

prometheus_auth: "{{ env.JWT_TOKEN }}" alertmanager_auth: "{{ env.JWT_TOKEN }}" instead of prometheus_auth: "Bearer {{ env.JWT_TOKEN }}" alertmanager_auth: "Bearer {{ env.JWT_TOKEN }}"

Should the alertmanager_auth secret contain "Basic <USER:PASSWORD base64-encoded>"? Can you please clarify?

@smoug25
Author

smoug25 commented Oct 23, 2024

@pavangudiwada Hi!

If the user stores the token in a Kubernetes secret, then the secret must contain the whole auth header value, "Basic <USER:PASSWORD base64-encoded>".

If the non-secure way is used, then this

globalConfig:
  prometheus_auth: Bearer <YOUR TOKEN> # Replace <YOUR TOKEN> with your actual token or use any other auth header as needed
  alertmanager_auth: Basic <USER:PASSWORD base64-encoded> # Replace <USER:PASSWORD base64-encoded> with your actual credentials, base64-encoded, or use any other auth header as needed

should work as expected.
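
As a sketch of the secret-based variant for the Basic auth case (the key name and env var below are illustrative, not from my actual setup):

apiVersion: v1
kind: Secret
metadata:
  name: robusta-secrets
type: Opaque
stringData:
  alertmanager_auth: "Basic <USER:PASSWORD base64-encoded>"   # whole header value stored in the secret

exposed to the runner via additional_env_vars and then referenced as:

globalConfig:
  alertmanager_auth: "{{ env.ALERTMANAGER_AUTH }}"   # ALERTMANAGER_AUTH is an illustrative env var name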
