
Enable CiliumNetworkPolicy between loki components #3189

Closed
Tracked by #2944
QuentinBisson opened this issue Jan 29, 2024 · 16 comments
@QuentinBisson commented Jan 29, 2024

Due to a change on the on-prem MCs, we need to enable Cilium network policies between all the Loki components, otherwise they cannot communicate with each other.
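For context, a minimal sketch of the kind of policy we need, assuming the standard Loki ports (3100 http, 9095 grpc, 7946 memberlist); the policy name, namespace and label selector are placeholders and must match whatever the chart actually sets:

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: loki-internal          # placeholder name
  namespace: loki
spec:
  # select every loki component pod; this label is an assumption
  endpointSelector:
    matchLabels:
      app.kubernetes.io/name: loki
  ingress:
    # allow traffic from the other loki components on http, grpc and memberlist
    - fromEndpoints:
        - matchLabels:
            app.kubernetes.io/name: loki
      toPorts:
        - ports:
            - port: "3100"   # http
              protocol: TCP
            - port: "9095"   # grpc
              protocol: TCP
            - port: "7946"   # memberlist gossip
              protocol: TCP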

@QuantumEnigmaa commented Jan 30, 2024

I will enable the Cilium netpol from upstream (see the values sketch after this list). What are the affected MCs? I suppose:

  • bamboo
  • gcapeverde
  • gerbil
  • leopard
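On our side this should boil down to a values change roughly along these lines; the exact keys are an assumption based on what the upstream chart exposes in grafana/loki#11838 and need to be double-checked:

# hedged sketch of the loki-app user values; key names are assumptions
loki:
  networkPolicy:
    enabled: true
    flavor: cilium   # render CiliumNetworkPolicy objects instead of plain netpols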

@QuentinBisson (Author)

@QuantumEnigmaa can you link the upstream issue here?

@QuantumEnigmaa

Sure! Here it is: grafana/loki#11838

@QuentinBisson (Author)

Getting released into giantswarm/loki-app#284

@QuentinBisson (Author)

This is not solved:

For backend pods we have this:

level=info ts=2024-02-14T14:20:13.880016205Z caller=reporter.go:305 msg="failed to send usage report" retries=0 err="Post \"https://stats.grafana.org/loki-usage-report\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
level=info ts=2024-02-14T14:20:19.915226327Z caller=reporter.go:305 msg="failed to send usage report" retries=1 err="Post \"https://stats.grafana.org/loki-usage-report\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
level=info ts=2024-02-14T14:20:27.010931417Z caller=reporter.go:305 msg="failed to send usage report" retries=2 err="Post \"https://stats.grafana.org/loki-usage-report\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
level=info ts=2024-02-14T14:20:38.67392354Z caller=reporter.go:305 msg="failed to send usage report" retries=3 err="Post \"https://stats.grafana.org/loki-usage-report\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
level=info ts=2024-02-14T14:20:54.911201185Z caller=reporter.go:305 msg="failed to send usage report" retries=4 err="Post \"https://stats.grafana.org/loki-usage-report\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
level=info ts=2024-02-14T14:20:54.911313916Z caller=reporter.go:281 msg="failed to report usage" err="5 errors: Post \"https://stats.grafana.org/loki-usage-report\": context deadline exceeded (Client.Timeout exceeded while awaiting headers); Post \"https://stats.grafana.org/loki-usage-report\": context deadline exceeded (Client.Timeout exceeded while awaiting headers); Post \"https://stats.grafana.org/loki-usage-report\": context deadline exceeded (Client.Timeout exceeded while awaiting headers); Post \"https://stats.grafana.org/loki-usage-report\": context deadline exceeded (Client.Timeout exceeded while awaiting headers); Post \"https://stats.grafana.org/loki-usage-report\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
level=warn ts=2024-02-14T14:21:13.248449982Z caller=tcp_transport.go:437 component="memberlist TCPTransport" msg="WriteTo failed" addr=10.244.4.99:7946 err="dial tcp 10.244.4.99:7946: i/o timeout"
level=info ts=2024-02-14T14:21:13.87920526Z caller=reporter.go:305 msg="failed to send usage report" retries=0 err="Post \"https://stats.grafana.org/loki-usage-report\": proxyconnect tcp: dial tcp 10.205.105.253:3128: i/o timeout (Client.Timeout exceeded while awaiting headers)"
level=warn ts=2024-02-14T14:21:15.744060127Z caller=tcp_transport.go:437 component="memberlist TCPTransport" msg="WriteTo failed" addr=10.244.4.99:7946 err="dial tcp 10.244.4.99:7946: i/o timeout"
level=warn ts=2024-02-14T14:21:17.849164994Z caller=tcp_transport.go:437 component="memberlist TCPTransport" msg="WriteTo failed" addr=10.244.4.99:7946 err="dial tcp 10.244.4.99:7946: i/o timeout"

Not sure what is happening with the gateways:

> k logs -n loki loki-gateway-85d6b554b6-4qjtx -c nginx
/docker-entrypoint.sh: No files found in /docker-entrypoint.d/, skipping configuration
> k logs -n loki loki-gateway-85d6b554b6-pp4jr
Defaulted container "nginx" out of: nginx, dnsmasq
/docker-entrypoint.sh: No files found in /docker-entrypoint.d/, skipping configuration

Loki read pods:

level=info ts=2024-02-14T14:22:50.419026941Z caller=frontend.go:316 msg="not ready: number of schedulers this worker is connected to is 0"
level=info ts=2024-02-14T14:22:55.164117669Z caller=frontend.go:316 msg="not ready: number of schedulers this worker is connected to is 0"
level=warn ts=2024-02-14T14:22:59.413270393Z caller=dns_resolver.go:225 msg="failed DNS A record lookup" err="lookup query-scheduler-discovery.loki.svc.cluster.local. on 172.31.0.10:53: read udp 10.244.4.245:41989->10.244.2.82:1053: i/o timeout"
level=info ts=2024-02-14T14:23:00.419440574Z caller=frontend.go:316 msg="not ready: number of schedulers this worker is connected to is 0"
level=warn ts=2024-02-14T14:23:04.373897014Z caller=dns_resolver.go:225 msg="failed DNS A record lookup" err="lookup query-scheduler-discovery.loki.svc.cluster.local. on 172.31.0.10:53: read udp 10.244.4.245:54379->10.244.0.159:1053: i/o timeout"
level=info ts=2024-02-14T14:23:10.419347589Z caller=frontend.go:316 msg="not ready: number of schedulers this worker is connected to is 0"
level=warn ts=2024-02-14T14:23:12.415151604Z caller=dns_resolver.go:225 msg="failed DNS A record lookup" err="lookup query-scheduler-discovery.loki.svc.cluster.local. on 172.31.0.10:53: read udp 10.244.4.245:54659->10.244.2.82:1053: i/o timeout"
level=info ts=2024-02-14T14:23:20.419586009Z caller=frontend.go:316 msg="not ready: number of schedulers this worker is connected to is 0"
level=warn ts=2024-02-14T14:23:24.37578316Z caller=dns_resolver.go:225 msg="failed DNS A record lookup" err="lookup query-scheduler-discovery.loki.svc.cluster.local. on 172.31.0.10:53: read udp 10.244.4.245:39492->10.244.0.159:1053: i/o timeout"
level=warn ts=2024-02-14T14:23:25.417421171Z caller=dns_resolver.go:225 msg="failed DNS A record lookup" err="lookup query-scheduler-discovery.loki.svc.cluster.local. on 172.31.0.10:53: read udp 10.244.4.245:35322->10.244.2.189:1053: i/o timeout"
level=info ts=2024-02-14T14:23:30.419334606Z caller=frontend.go:316 msg="not ready: number of schedulers this worker is connected to is 0"
level=warn ts=2024-02-14T14:23:38.419479008Z caller=dns_resolver.go:225 msg="failed DNS A record lookup" err="lookup query-scheduler-discovery.loki.svc.cluster.local. on 172.31.0.10:53: read udp 10.244.4.245:57308->10.244.2.82:1053: i/o timeout"
level=info ts=2024-02-14T14:23:40.419139546Z caller=frontend.go:316 msg="not ready: number of schedulers this worker is connected to is 0"
level=warn ts=2024-02-14T14:23:44.3777889Z caller=dns_resolver.go:225 msg="failed DNS A record lookup" err="lookup query-scheduler-discovery.loki.svc.cluster.local. on 172.31.0.10:53: read udp 10.244.4.245:42681->10.244.2.189:1053: i/o timeout"
level=info ts=2024-02-14T14:23:50.419512862Z caller=frontend.go:316 msg="not ready: number of schedulers this worker is connected to is 0"
level=warn ts=2024-02-14T14:23:51.421193603Z caller=dns_resolver.go:225 msg="failed DNS A record lookup" err="lookup query-scheduler-discovery.loki.svc.cluster.local. on 172.31.0.10:53: read udp 10.244.4.245:57727->10.244.0.159:1053: i/o timeout"
level=info ts=2024-02-14T14:24:00.419691585Z caller=frontend.go:316 msg="not ready: number of schedulers this worker is connected to is 0"
level=warn ts=2024-02-14T14:24:04.379832734Z caller=dns_resolver.go:225 msg="failed DNS A record lookup" err="lookup query-scheduler-discovery.loki.svc.cluster.local. on 172.31.0.10:53: read udp 10.244.4.245:60039->10.244.2.189:1053: i/o timeout"
level=warn ts=2024-02-14T14:24:04.423074389Z caller=dns_resolver.go:225 msg="failed DNS A record lookup" err="lookup query-scheduler-discovery.loki.svc.cluster.local. on 172.31.0.10:53: read udp 10.244.4.245:33488->10.244.2.189:1053: i/o timeout"
level=info ts=2024-02-14T14:24:05.163507541Z caller=frontend.go:316 msg="not ready: number of schedulers this worker is connected to is 0"
ts=2024-02-14T14:17:25.908295609Z caller=memberlist_logger.go:74 level=warn msg="Failed to resolve loki-memberlist: lookup loki-memberlist on 172.31.0.10:53: read udp 10.244.2.31:55240->10.244.2.82:1053: i/o timeout"
level=warn ts=2024-02-14T14:17:25.908359195Z caller=memberlist_client.go:595 msg="joining memberlist cluster: failed to reach any nodes" retries=5 err="1 error occurred:\n\t* Failed to resolve loki-memberlist: lookup loki-memberlist on 172.31.0.10:53: read udp 10.244.2.31:55240->10.244.2.82:1053: i/o timeout\n\n"

I think a few of those issues are related to CoreDNS not being accessible.

@QuentinBisson (Author)

Sadly, I cannot open Hubble on gerbil.

@QuentinBisson (Author)

Here is the list of blocked connections:

Feb 15 10:53:22.837: kube-system/prometheus-prometheus-agent-0:49430 (ID:2328) <> loki/loki-backend-1:3100 (ID:1697) Policy denied DROPPED (TCP Flags: SYN)
Feb 15 10:53:25.796: kube-system/prometheus-prometheus-agent-0:34740 (ID:2328) <> loki/loki-write-1:3100 (ID:23001) Policy denied DROPPED (TCP Flags: SYN)
Feb 15 10:53:26.325: kube-system/prometheus-prometheus-agent-0:41304 (ID:2328) <> loki/loki-write-1:3100 (ID:23001) Policy denied DROPPED (TCP Flags: ACK)
Feb 15 10:53:26.805: kube-system/prometheus-prometheus-agent-0:34740 (ID:2328) <> loki/loki-write-1:3100 (ID:23001) Policy denied DROPPED (TCP Flags: SYN)
Feb 15 10:53:28.821: kube-system/prometheus-prometheus-agent-0:34740 (ID:2328) <> loki/loki-write-1:3100 (ID:23001) Policy denied DROPPED (TCP Flags: SYN)
Feb 15 10:53:30.598: kube-system/prometheus-prometheus-agent-0:51038 (ID:2328) <> loki/loki-backend-1:3100 (ID:1697) Policy denied DROPPED (TCP Flags: SYN)
Feb 15 10:53:30.932: kube-system/prometheus-prometheus-agent-0:35740 (ID:2328) <> loki/loki-backend-1:3100 (ID:1697) Policy denied DROPPED (TCP Flags: ACK)
Feb 15 10:53:31.605: kube-system/prometheus-prometheus-agent-0:51038 (ID:2328) <> loki/loki-backend-1:3100 (ID:1697) Policy denied DROPPED (TCP Flags: SYN)
Feb 15 10:53:33.077: kube-system/prometheus-prometheus-agent-0:34740 (ID:2328) <> loki/loki-write-1:3100 (ID:23001) Policy denied DROPPED (TCP Flags: SYN)
Feb 15 10:53:33.621: kube-system/prometheus-prometheus-agent-0:51038 (ID:2328) <> loki/loki-backend-1:3100 (ID:1697) Policy denied DROPPED (TCP Flags: SYN)
Feb 15 10:53:35.637: kube-system/prometheus-prometheus-agent-0:41304 (ID:2328) <> loki/loki-write-1:3100 (ID:23001) Policy denied DROPPED (TCP Flags: ACK, FIN, PSH)
Feb 15 10:53:36.117: kube-system/prometheus-prometheus-agent-0:42396 (ID:2328) <> loki/loki-read-57fb9575dd-22nxt:3100 (ID:10532) Policy denied DROPPED (TCP Flags: SYN)
Feb 15 10:53:36.898: kube-system/ingress-nginx-controller-75b9d479b4-x24br:34170 (ID:23796) <> loki/loki-gateway-85d6b554b6-pp4jr:8080 (ID:34283) Policy denied DROPPED (TCP Flags: SYN)
Feb 15 10:53:36.945: kube-system/ingress-nginx-controller-75b9d479b4-8rwjd:45644 (ID:23796) <> loki/loki-gateway-85d6b554b6-pp4jr:8080 (ID:34283) Policy denied DROPPED (TCP Flags: ACK)
Feb 15 10:53:37.153: kube-system/ingress-nginx-controller-75b9d479b4-8rwjd:45644 (ID:23796) <> loki/loki-gateway-85d6b554b6-pp4jr:8080 (ID:34283) Policy denied DROPPED (TCP Flags: ACK)
Feb 15 10:53:37.361: kube-system/ingress-nginx-controller-75b9d479b4-8rwjd:45644 (ID:23796) <> loki/loki-gateway-85d6b554b6-pp4jr:8080 (ID:34283) Policy denied DROPPED (TCP Flags: ACK)
Feb 15 10:53:37.482: kube-system/ingress-nginx-controller-75b9d479b4-x24br:60926 (ID:23796) <> loki/loki-gateway-85d6b554b6-4qjtx:8080 (ID:34283) Policy denied DROPPED (TCP Flags: SYN)
Feb 15 10:53:37.685: kube-system/prometheus-prometheus-agent-0:57718 (ID:2328) <> loki/loki-backend-0:3100 (ID:10253) Policy denied DROPPED (TCP Flags: ACK, FIN, PSH)
Feb 15 10:53:37.685: kube-system/prometheus-prometheus-agent-0:51038 (ID:2328) <> loki/loki-backend-1:3100 (ID:1697) Policy denied DROPPED (TCP Flags: SYN)
Feb 15 10:53:37.738: kube-system/ingress-nginx-controller-75b9d479b4-x24br:44798 (ID:23796) <> loki/loki-gateway-85d6b554b6-4qjtx:8080 (ID:34283) Policy denied DROPPED (TCP Flags: ACK, FIN)
Feb 15 10:53:37.781: kube-system/ingress-nginx-controller-75b9d479b4-8rwjd:45644 (ID:23796) <> loki/loki-gateway-85d6b554b6-pp4jr:8080 (ID:34283) Policy denied DROPPED (TCP Flags: ACK)
Feb 15 10:53:37.794: kube-system/ingress-nginx-controller-75b9d479b4-x24br:60928 (ID:23796) <> loki/loki-gateway-85d6b554b6-4qjtx:8080 (ID:34283) Policy denied DROPPED (TCP Flags: SYN)
Feb 15 10:53:37.899: kube-system/ingress-nginx-controller-75b9d479b4-x24br:34170 (ID:23796) <> loki/loki-gateway-85d6b554b6-pp4jr:8080 (ID:34283) Policy denied DROPPED (TCP Flags: SYN)
Feb 15 10:53:38.587: kube-system/ingress-nginx-controller-75b9d479b4-8rwjd:47984 (ID:23796) <> loki/loki-gateway-85d6b554b6-pp4jr:8080 (ID:34283) Policy denied DROPPED (TCP Flags: SYN)
Feb 15 10:53:38.613: kube-system/ingress-nginx-controller-75b9d479b4-8rwjd:45644 (ID:23796) <> loki/loki-gateway-85d6b554b6-pp4jr:8080 (ID:34283) Policy denied DROPPED (TCP Flags: ACK)
Feb 15 10:53:38.826: kube-system/ingress-nginx-controller-75b9d479b4-x24br:60928 (ID:23796) <> loki/loki-gateway-85d6b554b6-4qjtx:8080 (ID:34283) Policy denied DROPPED (TCP Flags: SYN)
Feb 15 10:53:39.274: kube-system/ingress-nginx-controller-75b9d479b4-x24br:44798 (ID:23796) <> loki/loki-gateway-85d6b554b6-4qjtx:8080 (ID:34283) Policy denied DROPPED (TCP Flags: ACK)
Feb 15 10:53:39.498: kube-system/ingress-nginx-controller-75b9d479b4-x24br:60926 (ID:23796) <> loki/loki-gateway-85d6b554b6-4qjtx:8080 (ID:34283) Policy denied DROPPED (TCP Flags: SYN)
Feb 15 10:53:39.594: kube-system/ingress-nginx-controller-75b9d479b4-8rwjd:47984 (ID:23796) <> loki/loki-gateway-85d6b554b6-pp4jr:8080 (ID:34283) Policy denied DROPPED (TCP Flags: SYN)
Feb 15 10:53:39.733: kube-system/prometheus-prometheus-agent-0:54214 (ID:2328) <> loki/loki-read-57fb9575dd-n6946:3100 (ID:10532) Policy denied DROPPED (TCP Flags: ACK, FIN, PSH)
Feb 15 10:53:39.733: kube-system/prometheus-prometheus-agent-0:35740 (ID:2328) <> loki/loki-backend-1:3100 (ID:1697) Policy denied DROPPED (TCP Flags: ACK, FIN, PSH)
Feb 15 10:53:39.914: kube-system/ingress-nginx-controller-75b9d479b4-x24br:34170 (ID:23796) <> loki/loki-gateway-85d6b554b6-pp4jr:8080 (ID:34283) Policy denied DROPPED (TCP Flags: SYN)
Feb 15 10:53:40.245: kube-system/prometheus-prometheus-agent-0:42396 (ID:2328) <> loki/loki-read-57fb9575dd-22nxt:3100 (ID:10532) Policy denied DROPPED (TCP Flags: SYN)
Feb 15 10:53:40.277: kube-system/ingress-nginx-controller-75b9d479b4-8rwjd:45644 (ID:23796) <> loki/loki-gateway-85d6b554b6-pp4jr:8080 (ID:34283) Policy denied DROPPED (TCP Flags: ACK)
Feb 15 10:53:40.795: kube-system/prometheus-prometheus-agent-0:53078 (ID:2328) <> loki/loki-write-1:3100 (ID:23001) Policy denied DROPPED (TCP Flags: SYN)
Feb 15 10:53:40.842: kube-system/ingress-nginx-controller-75b9d479b4-x24br:60928 (ID:23796) <> loki/loki-gateway-85d6b554b6-4qjtx:8080 (ID:34283) Policy denied DROPPED (TCP Flags: SYN)
Feb 15 10:53:40.915: kube-system/prometheus-prometheus-agent-0:57314 (ID:2328) <> loki/loki-write-0:3100 (ID:1360) Policy denied DROPPED (TCP Flags: SYN)
Feb 15 10:53:41.428: kube-system/prometheus-prometheus-agent-0:41304 (ID:2328) <> loki/loki-write-1:3100 (ID:23001) Policy denied DROPPED (TCP Flags: ACK)
Feb 15 10:53:41.475: kube-system/ingress-nginx-controller-75b9d479b4-x24br:41364 (ID:23796) <> loki/loki-gateway-85d6b554b6-pp4jr:8080 (ID:34283) Policy denied DROPPED (TCP Flags: SYN)
Feb 15 10:53:41.578: kube-system/prometheus-prometheus-agent-0:42348 (ID:2328) <> loki/loki-write-0:3100 (ID:1360) Policy denied DROPPED (TCP Flags: ACK)
Feb 15 10:53:41.618: kube-system/ingress-nginx-controller-75b9d479b4-8rwjd:47984 (ID:23796) <> loki/loki-gateway-85d6b554b6-pp4jr:8080 (ID:34283) Policy denied DROPPED (TCP Flags: SYN)
Feb 15 10:53:41.813: kube-system/prometheus-prometheus-agent-0:53078 (ID:2328) <> loki/loki-write-1:3100 (ID:23001) Policy denied DROPPED (TCP Flags: SYN)
Feb 15 10:53:41.897: kube-system/ingress-nginx-controller-75b9d479b4-x24br:45944 (ID:23796) <> loki/loki-gateway-85d6b554b6-4qjtx:8080 (ID:34283) Policy denied DROPPED (TCP Flags: SYN)
Feb 15 10:53:41.941: kube-system/prometheus-prometheus-agent-0:57314 (ID:2328) <> loki/loki-write-0:3100 (ID:1360) Policy denied DROPPED (TCP Flags: SYN)
Feb 15 10:53:41.987: kube-system/prometheus-prometheus-agent-0:44280 (ID:2328) <> loki/loki-backend-0:3100 (ID:10253) Policy denied DROPPED (TCP Flags: SYN)
Feb 15 10:53:42.506: kube-system/ingress-nginx-controller-75b9d479b4-x24br:41364 (ID:23796) <> loki/loki-gateway-85d6b554b6-pp4jr:8080 (ID:34283) Policy denied DROPPED (TCP Flags: SYN)
Feb 15 10:53:42.602: kube-system/prometheus-prometheus-agent-0:57718 (ID:2328) <> loki/loki-backend-0:3100 (ID:10253) Policy denied DROPPED (TCP Flags: ACK)
Feb 15 10:53:42.793: kube-system/ingress-nginx-controller-75b9d479b4-x24br:41378 (ID:23796) <> loki/loki-gateway-85d6b554b6-pp4jr:8080 (ID:34283) Policy denied DROPPED (TCP Flags: SYN)
Feb 15 10:53:42.922: kube-system/ingress-nginx-controller-75b9d479b4-x24br:45944 (ID:23796) <> loki/loki-gateway-85d6b554b6-4qjtx:8080 (ID:34283) Policy denied DROPPED (TCP Flags: SYN)
Feb 15 10:53:42.997: kube-system/prometheus-prometheus-agent-0:44280 (ID:2328) <> loki/loki-backend-0:3100 (ID:10253) Policy denied DROPPED (TCP Flags: SYN)
Feb 15 10:53:43.573: kube-system/ingress-nginx-controller-75b9d479b4-8rwjd:45644 (ID:23796) <> loki/loki-gateway-85d6b554b6-pp4jr:8080 (ID:34283) Policy denied DROPPED (TCP Flags: ACK)
Feb 15 10:53:43.588: kube-system/ingress-nginx-controller-75b9d479b4-8rwjd:55430 (ID:23796) <> loki/loki-gateway-85d6b554b6-4qjtx:8080 (ID:34283) Policy denied DROPPED (TCP Flags: SYN)
Feb 15 10:53:43.818: kube-system/ingress-nginx-controller-75b9d479b4-x24br:41378 (ID:23796) <> loki/loki-gateway-85d6b554b6-pp4jr:8080 (ID:34283) Policy denied DROPPED (TCP Flags: SYN)
Feb 15 10:53:43.829: kube-system/prometheus-prometheus-agent-0:53078 (ID:2328) <> loki/loki-write-1:3100 (ID:23001) Policy denied DROPPED (TCP Flags: SYN)
Feb 15 10:53:43.833: kube-system/prometheus-prometheus-agent-0:44732 (ID:2328) <> loki/loki-read-57fb9575dd-22nxt:3100 (ID:10532) Policy denied DROPPED (TCP Flags: ACK, FIN, PSH)
Feb 15 10:53:43.943: kube-system/prometheus-prometheus-agent-0:56708 (ID:2328) <> loki/loki-read-57fb9575dd-n6946:3100 (ID:10532) Policy denied DROPPED (TCP Flags: SYN)
Feb 15 10:53:43.957: kube-system/prometheus-prometheus-agent-0:57314 (ID:2328) <> loki/loki-write-0:3100 (ID:1360) Policy denied DROPPED (TCP Flags: SYN)

@QuantumEnigmaa

I've made some manual changes to the CNPs (which I'll push upstream) and it looks better for the write pods:

level=info ts=2024-02-15T13:17:44.87193431Z caller=checkpoint.go:611 msg="starting checkpoint"                                                                                                                                                    
level=info ts=2024-02-15T13:17:44.872294014Z caller=checkpoint.go:336 msg="attempting checkpoint for" dir=/var/loki/wal/checkpoint.004935                                                                                                         
level=info ts=2024-02-15T13:17:44.875514858Z caller=checkpoint.go:498 msg="atomic checkpoint finished" old=/var/loki/wal/checkpoint.004935.tmp new=/var/loki/wal/checkpoint.004935                                                                
level=info ts=2024-02-15T13:18:04.863673219Z caller=table_manager.go:136 index-store=boltdb-shipper-2023-01-24 msg="uploading tables"                                                                                                             
level=info ts=2024-02-15T13:18:04.86489981Z caller=table_manager.go:171 index-store=boltdb-shipper-2023-01-24 msg="handing over indexes to shipper"                                                                                               
level=info ts=2024-02-15T13:18:04.864944863Z caller=table_manager.go:136 index-store=tsdb-2024-01-02 msg="uploading tables"                                                                                                                       
level=info ts=2024-02-15T13:19:04.863795698Z caller=table_manager.go:136 index-store=boltdb-shipper-2023-01-24 msg="uploading tables"                                                                                                             
level=info ts=2024-02-15T13:19:04.864802049Z caller=table_manager.go:171 index-store=boltdb-shipper-2023-01-24 msg="handing over indexes to shipper"                                                                                              
level=info ts=2024-02-15T13:19:04.864838685Z caller=table_manager.go:136 index-store=tsdb-2024-01-02 msg="uploading tables"

However, the backend pods are not logging anything useful, and the gateway pods are still having trouble:

2024/02/15 13:19:52 [error] 9#9: *8564 loki-multi-tenant-proxy.loki.svc.cluster.local could not be resolved (3: Host not found), client: 10.244.2.221, server: , request: "POST /loki/api/v1/push HTTP/1.1", host: "loki.gerbil.test.gigantic.io"
10.244.2.221 - gerbil [15/Feb/2024:13:19:52 +0000]  502 "POST /loki/api/v1/push HTTP/1.1" 157 "-" "GrafanaAgent/v0.37.2" "10.244.5.156"
2024/02/15 13:19:58 [error] 9#9: *8562 loki-multi-tenant-proxy.loki.svc.cluster.local could not be resolved (3: Host not found), client: 10.244.2.221, server: , request: "POST /loki/api/v1/push HTTP/1.1", host: "loki.gerbil.test.gigantic.io"
10.244.2.221 - gerbil [15/Feb/2024:13:19:58 +0000]  502 "POST /loki/api/v1/push HTTP/1.1" 157 "-" "GrafanaAgent/v0.37.2" "10.244.2.224"
2024/02/15 13:20:09 [error] 9#9: *8564 loki-multi-tenant-proxy.loki.svc.cluster.local could not be resolved (3: Host not found), client: 10.244.2.221, server: , request: "POST /loki/api/v1/push HTTP/1.1", host: "loki.gerbil.test.gigantic.io"
10.244.2.221 - gerbil [15/Feb/2024:13:20:09 +0000]  502 "POST /loki/api/v1/push HTTP/1.1" 157 "-" "GrafanaAgent/v0.37.2" "10.244.5.156"
2024/02/15 13:20:40 [error] 12#12: *8570 loki-multi-tenant-proxy.loki.svc.cluster.local could not be resolved (3: Host not found), client: 10.244.2.94, server: , request: "POST /loki/api/v1/push HTTP/1.1", host: "loki.gerbil.test.gigantic.io"
10.244.2.94 - gerbil [15/Feb/2024:13:20:40 +0000]  502 "POST /loki/api/v1/push HTTP/1.1" 157 "-" "promtail/2.8.4" "10.244.2.194"

@QuantumEnigmaa

Upstream issue to fix the CNPs: grafana/loki#11963

Adding a plain netpol allowing the Loki pods access to CoreDNS helped get the gateway pods working (a sketch of that kind of policy follows the logs below):

10.244.2.94 - gerbil [15/Feb/2024:14:47:59 +0000]  499 "POST /loki/api/v1/push HTTP/1.1" 0 "-" "promtail/2.8.4" "10.244.0.229"
10.244.2.94 - gerbil [15/Feb/2024:14:48:47 +0000]  499 "POST /loki/api/v1/push HTTP/1.1" 0 "-" "promtail/2.8.4" "10.244.3.158"
10.244.2.221 - gerbil [15/Feb/2024:14:48:59 +0000]  499 "POST /loki/api/v1/push HTTP/1.1" 0 "-" "GrafanaAgent/v0.37.2" "10.244.5.156"
10.244.2.94 - gerbil [15/Feb/2024:14:49:07 +0000]  499 "POST /loki/api/v1/push HTTP/1.1" 0 "-" "GrafanaAgent/v0.37.2" "10.244.2.224"
10.244.2.221 - gerbil [15/Feb/2024:14:49:09 +0000]  499 "POST /loki/api/v1/push HTTP/1.1" 0 "-" "promtail/2.8.4" "10.244.3.158"
10.244.2.94 - gerbil [15/Feb/2024:14:49:39 +0000]  499 "POST /loki/api/v1/push HTTP/1.1" 0 "-" "GrafanaAgent/v0.37.2" "10.244.2.224"
10.244.2.221 - gerbil [15/Feb/2024:14:49:41 +0000]  499 "POST /loki/api/v1/push HTTP/1.1" 0 "-" "promtail/2.8.4" "10.244.3.158"
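For the record, a sketch of the kind of netpol described above (not the exact manifest); the kube-dns labels are the usual upstream ones, and the extra 1053 port matches where the resolvers answer in the earlier logs, but both are assumptions to verify per installation:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: loki-egress-dns        # placeholder name
  namespace: loki
spec:
  podSelector: {}              # every pod in the loki namespace
  policyTypes:
    - Egress
  egress:
    # allow DNS lookups against CoreDNS in kube-system
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
        - protocol: UDP
          port: 1053
        - protocol: TCP
          port: 1053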

@QuentinBisson (Author)

Is there a way to fix all CNPs at once and have it in our app in the meantime?

@QuentinBisson (Author)

Or maybe we can disable the CNPs for Loki until this is merged?

@QuantumEnigmaa

Lucky this time: already merged upstream 😎

@QuantumEnigmaa

After adding additional CNPs for both the write and backend pods to allow them egress access to the "world" entity, it finally works.
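A sketch of what such an extra egress rule looks like for the write pods (the policy name and component label are assumptions):

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: loki-write-egress-world   # placeholder name
  namespace: loki
spec:
  endpointSelector:
    matchLabels:
      app.kubernetes.io/component: write   # label is an assumption
  egress:
    # allow the write pods to reach anything outside the cluster,
    # e.g. the object storage backend
    - toEntities:
        - world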

@QuantumEnigmaa

Now Loki has the upstream CNPs plus custom ones (CoreDNS, egress to world for the write pods, egress to world + kube-apiserver for the backend pods) enabled by default for all CAPI installations.
This should solve all CNP-related issues for the Loki components.
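And, for completeness, a sketch of the custom backend policy (again, name and label are assumptions):

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: loki-backend-egress       # placeholder name
  namespace: loki
spec:
  endpointSelector:
    matchLabels:
      app.kubernetes.io/component: backend   # label is an assumption
  egress:
    # the backend needs both off-cluster egress and access to the API server
    - toEntities:
        - world
        - kube-apiserver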

github-project-automation bot moved this from Inbox 📥 to Done ✅ in Roadmap Feb 22, 2024
@QuentinBisson (Author)

Can we create a new one to push them upstream?

@QuantumEnigmaa

https://github.com/giantswarm/giantswarm/issues/29990
