[telemetry] telemetry.sh script fails to handle missing parameter in config DB and docker does not start #8959

Open
ayurkiv-nvda opened this issue Oct 12, 2021 · 6 comments

@ayurkiv-nvda
Contributor

ayurkiv-nvda commented Oct 12, 2021

Description

The telemetry_vars.j2 template uses config_db.json to generate a set of parameters.
It reads the following configuration:

"TELEMETRY": {
    "certs": {
        "ca_crt": "/etc/sonic/telemetry/dsmsroot.cer",
        "server_crt": "/etc/sonic/telemetry/streamingtelemetryserver.cer",
        "server_key": "/etc/sonic/telemetry/streamingtelemetryserver.key"
    },
    "gnmi": {
        "client_auth": "true",
        "log_level": "2",
        "port": "50051"
    }
},

telemetry.sh:L13 converts this config to:

"certs": {"ca_crt": "/etc/sonic/telemetry/dsmsroot.cer", "server_crt": "/etc/sonic/telemetry/streamingtelemetryserver.cer", "server_key": "/etc/sonic/telemetry/streamingtelemetryserver.key"},
"gnmi" : {"client_auth": "true", "log_level": "2", "port": "50051"},
"x509" : ""

But if log_level is not provided in config_db.json, telemetry.sh does not handle it properly.
For example:

{
 "certs": "",
 "gnmi" : {"port": "8080"}, 
 "x509" : ""
} 

LOG_LEVEL will then be equal to "null" (telemetry.sh:L65).
As a result, "null" is passed to the telemetry binary inside the docker as -v null, and the docker fails to start.
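
A minimal sketch of how the string can get through, assuming (this report does not confirm it) that the script only checks whether LOG_LEVEL is non-empty before appending it. The variable names are illustrative:

```shell
#!/bin/sh
# Assumption: LOG_LEVEL holds the literal string "null" when log_level
# is missing from the config, as described in this report.
LOG_LEVEL="null"
TELEMETRY_ARGS=""

# A non-empty check does not catch "null": the string is four characters long.
if [ -n "$LOG_LEVEL" ]; then
    TELEMETRY_ARGS="$TELEMETRY_ARGS -v $LOG_LEVEL"
fi

echo "telemetry$TELEMETRY_ARGS"   # prints: telemetry -v null
```

The binary then tries to parse "null" as an integer and rejects it, which matches the strconv.Atoi error in the log output further down.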

Steps to reproduce the issue:

  1. Load a 202106 image built on top of 7aa9fde or newer.
  2. Remove "log_level": "2" from config DB if it is present.
  3. Run config reload -y.
  4. Run docker ps -a. telemetry will have the status "Exited (0) About a minute ago".
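
Step 4 can also be checked from a script. The sketch below assumes the container is named telemetry, as in the docker ps output of this report. The helper name container_exited is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: succeeds when a `docker ps` status string
# reports an exited container.
container_exited() {
    case "$1" in
        Exited*) return 0 ;;
        *)       return 1 ;;
    esac
}

# On the switch, the status string could be obtained with:
#   status=$(docker ps -a --filter name=telemetry --format '{{.Status}}')
status="Exited (0) About a minute ago"   # sample value taken from step 4

if container_exited "$status"; then
    echo "telemetry is not running: $status"
fi
```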

Describe the results you received:

admin@sonic:~$ sudo grep -A 5 "telemetry invalid value" /var/log/telemetry.log
Oct 12 13:09:23.446046 ptr-sonic-n1-s1 INFO telemetry#/supervisord: telemetry invalid value "null" for flag -v: strconv.Atoi: parsing "null": invalid syntax
Oct 12 13:09:23.446143 ptr-sonic-n1-s1 INFO telemetry#/supervisord: telemetry Usage of /usr/sbin/telemetry:
Oct 12 13:09:23.446750 ptr-sonic-n1-s1 INFO telemetry#/supervisord: telemetry   -allow_no_client_auth
Oct 12 13:09:23.447017 ptr-sonic-n1-s1 INFO telemetry#/supervisord: telemetry     #011When set, telemetry server will request but not require a client certificate.
Oct 12 13:09:23.447120 ptr-sonic-n1-s1 INFO telemetry#/supervisord: telemetry   -alsologtostderr
admin@sonic:~$ docker ps -a
CONTAINER ID        IMAGE                                COMMAND                  CREATED             STATUS                     PORTS               NAMES
52f45cdaa29f        docker-sflow:latest                  "/usr/local/bin/supe…"   9 minutes ago       Up 9 minutes                                   sflow
35a3345daa84        docker-nat:latest                    "/usr/local/bin/supe…"   9 minutes ago       Up 9 minutes                                   nat
7f989da64f04        docker-sonic-mgmt-framework:latest   "/usr/local/bin/supe…"   2 weeks ago         Up 9 minutes                                   mgmt-framework
67589a0b619c        docker-sonic-telemetry:latest        "/usr/local/bin/supe…"   2 weeks ago         Exited (0) 5 minutes ago                       telemetry
59d8c54d96be        docker-snmp:latest                   "/usr/local/bin/supe…"   2 weeks ago         Up 8 minutes                                   snmp
f97cb689f2e2        docker-router-advertiser:latest      "/usr/bin/docker-ini…"   2 weeks ago         Up 9 minutes                                   radv
195019ff1d93        docker-lldp:latest                   "/usr/bin/docker-lld…"   2 weeks ago         Up 9 minutes                                   lldp
c59085bdf0de        docker-platform-monitor:latest       "/usr/bin/docker_ini…"   2 weeks ago         Up 9 minutes                                   pmon
5dca3589adc5        docker-syncd-mlnx:latest             "/usr/local/bin/supe…"   2 weeks ago         Up 9 minutes                                   syncd
ef2d9a76f77e        docker-teamd:latest                  "/usr/local/bin/supe…"   2 weeks ago         Up 9 minutes                                   teamd
d9db486987e2        docker-orchagent:latest              "/usr/bin/docker-ini…"   2 weeks ago         Up 9 minutes                                   swss
f0590580c032        docker-fpm-frr:latest                "/usr/bin/docker_ini…"   2 weeks ago         Up 9 minutes                                   bgp
d5d7ccef0e78        docker-database:latest               "/usr/local/bin/dock…"   2 weeks ago         Up 2 weeks                                     database

Describe the results you expected:

The telemetry docker should be up: a missing "log_level" setting does not look critical enough to prevent telemetry from running.
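
One way to get that behavior is to validate the value before it reaches the command line. The following is a minimal sketch, not the fix that was eventually merged. The helper name log_level_or_default and the fallback of 2 (the value used in the example config in the description) are assumptions:

```shell
#!/bin/sh
# Hypothetical helper: print $1 if it is a non-negative integer,
# otherwise print the fallback $2 (default 2, an assumed value).
log_level_or_default() {
    case "$1" in
        ''|*[!0-9]*) printf '%s\n' "${2:-2}" ;;   # empty, "null", or any non-digit
        *)           printf '%s\n' "$1" ;;
    esac
}

LOG_LEVEL=$(log_level_or_default "null")
echo "-v $LOG_LEVEL"   # prints: -v 2
```

Checking for digits rather than for the single string "null" also covers an empty value or a typo in the config.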

Output of show version:

SONiC Software Version: SONiC.202106.17-4536f35f2_Internal
Distribution: Debian 10.10
Kernel: 4.19.0-12-2-amd64
Build commit: 4536f35f2
Build date: Wed Sep 22 08:24:08 UTC 2021
Built by: sw-r2d2-bot@r-build-sonic-ci02-244

Platform: x86_64-mlnx_msn2700-r0
HwSKU: Mellanox-SN2700
ASIC: mellanox
ASIC Count: 1
Serial Number: MT1938X21101
Model Number: MSN2700-CS2F
Hardware Revision: A2
Uptime: 12:17:30 up 18 days, 21:27,  2 users,  load average: 1.28, 1.35, 1.25

Docker images:
REPOSITORY                    TAG                            IMAGE ID            SIZE
docker-syncd-mlnx             202106.17-4536f35f2_Internal   0c04f10f2c2c        997MB
docker-syncd-mlnx             latest                         0c04f10f2c2c        997MB
docker-platform-monitor       202106.17-4536f35f2_Internal   227fb1ee97f4        746MB
docker-platform-monitor       latest                         227fb1ee97f4        746MB
docker-dhcp-relay             latest                         2d2618c074d5        420MB
docker-snmp                   202106.17-4536f35f2_Internal   bd7b6e158f3e        455MB
docker-snmp                   latest                         bd7b6e158f3e        455MB
docker-teamd                  202106.17-4536f35f2_Internal   e4d76906805a        425MB
docker-teamd                  latest                         e4d76906805a        425MB
docker-lldp                   202106.17-4536f35f2_Internal   ac6ef41594e4        453MB
docker-lldp                   latest                         ac6ef41594e4        453MB
docker-database               202106.17-4536f35f2_Internal   0c00992e77ea        413MB
docker-database               latest                         0c00992e77ea        413MB
docker-router-advertiser      202106.17-4536f35f2_Internal   5b4d531fb734        413MB
docker-router-advertiser      latest                         5b4d531fb734        413MB
docker-orchagent              202106.17-4536f35f2_Internal   95b72c53b4ee        443MB
docker-orchagent              latest                         95b72c53b4ee        443MB
docker-nat                    202106.17-4536f35f2_Internal   95d2ccfbf36c        428MB
docker-nat                    latest                         95d2ccfbf36c        428MB
docker-macsec                 202106.17-4536f35f2_Internal   2d103b4b2d0f        428MB
docker-macsec                 latest                         2d103b4b2d0f        428MB
docker-sonic-telemetry        202106.17-4536f35f2_Internal   563dc1a9b412        502MB
docker-sonic-telemetry        latest                         563dc1a9b412        502MB
docker-sonic-mgmt-framework   202106.17-4536f35f2_Internal   58b3da73944d        570MB
docker-sonic-mgmt-framework   latest                         58b3da73944d        570MB
docker-fpm-frr                202106.17-4536f35f2_Internal   7258659b2c3d        443MB
docker-fpm-frr                latest                         7258659b2c3d        443MB
docker-sflow                  202106.17-4536f35f2_Internal   79306ace497b        426MB
docker-sflow                  latest                         79306ace497b        426MB

Output of show techsupport:

Additional information you deem important (e.g. issue happens only occasionally):

@zhangyanzhao zhangyanzhao added the Triaged this issue has been triaged label Oct 13, 2021
@zhangyanzhao
Collaborator

This is a telemetry feature improvement suggestion. We need the telemetry owner to chime in.

@zhangyanzhao
Collaborator

@yozhao101 is working on this issue.

@dprital
Collaborator

dprital commented Feb 8, 2022

@yozhao101 - Can you please update the status here?

@yozhao101
Contributor

@dprital Thanks so much for reporting this issue! We have an open PR (#9600) to address it.

@qiluo-msft qiluo-msft assigned zbud-msft and unassigned qiluo-msft and yozhao101 Aug 5, 2022
@liat-grozovik
Collaborator

@zbud-msft any update on this issue or the plan to fix it?

@zbud-msft
Contributor

Fix is in master, 202211, 202205, 202012: #14303
