
inputs.cisco_telemetry_mdt gives panic: runtime error: index out of range [0] with length 0 when used with the TCP transport #13789

Closed
Rajat0312 opened this issue Aug 18, 2023 · 20 comments
Labels
bug unexpected problem or unintended behavior waiting for response waiting for response from contributor

Comments

@Rajat0312

Rajat0312 commented Aug 18, 2023

Relevant telegraf.conf

apiVersion: v1
data:
  telegraf.conf: |
    [agent]
      collection_jitter = "0s"
      debug = true
      flush_interval = "10s"
      flush_jitter = "0s"
      hostname = "$HOSTNAME"
      interval = "10s"
      logfile = ""
      metric_batch_size = 1000
      metric_buffer_limit = 10000
      omit_hostname = false
      precision = ""
      quiet = false
      round_interval = true
    [[processors.enum]]
       [[processors.enum.mapping]]
        dest = "status_code"
        field = "status"
        [processors.enum.mapping.value_mappings]
            critical = 3
            healthy = 1
            problem = 2


    [[inputs.cisco_telemetry_mdt]]
      transport = "tcp"
      service_address = ":57000" # instruct telegraf to listen on port 57000 for TCP telemetry
      max_msg_size = 2000000000
 
    [[inputs.cisco_telemetry_mdt]]
      transport = "grpc"
      service_address = ":57001" # instruct telegraf to listen on port 57001 for grpc telemetry
      max_msg_size = 2000000000
 

    [[outputs.file]]
      files = ["stdout"] # files to write to, "stdout" is a specially handled file
      data_format = "json"

    [[outputs.kafka]]
      brokers = ["localhost:9092"]
      topic = "test"
      ssl_cert = "/etc/telegraf/kafka.cert"
      insecure_skip_verify = true
      data_format = "json"
      # ssl_ca = "ca.pem"
      # ssl_key = "elvarx1.key"

Logs from Telegraf

2023-08-16T12:48:42Z I! Loading config: /etc/telegraf/telegraf.conf
2023-08-16T12:48:42Z W! DeprecationWarning: Option "ssl_cert" of plugin "outputs.kafka" deprecated since version 1.7.0 and will be removed in 2.0.0: use 'tls_cert' instead
2023-08-16T12:48:42Z E! Unsupported logtarget: stdout, using stderr
2023-08-16T12:48:42Z I! Starting Telegraf 1.27.3
2023-08-16T12:48:42Z I! Available plugins: 237 inputs, 9 aggregators, 28 processors, 23 parsers, 59 outputs, 4 secret-stores
2023-08-16T12:48:42Z I! Loaded inputs: cisco_telemetry_mdt (2x)
2023-08-16T12:48:42Z I! Loaded aggregators:
2023-08-16T12:48:42Z I! Loaded processors: enum
2023-08-16T12:48:42Z I! Loaded secretstores:
2023-08-16T12:48:42Z I! Loaded outputs: file kafka prometheus_client
2023-08-16T12:48:42Z I! Tags enabled: host=telegraf
2023-08-16T12:48:42Z W! Deprecated outputs: 0 and 1 options
2023-08-16T12:48:42Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"telegraf", Flush Interval:10s
2023-08-16T12:48:42Z I! [outputs.prometheus_client] Listening on http://[::]:9091/metrics
panic: runtime error: index out of range [0] with length 0

System info

Telegraf 1.27.3, Alpine 3.18.2

Docker

FROM alpine:3.18

RUN echo 'hosts: files dns' >> /etc/nsswitch.conf
RUN apk add --no-cache iputils ca-certificates net-snmp-tools procps lm_sensors tzdata su-exec libcap && \
    update-ca-certificates

ENV TELEGRAF_VERSION 1.27.3

RUN ARCH= && \
    case "$(apk --print-arch)" in \
      x86_64) ARCH='amd64';; \
      aarch64) ARCH='arm64';; \
      *) echo "Unsupported architecture: $(apk --print-arch)"; exit 1;; \
    esac && \
    set -ex && \
    mkdir ~/.gnupg; \
    echo "disable-ipv6" >> ~/.gnupg/dirmngr.conf; \
    apk add --no-cache --virtual .build-deps wget gnupg tar && \
    for key in \
        9D539D90D3328DC7D6C8D3B9D8FF8E1F7DF8B07E ; \
    do \
        gpg --keyserver hkp://keyserver.ubuntu.com --recv-keys "$key" ; \
    done && \
    wget --no-verbose https://dl.influxdata.com/telegraf/releases/telegraf-${TELEGRAF_VERSION}_linux_${ARCH}.tar.gz.asc && \
    wget --no-verbose https://dl.influxdata.com/telegraf/releases/telegraf-${TELEGRAF_VERSION}_linux_${ARCH}.tar.gz && \
    gpg --batch --verify telegraf-${TELEGRAF_VERSION}_linux_${ARCH}.tar.gz.asc telegraf-${TELEGRAF_VERSION}_linux_${ARCH}.tar.gz && \
    mkdir -p /usr/src /etc/telegraf && \
    tar -C /usr/src -xzf telegraf-${TELEGRAF_VERSION}_linux_${ARCH}.tar.gz && \
    mv /usr/src/telegraf*/etc/telegraf/telegraf.conf /etc/telegraf/ && \
    mkdir /etc/telegraf/telegraf.d && \
    cp -a /usr/src/telegraf*/usr/bin/telegraf /usr/bin/ && \
    gpgconf --kill all && \
    rm -rf *.tar.gz /usr/src /root/.gnupg && \
    apk del .build-deps && \
    addgroup -S telegraf && \
    adduser -S telegraf -G telegraf && \
    chown -R telegraf:telegraf /etc/telegraf

EXPOSE 8125/udp 8092/udp 8094

COPY entrypoint.sh /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]
CMD ["telegraf"]

Steps to reproduce

1. Start Telegraf with both the TCP and gRPC cisco_telemetry_mdt plugins enabled.
2. Send input data from the source (see the sketch below), e.g. {"fields":{"threshold":346,"speed":327},"name":"cisco","tags":{"host":"telegraf","name":"1","path":"cisco-path","source":"router-1","subscription":"1"},"timestamp":1631191178}
3. This was working in Telegraf 1.23.3.
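
For step 2, a minimal, hedged reproduction sketch (not an official test client; the 12-byte header layout is my reading of the plugin's TCP dial-out transport, and the empty payload is only an assumed trigger):

package main

import (
	"encoding/binary"
	"log"
	"net"
)

func main() {
	// Connect to the TCP listener configured above.
	conn, err := net.Dial("tcp", "localhost:57000")
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// The plugin's TCP transport appears to read a big-endian header of four
	// uint16 fields (type, encapsulation, header version, flags) plus a
	// uint32 payload length, followed by a GPB-encoded telemetry message.
	payload := []byte{} // empty/truncated payload, assumed to exercise the decode path
	hdr := struct {
		MsgType, MsgEncap, MsgHdrVersion, MsgFlags uint16
		MsgLen                                     uint32
	}{MsgLen: uint32(len(payload))}

	if err := binary.Write(conn, binary.BigEndian, &hdr); err != nil {
		log.Fatal(err)
	}
	if _, err := conn.Write(payload); err != nil {
		log.Fatal(err)
	}
}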

Expected behavior

Telegraf should keep running without errors. Right now it goes down.

Actual behavior

Telegraf goes down with panic: runtime error: index out of range [0] with length 0

Additional info

This works in Telegraf 1.23.3 but not in the latest versions.

@Rajat0312 Rajat0312 added the bug unexpected problem or unintended behavior label Aug 18, 2023
@zak-pawel
Collaborator

@Rajat0312 are you sure you provided the whole telegraf.conf?

I don't see prometheus_client configured in telegraf.conf, but (based on your logs) it was enabled.
Also, did you provide the whole output from the log?

@Rajat0312
Author

[[outputs.prometheus_client]]
listen = ":9091"
# Path to publish the metrics on, defaults to /metrics
path = "/metrics"
# Expiration interval for each metric. 0 == no expiration
expiration_interval = "60s"
# Send string metrics as Prometheus labels.
# Unless set to false all string metrics will be sent as labels.
string_as_label = true

Yeah, it was also there, and the logs I provided are the whole output.

@zak-pawel
Collaborator

I'm not sure the provided telegraf.conf is the same one used by the Telegraf you ran.

  1. I see E! Unsupported logtarget: stdout, using stderr in your log, which should be logged only when the logtarget parameter is set to stdout in the agent section. You don't have this parameter configured at all.
  2. debug = true is used in the agent section, but there are no debug logs in your log file. And there should be.

Can you run cat /etc/telegraf/telegraf.conf inside your container and provide its output?

@Rajat0312
Author

/ # cat /etc/telegraf/telegraf.conf
[agent]
collection_jitter = "0s"
debug = false
flush_interval = "10s"
flush_jitter = "0s"
hostname = "$HOSTNAME"
interval = "10s"
logfile = ""
metric_batch_size = 1000
metric_buffer_limit = 10000
omit_hostname = false
precision = ""
quiet = false
round_interval = true
logtarget = "stdout"
[[processors.enum]]
[[processors.enum.mapping]]
dest = "status_code"
field = "status"
[processors.enum.mapping.value_mappings]
critical = 3
healthy = 1
problem = 2

[[inputs.cisco_telemetry_mdt]]
transport = "tcp"
service_address = ":57000" # instruct telegraf to listen on port 57000 for TCP telemetry
max_msg_size = 4294967295
embedded_tags = ["Cisco-IOS-XR-qos-ma-oper:qos/interface-table/interface/input/service-policy-names/service-policy-instance/statistics/class-stats/class-name","Cisco-IOS-XR-pfi-im-cmd-oper:interfaces/interface-xr/encapsulation","Cisco-IOS-XR-pfi-im-cmd-oper:interfaces/interface_xr/encapsulation","Cisco-IOS-XR-pfi-im-cmd-oper:interfaces/interface-xr/interface/encapsulation","Cisco-IOS-XR-pfi-im-cmd-oper:interfaces/interface_xr/interface/encapsulation"]
[[inputs.cisco_telemetry_mdt]]
transport = "grpc"
service_address = ":57001" # instruct telegraf to listen on port 57001 for gRPC telemetry
max_msg_size = 4294967295
embedded_tags = ["Cisco-IOS-XR-qos-ma-oper:qos/interface-table/interface/input/service-policy-names/service-policy-instance/statistics/class-stats/class-name","Cisco-IOS-XR-pfi-im-cmd-oper:interfaces/interface-xr/encapsulation","Cisco-IOS-XR-pfi-im-cmd-oper:interfaces/interface_xr/encapsulation","Cisco-IOS-XR-pfi-im-cmd-oper:interfaces/interface-xr/interface/encapsulation","Cisco-IOS-XR-pfi-im-cmd-oper:interfaces/interface_xr/interface/encapsulation"]

[[outputs.file]]
files = ["stdout"] # files to write to, "stdout" is a specially handled file
data_format = "json"

[[outputs.kafka]]
brokers = ["localhost:9092"]
topic = "test"
ssl_cert = "/etc/telegraf/kafka.cert"
insecure_skip_verify = true
data_format = "json"
# ssl_ca = "ca.pem"
# ssl_key = "elvarx1.key"

[[outputs.prometheus_client]]
listen = ":9091"
# Path to publish the metrics on, defaults to /metrics
path = "/metrics"
# Expiration interval for each metric. 0 == no expiration
expiration_interval = "60s"
# Send string metrics as Prometheus labels.
# Unless set to false all string metrics will be sent as labels.
string_as_label = true

@zak-pawel

@zak-pawel
Collaborator

Just theoretically (I don't have an environment to reproduce, and it is hard to say without a full stack trace), there can be a problem here:
https://github.com/influxdata/telegraf/blob/master/plugins/inputs/cisco_telemetry_mdt/cisco_telemetry_mdt.go#L416-L432
It was almost fixed in the first commit of PR #12637, but the second commit reverted it ;) The problem is that the length of subfield.Fields is not checked before accessing subfield.Fields[0]. A guard like the sketch below would avoid the panic.
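
A minimal, hedged sketch of that guard; Field is a simplified stand-in for the plugin's decoded telemetry field type, not the real definition:

package main

import "fmt"

// Field is a simplified stand-in for the decoded telemetry field type; the
// real type in the plugin has more members.
type Field struct {
	Name   string
	Fields []*Field
}

// firstSubfield checks the slice length before indexing, so an empty Fields
// slice yields nil instead of panicking with
// "index out of range [0] with length 0".
func firstSubfield(f *Field) *Field {
	if f == nil || len(f.Fields) == 0 {
		return nil
	}
	return f.Fields[0]
}

func main() {
	empty := &Field{Name: "class-stats"} // a field with no subfields
	fmt.Println(firstSubfield(empty))    // prints <nil> instead of panicking
}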

Or here:
https://github.com/influxdata/telegraf/blob/master/plugins/inputs/cisco_telemetry_mdt/cisco_telemetry_mdt.go#L572-L579
The problem is that the length of field.Fields[0].Fields[0].Fields[0].Fields is not checked before accessing field.Fields[0].Fields[0].Fields[0].Fields[0]; a chain that deep needs a guard at every level, as in the sketch below.
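
A hedged sketch of walking such a chain safely (reusing the Field stand-in from the previous sketch):

// deepFirst descends depth levels of Fields[0], checking the length at
// every level, and returns nil as soon as any level is empty.
func deepFirst(f *Field, depth int) *Field {
	for i := 0; i < depth; i++ {
		if f == nil || len(f.Fields) == 0 {
			return nil
		}
		f = f.Fields[0]
	}
	return f
}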

Or here:
https://github.com/influxdata/telegraf/blob/master/plugins/inputs/cisco_telemetry_mdt/cisco_telemetry_mdt.go#L639-L668
where a lot of Fields accesses are not checked for length :)

@powersj What do you think?

@powersj
Contributor

powersj commented Aug 18, 2023

Hi,

@zak-pawel thanks for taking a look!

panic: runtime error: index out of range [0] with length 0

@Rajat0312 - whenever there is a panic, there are additional log messages below it that say where the panic occurred. I would prefer not to guess, so please share what that message actually says. It should look something like:

❯ ./telegraf --config config.toml --once
2023-08-18T12:27:03Z I! Loading config: config.toml
2023-08-18T12:27:03Z I! Starting Telegraf 1.28.0-168d9272
2023-08-18T12:27:03Z I! Available plugins: 239 inputs, 9 aggregators, 28 processors, 24 parsers, 59 outputs, 5 secret-stores
2023-08-18T12:27:03Z I! Loaded inputs: cisco_telemetry_mdt
2023-08-18T12:27:03Z I! Loaded aggregators: 
2023-08-18T12:27:03Z I! Loaded processors: 
2023-08-18T12:27:03Z I! Loaded secretstores: 
2023-08-18T12:27:03Z I! Loaded outputs: file
2023-08-18T12:27:03Z I! Tags enabled: 
2023-08-18T12:27:03Z D! [agent] Initializing plugins
2023-08-18T12:27:03Z D! [agent] Connecting outputs
2023-08-18T12:27:03Z D! [agent] Attempting connection to [outputs.file]
2023-08-18T12:27:03Z D! [agent] Successfully connected to outputs.file
2023-08-18T12:27:03Z D! [agent] Starting service inputs
panic: runtime error: index out of range [0] with length 0

goroutine 1 [running]:
github.com/influxdata/telegraf/plugins/inputs/cisco_telemetry_mdt.(*CiscoTelemetryMDT).Start(0xc000158510?, {0x7f8f7e8?, 0xc001d2cb60?})
	/home/powersj/telegraf/plugins/inputs/cisco_telemetry_mdt/cisco_telemetry_mdt.go:104 +0x12
github.com/influxdata/telegraf/agent.(*Agent).testStartInputs(0xc000158510?, 0xc00100dbc0, {0xc001e30600, 0x1, 0x1?})
	/home/powersj/telegraf/agent/agent.go:446 +0x1d0
github.com/influxdata/telegraf/agent.(*Agent).runOnce(0xc00012a9e0, {0x7fe8c28?, 0xc001e84d70}, 0x0)
	/home/powersj/telegraf/agent/agent.go:1124 +0x3b8
github.com/influxdata/telegraf/agent.(*Agent).Once(0xc00012a9e0, {0x7fe8c28?, 0xc001e84d70?}, 0xc?)
	/home/powersj/telegraf/agent/agent.go:1065 +0x26
main.(*Telegraf).runAgent(0xc001eb2780, {0x7fe8c28, 0xc001e84d70}, 0x10?, 0x40?)
	/home/powersj/telegraf/cmd/telegraf/telegraf.go:346 +0x17c5
main.(*Telegraf).reloadLoop(0xc001eb2780)
	/home/powersj/telegraf/cmd/telegraf/telegraf.go:166 +0x25b
main.(*Telegraf).Run(0x0?)
	/home/powersj/telegraf/cmd/telegraf/telegraf_posix.go:14 +0x52
main.runApp.func1(0xc001e7ef80)
	/home/powersj/telegraf/cmd/telegraf/main.go:246 +0xac9
github.com/urfave/cli/v2.(*Command).Run(0xc001e4ef20, 0xc001e7ef80, {0xc0001ba140, 0x4, 0x4})
	/home/powersj/go/pkg/mod/github.com/urfave/cli/[email protected]/command.go:274 +0x998
github.com/urfave/cli/v2.(*App).RunContext(0xc0010fef00, {0x7fe8a30?, 0xc4d9f40}, {0xc0001ba140, 0x4, 0x4})
	/home/powersj/go/pkg/mod/github.com/urfave/cli/[email protected]/app.go:332 +0x5b7
github.com/urfave/cli/v2.(*App).Run(...)
	/home/powersj/go/pkg/mod/github.com/urfave/cli/[email protected]/app.go:309
main.runApp({0xc0001ba140, 0x4, 0x4}, {0x7f71900?, 0xc0001b4048}, {0x7f8dd40?, 0xc001e30280}, {0x7f8dd68?, 0xc001e4e160}, {0x7fe8918, ...})
	/home/powersj/telegraf/cmd/telegraf/main.go:368 +0xfb5
main.main()
	/home/powersj/telegraf/cmd/telegraf/main.go:378 +0xed

@powersj powersj added the waiting for response waiting for response from contributor label Aug 18, 2023
@Rajat0312
Author

Hi @powersj, I am running Telegraf via Kubernetes, and that is the last log line before the pod gets terminated. Is there any workaround by which I can get these logs?

@telegraf-tiger telegraf-tiger bot removed the waiting for response waiting for response from contributor label Aug 18, 2023
@powersj
Contributor

powersj commented Aug 18, 2023

Is there any workaround by which I can get these logs?

Based on your logs it happens right away, which is good.

You could log to a file and see if it captures the trace (see the sketch below).
You could run it by hand in a container or in the environment.

In any case, I really want to see the full trace please.
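
A hedged example of the file-logging route (the path is illustrative; it must be writable inside the pod, e.g. an emptyDir mount or /tmp):

[agent]
  ## assumed writable location inside the container
  logfile = "/tmp/telegraf.log"
  debug = true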

@powersj powersj added the waiting for response waiting for response from contributor label Aug 18, 2023
@Rajat0312
Author

I am trying to get the whole logs; until then, can you suggest any workaround to run this new image (1.27.3)?

@telegraf-tiger telegraf-tiger bot removed the waiting for response waiting for response from contributor label Aug 18, 2023
@powersj powersj added the waiting for response waiting for response from contributor label Aug 18, 2023
@Rajat0312
Author

Rajat0312 commented Aug 21, 2023

@powersj @zak-pawel, I have tried to get the full stack trace, but it only shows logs up to the panic runtime error. I have also tried writing logs to a file, but I am getting the error below in it. Please look into how I can use this new image for TCP.

2023-08-21T06:01:01Z I! Using config file: /etc/telegraf/telegraf.conf
2023-08-21T06:01:01Z E! [telegraf] Error running agent: Error loading config file /etc/telegraf/telegraf.conf: line 1: configuration specified the fields ["loglevel"], but they weren't used

Config:
apiVersion: v1
data:
telegraf.conf: |
[agent]
logfile = "/path/to/telegraf.log"
loglevel = "error"
collection_jitter = "0s"
debug = true
flush_interval = "10s"
flush_jitter = "0s"
hostname = "$HOSTNAME"
interval = "10s"
metric_batch_size = 1000
metric_buffer_limit = 10000
omit_hostname = false
precision = ""
quiet = false
round_interval = true
[[processors.enum]]
[[processors.enum.mapping]]
dest = "status_code"
field = "status"
[processors.enum.mapping.value_mappings]
critical = 3
healthy = 1
problem = 2

 [[inputs.cisco_telemetry_mdt]]
   transport = "tcp"
   service_address = ":57000" # instruct telegraf to listen on port 57000 for TCP telemetry
   max_msg_size = 4294967295
   embedded_tags = ["Cisco-IOS-XR-qos-ma-oper:qos/interface-table/interface/input/service-policy-names/service-policy-instance/statistics/class-stats/class-name","Cisco-IOS-XR-pfi-im-cmd-oper:interfaces/interface-xr/encapsulation","Cisco-IOS-XR-pfi-im-cmd-oper:interfaces/interface_xr/encapsulation","Cisco-IOS-XR-pfi-im-cmd-oper:interfaces/interface-xr/interface/encapsulation","Cisco-IOS-XR-pfi-im-cmd-oper:interfaces/interface_xr/interface/encapsulation"]
 [[inputs.cisco_telemetry_mdt]]
   transport = "grpc"
   service_address = ":57001" # instruct telegraf to listen on port 57001 for gRPC telemetry
   max_msg_size = 4294967295
   embedded_tags = ["Cisco-IOS-XR-qos-ma-oper:qos/interface-table/interface/input/service-policy-names/service-policy-instance/statistics/class-stats/class-name","Cisco-IOS-XR-pfi-im-cmd-oper:interfaces/interface-xr/encapsulation","Cisco-IOS-XR-pfi-im-cmd-oper:interfaces/interface_xr/encapsulation","Cisco-IOS-XR-pfi-im-cmd-oper:interfaces/interface-xr/interface/encapsulation","Cisco-IOS-XR-pfi-im-cmd-oper:interfaces/interface_xr/interface/encapsulation"]

[[outputs.file]]
  files = ["stdout"] # files to write to, "stdout" is a specially handled file
  data_format = "json"

[[outputs.kafka]]
  brokers = ["localhost:9092"]
  topic = "test"
  ssl_cert = "/etc/telegraf/kafka.cert"
  insecure_skip_verify = true
  data_format = "json"
  # ssl_ca = "ca.pem"
  # ssl_key = "elvarx1.key"
[[outputs.prometheus_client]]
  listen = ":9091"
  # Path to publish the metrics on, defaults to /metrics
  path = "/metrics"
  # Expiration interval for each metric. 0 == no expiration
  expiration_interval = "60s"
  # Send string metrics as Prometheus labels.
  # Unless set to false all string metrics will be sent as labels.
  string_as_label = true

@telegraf-tiger telegraf-tiger bot removed the waiting for response waiting for response from contributor label Aug 21, 2023
@powersj
Contributor

powersj commented Aug 21, 2023

line 1: configuration specified the fields ["loglevel"], but they weren't used

This is not a valid configuration option. Remove loglevel. Where did you see this suggested or added?

@powersj powersj added the waiting for response waiting for response from contributor label Aug 21, 2023
@Rajat0312
Author

@powersj I have also tried without loglevel, but no logfile is created. Then I tried changing the source code: I just added an if condition in this for loop and it's working now, but I need to confirm whether this is the right way. Will all other functionality still work properly?
[screenshot: the modified for loop with an added if condition guarding the slice access]

@telegraf-tiger telegraf-tiger bot removed the waiting for response waiting for response from contributor label Aug 21, 2023
@powersj
Copy link
Contributor

powersj commented Aug 21, 2023

I have also tried without loglevel but no logfile is created.
logfile = "/path/to/telegraf.log"

Does this path exist? If you have the ability to build and run a custom telegraf, then do you have the ability to jump into one of these containers and run telegraf by hand? This should not require building a custom version of telegraf. All we are trying to do is get the complete log message.

I have just added a if condition in this for loop and it’s working now but I need to confirm that is it right way ?

You are guessing as to the cause. It might be the case, but without the full trace we cannot be certain and I would rather fix this with certainty and confidence, rather than play guess and check with you.

@powersj powersj added the waiting for response waiting for response from contributor label Aug 21, 2023
@Rajat0312
Author

So I used a different path too, like logfile = "/etc/telegraf/telegraf.log", but no files are being created. I also want to see the full stack trace. Yes, I can access Telegraf in the containers, but I can't make code changes there.

@telegraf-tiger telegraf-tiger bot removed the waiting for response waiting for response from contributor label Aug 21, 2023
@powersj
Contributor

powersj commented Aug 21, 2023

I am not asking you to do changes to the code. I am asking you to run telegraf by hand:

telegraf --debug --config <your config>

Can you reproduce the issue that way?
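
If getting a shell in the pod is awkward, the same thing can be done in one step with kubectl (the pod name is illustrative):

kubectl exec -it <telegraf-pod> -- telegraf --debug --config /etc/telegraf/telegraf.conf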

@powersj powersj added the waiting for response waiting for response from contributor label Aug 21, 2023
@Rajat0312
Author

Rajat0312 commented Aug 22, 2023

/ # telegraf --debug --config /etc/telegraf/telegraf.conf
2023-08-21T14:12:23Z I! Loading config: /etc/telegraf/telegraf.conf
2023-08-21T14:12:23Z W! DeprecationWarning: Option "ssl_cert" of plugin "outputs.kafka" deprecated since version 1.7.0 and will be removed in 2.0.0: use 'tls_cert' instead
2023-08-21T14:12:24Z E! Unable to open /etc/telegraf/databus/telegraf.log (open /etc/telegraf/databus/telegraf.log: read-only file system), using stderr
2023-08-21T14:12:24Z I! Starting Telegraf 1.27.3
2023-08-21T14:12:24Z I! Available plugins: 237 inputs, 9 aggregators, 28 processors, 23 parsers, 59 outputs, 4 secret-stores
2023-08-21T14:12:24Z I! Loaded inputs: cisco_telemetry_mdt (2x)
2023-08-21T14:12:24Z I! Loaded aggregators:
2023-08-21T14:12:24Z I! Loaded processors: enum
2023-08-21T14:12:24Z I! Loaded secretstores:
2023-08-21T14:12:24Z I! Loaded outputs: file kafka prometheus_client
2023-08-21T14:12:24Z I! Tags enabled: host=telegraf
2023-08-21T14:12:24Z W! Deprecated outputs: 0 and 1 options
2023-08-21T14:12:24Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"telegraf", Flush Interval:10s
2023-08-21T14:12:24Z D! [agent] Initializing plugins
2023-08-21T14:12:24Z D! [agent] Connecting outputs
2023-08-21T14:12:24Z D! [agent] Attempting connection to [outputs.kafka]
2023-08-21T14:12:24Z D! [sarama] Initializing new client
2023-08-21T14:12:24Z D! [sarama] client/metadata fetching metadata for all topics from broker localhost:9092
2023-08-21T14:12:24Z D! [sarama] Connected to broker at localhost:9092 (unregistered)
2023-08-21T14:12:24Z D! [sarama] Successfully initialized new client
2023-08-21T14:12:24Z D! [agent] Successfully connected to outputs.kafka
2023-08-21T14:12:24Z D! [agent] Attempting connection to [outputs.prometheus_client]
2023-08-21T14:12:24Z E! [agent] Failed to connect to [outputs.prometheus_client], retrying in 15s, error was "listen tcp :9091: bind: address already in use"
2023-08-21T14:12:39Z D! [sarama] Producer shutting down.
2023-08-21T14:12:39Z D! [sarama] Closing Client
2023-08-21T14:12:39Z D! [sarama] Closed connection to broker localhost:9092
2023-08-21T14:12:39Z E! [telegraf] Error running agent: connecting output outputs.prometheus_client: error connecting to output "outputs.prometheus_client": listen tcp :9091: bind: address already in use

I am not able to reproduce it that way inside the Kubernetes pod.

@telegraf-tiger telegraf-tiger bot removed the waiting for response waiting for response from contributor label Aug 22, 2023
@Rajat0312
Author

2023-08-21T14:12:24Z E! Unable to open /etc/telegraf/databus/telegraf.log (open /etc/telegraf/databus/telegraf.log: read-only file system), using stderr
To get around this error I used a tmp/ path, but that file contains no stack trace either.

/ # tail -f tmp/telegraf.log
2023-08-22T07:33:06Z D! [sarama] client/brokers registered new broker #0 at localhost:9092
2023-08-22T07:33:06Z D! [sarama] Successfully initialized new client
2023-08-22T07:33:06Z D! [agent] Successfully connected to outputs.kafka
2023-08-22T07:33:06Z D! [agent] Starting service inputs
command terminated with exit code 137

@powersj
Contributor

powersj commented Aug 22, 2023

Take a look at your error messages. In the first case:

2023-08-21T14:12:39Z E! [telegraf] Error running agent: connecting output outputs.prometheus_client: error connecting to output "outputs.prometheus_client": listen tcp :9091: bind: address already in use

Probably because you had telegraf running already?

command terminated with exit code 137

That is usually an out-of-memory error in a pod.
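
A hedged way to confirm the out-of-memory theory from the pod status (assumes kubectl access; the grep window is illustrative):

kubectl describe pod <telegraf-pod> | grep -A 5 'Last State'
# "Reason: OOMKilled" together with "Exit Code: 137" would confirm it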

I have put up #13813 with your suggested fix. I would much rather see an actual trace than do this, but could you give it a try?

@powersj powersj added the waiting for response waiting for response from contributor label Aug 22, 2023
@Rajat0312
Author

Thanks @powersj, I will try to get this trace.

@telegraf-tiger telegraf-tiger bot removed the waiting for response waiting for response from contributor label Aug 23, 2023
@powersj powersj added the waiting for response waiting for response from contributor label Aug 25, 2023
@telegraf-tiger
Contributor

telegraf-tiger bot commented Sep 8, 2023

Hello! I am closing this issue due to inactivity. I hope you were able to resolve your problem; if not, please try posting this question in our Community Slack or Community Forums, or provide additional details in this issue and request that it be re-opened. Thank you!

@telegraf-tiger telegraf-tiger bot closed this as completed Sep 8, 2023