Zombie dbus processes generated by github.com/99designs/keyring #773

Closed
mihaitodor opened this issue Apr 13, 2023 · 10 comments
Labels
bug Erroneous or unexpected behaviour status-fixed_awaiting_release The issue has been fixed, its PR merged, and now awaiting the next release cycle of the connector. status-triage_done Initial triage done, will be further handled by the driver team

Comments

@mihaitodor

I noticed that gosnowflake added https://github.com/99designs/keyring as a dependency and, unfortunately, this library has an outstanding bug which can lead some Linux systems to create zombie dbus processes. This issue was observed initially when importing the Apache Pulsar client library into Benthos as described here and it's triggered in one of the init() functions from github.com/99designs/keyring. One workaround that I identified is to set DBUS_SESSION_BUS_ADDRESS=/dev/null on process startup.
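Because the trigger sits in an init() function, the variable has to be present in the environment before the Go program starts; setting it from inside main() would come too late. One way to guarantee that is a small launcher that starts the real binary with the variable already set. The following is only a minimal sketch under that assumption: the launcher itself and the helper name withDBusWorkaround are hypothetical and not part of gosnowflake or keyring.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

const dbusVar = "DBUS_SESSION_BUS_ADDRESS"

// withDBusWorkaround (hypothetical helper) returns a copy of env in which
// DBUS_SESSION_BUS_ADDRESS is set to /dev/null, unless the variable already
// has a non-empty value, in which case env is copied unchanged.
func withDBusWorkaround(env []string) []string {
	out := make([]string, 0, len(env)+1)
	for _, kv := range env {
		name, value, _ := strings.Cut(kv, "=")
		if name == dbusVar {
			if value != "" {
				// Already set by the caller: leave the environment untouched.
				return append([]string(nil), env...)
			}
			continue // drop an empty assignment and replace it below
		}
		out = append(out, kv)
	}
	return append(out, dbusVar+"=/dev/null")
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: launcher <program> [args...]")
		os.Exit(2)
	}
	// Start the real program with the workaround already in its environment,
	// so that it is visible to every init() function in that process.
	cmd := exec.Command(os.Args[1], os.Args[2:]...)
	cmd.Env = withDBusWorkaround(os.Environ())
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	if err := cmd.Run(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

In most deployments the same effect is simpler to get by exporting the variable in the shell, the service unit, or the container definition; the launcher is only useful where none of those can be changed.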

I tried to reproduce it in a Docker container, but haven't succeeded yet, so I opened this issue to raise awareness. I'll share detailed reproduction steps later if I get more time to poke around with it.

@mihaitodor mihaitodor added the bug Erroneous or unexpected behaviour label Apr 13, 2023
@sfc-gh-dszmolka
Contributor

hi @mihaitodor (or anyone else stumbling into this issue) - first of all, thank you so much for raising this with us and for providing the details about the issue and even the workaround to tackle it. This is highly appreciated.

I would recommend marking this as closed until a reproduction is available. The issue can still be commented on, even in the Closed state, and I can reopen it when it becomes more actionable with the repro. Thank you in advance!

@sfc-gh-dszmolka sfc-gh-dszmolka closed this as not planned (won't fix, can't repro, duplicate, stale) Apr 26, 2023
@crflanigan

This issue appears to still exist:
influxdata/telegraf#13481

@sfc-gh-dszmolka
Contributor

hi folks - I see that people still run into this issue from time to time, so apparently it's still present. We're taking a look.

@mihaitodor
Author

mihaitodor commented Aug 29, 2023

It hasn't been resolved, unfortunately... We're forcing a fork of github.com/99designs/keyring in Benthos: https://github.com/benthosdev/benthos/blob/bde063a467bd5e783c525365cb13ac239c0eb4e5/go.mod#L3, but that breaks go install:

> go install github.com/benthosdev/benthos/v4/cmd/benthos@latest
go: downloading github.com/benthosdev/benthos/v4 v4.20.0
go: github.com/benthosdev/benthos/v4/cmd/benthos@latest (in github.com/benthosdev/benthos/v4@v4.20.0):
	The go.mod file for the module providing named packages contains one or
	more replace directives. It must not contain directives that would cause
	it to be interpreted differently than if it were the main module.

Related issues:

It would be amazing if https://github.com/99designs/keyring addressed this issue, but I think the best bet is for both gosnowflake and pulsar-client-go to consider replacing it with a maintained fork. That would solve the issue for downstream projects.

@sfc-gh-dszmolka
Contributor

sfc-gh-dszmolka commented Aug 29, 2023

thank you so much for the summary and for sharing your approach, @mihaitodor!
at least now I have finally found a way to reproduce the issue: on a RHEL 7.9 (Maipo) VM, using go1.21.0, simply running the example select1.go produces an instance of

/usr/bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session

Each rerun of the script produces one more instance of the dbus-daemon process, exactly as mentioned in hashicorp/vault#22560. (I'm missing the --syslog argument, but I think that's irrelevant for this issue.)

hopefully this allows us to address it faster.

edit: I can also confirm that adding DBUS_SESSION_BUS_ADDRESS=$XDG_RUNTIME_DIR/bus to the runtime environment works; the orphaned dbus-daemon processes no longer spawn, as mentioned over there in the Vault issue. Putting it here too for the folks coming across this particular issue.

@sfc-gh-dszmolka sfc-gh-dszmolka added status-in_progress Issue is worked on by the driver team and removed status-triage Issue is under initial triage labels Aug 30, 2023
@aphorise

aphorise commented Oct 3, 2023

This issue also occurs on Vault 1.13.7, which also uses this D-Bus package.

@crobert-1

I filed an issue as well, not knowing this was already raised against this repository: 99designs/keyring#135

The Snowflake receiver in the OpenTelemetry Collector imports this repo, which causes a leaked goroutine there, as shown in the PR description here.

@sfc-gh-dszmolka sfc-gh-dszmolka added status-pr_pending_merge A PR is made and is under review and removed status-in_progress Issue is worked on by the driver team labels Feb 23, 2024
@sfc-gh-dszmolka
Contributor

A quick summary:

  • the issue stems from 99designs/keyring (hey, it's even in the title :)) and possibly affects everybody who has it as a dependency
  • unfortunately, that includes gosnowflake
  • only Linux is affected, and only certain distributions. Not sure about the rest, but it can be consistently reproduced on RHEL (and consistently not reproduced on Debian, for example)
  • after some debugging, it looks like the issue stems from the init() functions in the kwallet and secretservice components of keyring, which, again very unfortunately, run before gosnowflake itself is initialized. So there is not much we can do in gosnowflake to prevent the issue, except for planning to replace 99designs/keyring with another implementation
  • at the moment, those 2 components are not even used on Linux in gosnowflake, yet only (certain) Linux distributions are affected by the issue
  • we discussed multiple options for approaching the issue (forking, vendoring, etc.), but in the end we decided to call the issue out, document it, and advise using the well-known working workaround of setting DBUS_SESSION_BUS_ADDRESS in the runtime environment where necessary.

So hopefully it will eventually be fixed in the keyring dependency, and/or we will migrate the relevant part of our code to something else, but until then, please keep using the workaround.
There's a PR, #1058, that updates the README and also adds an autodetection mechanism which warns if the OS is possibly affected.
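This thread does not describe how the autodetection in that PR works, so the following is only a hypothetical sketch of what such a check could look like. It assumes the relevant conditions are a Linux host and a missing DBUS_SESSION_BUS_ADDRESS; the function name possiblyAffected and the warning text are made up for illustration.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
)

// possiblyAffected (hypothetical) reports whether the conditions described
// in this thread are present: a Linux host on which DBUS_SESSION_BUS_ADDRESS
// is unset or empty. goos and busAddr are parameters so the check can be
// exercised on any platform.
func possiblyAffected(goos, busAddr string) bool {
	return goos == "linux" && busAddr == ""
}

func main() {
	if possiblyAffected(runtime.GOOS, os.Getenv("DBUS_SESSION_BUS_ADDRESS")) {
		fmt.Fprintln(os.Stderr, "warning: DBUS_SESSION_BUS_ADDRESS is not set; "+
			"orphaned dbus-daemon processes may be spawned on some Linux distributions")
	}
}
```

A check like this deliberately over-warns: it cannot tell an affected distribution (such as RHEL) from an unaffected one (such as Debian), and a warning emitted at this point cannot prevent the spawn itself, because the keyring init() functions have already run.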

@sfc-gh-dszmolka sfc-gh-dszmolka added status-fixed_awaiting_release The issue has been fixed, its PR merged, and now awaiting the next release cycle of the connector. and removed status-pr_pending_merge A PR is made and is under review labels Mar 4, 2024
@sfc-gh-dszmolka
Contributor

the aforementioned change has been merged and will be part of the next driver release cycle, towards the end of March.

@sfc-gh-dszmolka sfc-gh-dszmolka added the status-triage_done Initial triage done, will be further handled by the driver team label Mar 12, 2024
@sfc-gh-dszmolka
Contributor

fix released with the March 2024 release, version 1.9.0

6 participants