
Documentation around 4 -> 5 upgrade; ArgumentError: Start sender first #274

Open
wjessop opened this issue May 2, 2023 · 10 comments

@wjessop

wjessop commented May 2, 2023

I'm unsure what I need to do to make the 4 to 5 upgrade work. I see the README file has some detail:

  • Using a singleton instance of the DogStatsD client instead of creating a new instance whenever you need one; this will let the buffering mechanism flush metrics regularly
  • Or properly disposing of the DogStatsD client instance when it is not needed anymore using the method Datadog::Statsd#close

but I thought we were already using a singleton instance, and I don't think we want to be sprinkling #close around the code. We currently create the instance in an initialiser:

require 'datadog/statsd'

Rails.configuration.statsd = Datadog::Statsd.new(
  tags: {
    env: Rails.env
  },
  disable_telemetry: Rails.env != 'production'
)

then we use it like so:

Rails.configuration.statsd.distribution('elasticsearch.client.response_time', duration, tags: tags)

Deployed on Heroku using Puma, the app gets an error #<ArgumentError: Start sender first> that crashes the Rails (6.1.7.3) process on boot. Is there something else we should be doing?
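
To illustrate what I mean by not wanting to sprinkle #close around, this is roughly the pattern we'd like to avoid (a sketch, not our actual code; duration and tags come from the surrounding instrumentation, as in the call above):

# The pattern we'd rather avoid: a throwaway client created, used and
# closed at every call site instead of one long-lived singleton.
statsd = Datadog::Statsd.new(tags: { env: Rails.env })
statsd.distribution('elasticsearch.client.response_time', duration, tags: tags)
statsd.close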

@wjessop

wjessop commented May 23, 2023

Is there anyone who could take a look at this?

@scottopell

Hi @wjessop

Sorry for the delay, I'm happy to help you figure this out.

What you're seeing appears to be some kind of forking-related error. Could you provide some more details on how your Puma server is configured? Particularly anything related to "Workers", as this appears to be how Puma creates new processes.

Here's my repro attempt if it's helpful to compare (this is just rails new plus a scaffold and a bit of datadog configuration):
https://github.com/scottopell/demo-app-rails6

To give you a bit more insight: when the statsd client is forked (I'm guessing this is the case, since the Rails initializer probably runs once at startup and then Puma forks the process), we should be starting a new sender thread in the new process. It appears that is not happening in your case, which is what leads to this error.
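
To make that concrete, one thing you could try (a rough sketch on my side, reusing the options from your initializer, not a confirmed fix) is re-creating the client in each worker after the fork, in your Puma config:

# Sketch only: rebuild the statsd client inside each forked Puma worker so
# the v5 sender thread is started in that process rather than inherited
# from the pre-fork master. Options are copied from the initializer above.
on_worker_boot do
  require 'datadog/statsd'

  Rails.configuration.statsd = Datadog::Statsd.new(
    tags: { env: Rails.env },
    disable_telemetry: Rails.env != 'production'
  )
end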

@wjessop

wjessop commented May 25, 2023

Here's our current datadog initialiser:

# frozen_string_literal: true

require 'datadog/statsd'

# set up statsd monitoring
unless Rails.configuration.respond_to?(:statsd)
  statsd = Datadog::Statsd.new(
    tags: {
      env: Rails.env
    },
    disable_telemetry: Rails.env != 'production'
  )

  Rails.configuration.statsd = statsd
end

The Puma config:

# frozen_string_literal: true

require 'puma_worker_killer'
require 'barnes'

workers Integer(ENV.fetch('WEB_CONCURRENCY', 1))
threads_count = Integer(ENV.fetch('PUMA_MAX_THREADS', 1))
threads(threads_count, threads_count)

preload_app!

rackup DefaultRackup
environment ENV.fetch('RACK_ENV', 'development')
worker_timeout 15
worker_shutdown_timeout 8

on_worker_boot do
  # Worker specific setup for Rails 4.1+
  # See: https://devcenter.heroku.com/articles/deploying-rails-applications-with-the-puma-web-server#on-worker-boot
  Makara::Context.release_all
  ActiveRecord::Base.establish_connection

  SemanticLogger.reopen
end

# Puma worker killer. Bandaid for memory issues.
before_fork do
  PumaWorkerKiller.enable_rolling_restart(3600) # Overriding 6 hour default to do it every hour to kill off memory-hungry workers that hang around.
  Barnes.start # see https://devcenter.heroku.com/articles/language-runtime-metrics-ruby
end

Env for the staging environment where we were trying this out:

WEB_CONCURRENCY: 2
PUMA_MAX_THREADS: 5

I figure the same as you: I'm probably missing some config, but I don't know what that would be.

@scottopell

Thanks for the extra details. I'm unfortunately still not able to reproduce this.

I have a few questions that can hopefully help us get to the bottom of this:

  1. Can you confirm that changing only the version of dogstatsd-ruby causes the issue you're seeing? Which version are you upgrading from, and which are you moving to?
  2. Could you also provide your Ruby version and any relevant gem versions (puma, barnes, puma_worker_killer, and anything else that might relate to threads or forking)?
  3. Do both workers emit this error? Or does one continue running okay?
  4. Does this error happen immediately on application startup? Or after some amount of time (maybe the same interval as puma_worker_killer?)
  5. Do you have any logs you can share with a stack trace and any relevant events that happen before this error?

Without a reproduction I'm stuck reading through the code and I'm unfortunately not having much luck figuring out how you'd get into this state.

As a workaround, you could try setting single_thread: true in the statsd constructor, which may help? If it does, that would also help us narrow down what's going on here.
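
For reference, a minimal sketch of that workaround applied to the initializer you shared (same options, just adding the flag) might look like:

# Sketch only: run the client in single-threaded mode so there is no
# background sender thread to restart after Puma forks. The other options
# are copied from the initializer shared earlier in this thread.
require 'datadog/statsd'

Rails.configuration.statsd = Datadog::Statsd.new(
  single_thread: true,
  tags: {
    env: Rails.env
  },
  disable_telemetry: Rails.env != 'production'
)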

@scottopell

ping @wjessop in case you missed the above

@wjessop

wjessop commented Jun 12, 2023

Sorry @scottopell, was on vacation! Will try to take a look at this this week.

@scottopell

@wjessop one more ping if you still need help

@wjessop

wjessop commented Jul 5, 2023

Sorry Scott, lots going on!

Can you confirm that changing only the version of dogstatsd-ruby causes the issue you're seeing? Which version are you upgrading from, and which are you moving to?

Yes, it's only dogstatsd-ruby changing.

Could you also provide your Ruby version and any relevant gem versions (puma, barnes, puma_worker_killer, and anything else that might relate to threads or forking)?

Ruby 2.7.5
barnes (0.0.9)
puma (5.6.4)
puma_worker_killer (0.3.1)
https://github.com/wjessop/oas_agent (master, starts a background reporting thread but doesn't interact with datadog in it)

Do both workers emit this error? Or does one continue running okay?

I don't think anything remains up, but I've only seen one instance of the error in the logs.

Do you have any logs you can share with a stack trace and any relevant events that happen before this error?

There's no stack trace, just the log. I don't have the log output to hand, but I recall it wasn't particularly useful.

As a workaround, you could try setting single_thread: true in the statsd constructor, which may help?

We could try this, but I think we'd still want to solve the underlying error.

I think what would help would be if you could provide a canonical example of what a datadog config looks like when using Puma in pre-forking mode (probably the most common Rails app server and configuration). Saying "Using a singleton instance of the DogStatsD client instead of creating a new instance whenever you need one; this will let the buffering mechanism flush metrics regularly" in the docs is somewhat abstract; a concrete example would be really helpful to compare to what we're currently doing.
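
For instance, something along these lines is the kind of example I mean (my guess at the intended pattern, not something I've verified):

# In config/puma.rb: give each forked worker its own client, and dispose
# of it when the worker shuts down. Unverified sketch, not a recommendation.
on_worker_boot do
  require 'datadog/statsd'
  Rails.configuration.statsd = Datadog::Statsd.new(tags: { env: Rails.env })
end

on_worker_shutdown do
  Rails.configuration.statsd&.close
end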

@wjessop

wjessop commented Sep 20, 2023

Hey Scott, if you've got one, a canonical example of what a datadog config looks like when using Puma in pre-forking mode would be really helpful here.

@wjessop

wjessop commented Nov 6, 2023

@scottopell Any chance you could give an example?
