Updating the Techempower benchmarks code. #318

timuckun · 2023-04-11T12:52:35Z


Hey all. I thought I would give Roda a try because of the way it performs in the benchmarks suite. I looked at the code and saw that it hadn't been updated in a few years, so I updated Ruby to 3.2, updated the gems to their latest versions, swapped out the json gem for Oj, and enabled YJIT. According to the tests I ran on my laptop, the single query benchmark improved by roughly 30% (MySQL).

I had to make one tiny change to the code to get rid of a warning, and I added one line to boot.rb to enable JSON compatibility with Oj.

Looking at the main app, it doesn't seem idiomatic compared with the Roda docs on the web; it looks more like a typical Rack app. I haven't used Roda in anger before, so I thought somebody who is an expert could take a look and see if it needs to be updated.

There may be other ways to squeeze more performance out of the code too. For example, the specs say the timestamp header doesn't have to be updated every millisecond.

I would really appreciate it if somebody from this community took a look before I send a pull request. The code is here

https://github.com/timuckun/FrameworkBenchmarks/tree/roda/frameworks/Ruby/roda-sequel

Thanks.

jeremyevans · 2023-04-11T15:19:11Z

Maintainer

The app currently uses the static_routing plugin. Since there are only 6 routes, switching it to a normal Roda app would likely improve performance (most for the routes at the beginning, least for the routes at the end). You could avoid using the hooks plugin by setting the Date and Server up front. Maybe try a switch to:

route do |r|
  response["Date"] = Time.now.httpdate
  response["Server"] = SERVER_STRING if SERVER_STRING

  r.is 'db' do
    # ...
  end

  r.is 'queries' do
    # ...
  end

  # so on for plaintext, json, fortunes, updates
end

This uses r.is instead of r.get as all requests are GET requests (I think). If they require handling non-GET requests differently, then you could add a next unless env['REQUEST_METHOD'] == 'GET' to the top.

bounded_queries could use clamp instead of multiple conditionals.
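A minimal sketch of a clamp-based `bounded_queries`, assuming the test requirement that a missing or non-numeric `queries` value counts as 1 and that values are limited to 1..500. The constant names and the parsing details are illustrative, not taken from the benchmark app.

```ruby
# Illustrative bounds; the benchmark app defines its own constants.
MIN_QUERIES = 1
MAX_QUERIES = 500

# Parse the raw query-string value and clamp it into range in one step.
# Integer(..., 10, exception: false) returns nil for blank or non-numeric
# input (and parses "010" as 10 rather than octal), so those fall back to 1.
def bounded_queries(raw)
  n = Integer(raw.to_s, 10, exception: false)
  n ? n.clamp(MIN_QUERIES, MAX_QUERIES) : MIN_QUERIES
end
```

Usage: `bounded_queries(r.params["queries"])` inside a route, where `r.params` is the Roda request's parameter hash.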

Remove the layout_opts option from the render plugin, it shouldn't be necessary.

The Time.now.httpdate call would be nice to cache. Maybe it is possible to use Process.clock_gettime(:TIME_BASED_CLOCK_REALTIME) and only update the value when it changes? Alternatively, maybe a separate thread with a loop that updates the value and then sleeps until the next second? These are both guesses, and I'm not sure how much performance is affected by it.
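A sketch of the first idea (check the clock, rebuild the string only when the second changes). The class name is hypothetical, and it uses `Process::CLOCK_REALTIME` with the `:second` unit rather than the emulated clock named above; whether it helps measurably is still a profiling question.

```ruby
require 'time' # Time#httpdate and Time.httpdate live in the 'time' stdlib

# Caches the HTTP Date header value and rebuilds it only when the wall-clock
# second changes. The [second, string] pair is replaced as one frozen object,
# so a reader never sees a new second paired with a stale string.
class HttpDateCache
  def initialize
    @cached = nil
  end

  def value
    now = Process.clock_gettime(Process::CLOCK_REALTIME, :second)
    cached = @cached
    return cached[1] if cached && cached[0] == now

    date = Time.at(now).httpdate.freeze
    @cached = [now, date].freeze
    date
  end
end
```

In the route block this would become something like `response["Date"] = DATE_CACHE.value`, with `DATE_CACHE = HttpDateCache.new` defined once at load time (both names hypothetical). Two threads crossing a second boundary may both rebuild the string, which is harmless.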

Obviously, you would want to benchmark and see if this actually speeds things up. It would also be useful to profile to see if anything sticks out as being a performance issue.


timuckun · 2023-04-12T11:15:04Z

Author

I'll give that a go and see how it works out. As for the headers I think that unless you can somehow memoize the headers and prevent processing the headers hash you are not likely to get any kind of a performance gain. I was looking through the source code of rack and it requires an unfrozen hash as the headers so that's out of the question.

BTW I saw this in the source code.

# Call the Rack application generated by this builder instance. Note that
# this rebuilds the Rack application and runs the warmup code (if any)
# every time it is called, so it should not be used if performance is important.
def call(env)
  to_app.call(env)
end

This is weird because every rack tutorial I have seen builds a class with a call method.


jeremyevans Apr 12, 2023
Maintainer

I'll give that a go and see how it works out. As for the headers I think that unless you can somehow memoize the headers and prevent processing the headers hash you are not likely to get any kind of a performance gain. I was looking through the source code of rack and it requires an unfrozen hash as the headers so that's out of the question.

The headers hash must be unfrozen, but header keys and values can be frozen. It's probably best not to worry about the Time.now.httpdate call unless profiling shows it is taking substantial time.
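A minimal illustration of that distinction, written as a bare Rack-style lambda so it needs no gems. The constant names and the `roda` server value are made up for the example.

```ruby
# Header names and values as frozen constants, created once at load time.
# Rack 3 expects lowercase header names.
CONTENT_TYPE = 'content-type'.freeze
SERVER = 'server'.freeze
JSON_TYPE = 'application/json'.freeze
SERVER_NAME = 'roda'.freeze # hypothetical; the benchmark uses its own SERVER_STRING
BODY = '{"message":"Hello, World!"}'.freeze

# The Hash literal builds a new, unfrozen Hash on every call, as Rack requires.
# Its values are the frozen constants above rather than fresh strings, and
# Hash keeps string keys frozen anyway.
APP = lambda do |_env|
  [200, { CONTENT_TYPE => JSON_TYPE, SERVER => SERVER_NAME }, [BODY]]
end
```

Middleware can still add or change headers on the per-request Hash; only the strings themselves are shared.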

BTW I saw this in the source code.

# Call the Rack application generated by this builder instance. Note that
# this rebuilds the Rack application and runs the warmup code (if any)
# every time it is called, so it should not be used if performance is important.
def call(env)
  to_app.call(env)
end

This is weird because every rack tutorial I have seen builds a class with a call method.

That's in Rack::Builder and not in Roda. Code generally should not be calling Rack::Builder#call, they should be calling the app generated by the Builder.

Performance-wise, that part of the benchmarks is fine. In config.ru, it uses run HelloWorld.freeze.app, so it is freezing the Roda class and then running the generated app.

timuckun · 2023-04-20T10:03:33Z

Author

I submitted the pull request. Let's see what they do with it.


timuckun · 2023-05-10T09:30:12Z

Author

I am now working on the Rack benchmarks and am using Sequel to do it. As with the Roda benchmarks, I plan on running Unicorn, Puma, and maybe Passenger, but I also want to add Falcon and JRuby to the list. Is there anything special I should do to optimize Sequel for the multiple-select and multiple-update tests? These tests require you to make N distinct SELECT statements, collate the results, and return them as JSON. Both of these will be run with a concurrency of 512.

How do I make sure Sequel is able to cope with making up to 20 selects per request with a concurrency of 512? I see that there are a couple of plugins like async_thread_pool and fiber_concurrency; does it make sense to select the records async? How do I deal with Puma when it's forking and threading at the same time; is there some sync call I need to make before I run queries? Do these libs matter if you are using JDBC?

In the updates test the setup is similar in that you make up to 20 selects but then modify the data and update it back to the database. They say "Using bulk updates—batches of update statements or individual update statements that affect multiple rows—is acceptable but not required. To be clear: bulk reads are not permissible for selecting/reading the rows, but bulk updates are acceptable for writing the updates."

So I presume I can construct a Postgres statement with VALUES or something and update them all in one shot, but I suppose it's also possible to just create a bunch of UPDATE statements and execute them in the same call. I don't think it would make much of a difference for 20 or so records.
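One way to do the bulk-update variant the rules allow, sketched as plain string building: a single parameterized `UPDATE ... FROM (VALUES ...)` statement. It assumes the benchmark's table and column names (`world`, `id`, `randomnumber`) and PostgreSQL's `$n` placeholders; `build_bulk_update` is a hypothetical helper, and executing the result (for example with the pg gem's `exec_params`) is left out.

```ruby
# Builds one parameterized PostgreSQL statement that updates many rows at once.
# new_values maps id => new randomNumber; using a Hash also collapses duplicate
# ids, which random selection can produce.
def build_bulk_update(new_values)
  # Sort by id so concurrent requests touch rows in the same order, which
  # reduces the chance of lock-ordering deadlocks between transactions.
  pairs = new_values.sort
  rows = Array.new(pairs.size) { |i| "($#{2 * i + 1}::int, $#{2 * i + 2}::int)" }
  sql = "UPDATE world SET randomnumber = v.rn " \
        "FROM (VALUES #{rows.join(', ')}) AS v(id, rn) " \
        "WHERE world.id = v.id"
  [sql, pairs.flatten]
end
```

Usage: `sql, params = build_bulk_update({ 3 => 10, 1 => 20 })` gives one statement with four placeholders and `params == [1, 20, 3, 10]`. Bind parameters avoid interpolating values into the SQL, unlike the CASE string built later in this thread (which is safe there only because the values are integers).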

Of course, I don't know if this is cheating or not, but with Postgres I could do the select and update in one statement using RETURNING. That seems like it might be cheating, though.

Any ideas on dealing with high concurrency benchmarks would be much appreciated.


jeremyevans May 17, 2023
Maintainer

I assume you would be running this with a bunch of child processes on CRuby, so you would probably want to use the max_connections Database option, and spawn enough child processes with threads that workers*threads >= 512, the target concurrency.
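A back-of-the-envelope helper for that rule. `plan` is hypothetical (not part of Sequel or Puma) and assumes each request thread gets its own connection, so a worker's `max_connections` equals its thread count.

```ruby
# Each forked worker has its own Sequel connection pool, so the database sees
# workers * max_connections connections in total.
def plan(target_concurrency, workers)
  threads = target_concurrency.fdiv(workers).ceil # round up so the target is met
  {
    workers: workers,
    threads_per_worker: threads,
    max_connections_per_process: threads,
    concurrency: workers * threads,
    total_db_connections: workers * threads
  }
end

p plan(512, 16)
p plan(512, 15)
```

16 workers works out to exactly 32 threads each (512), while 15 workers needs 35 threads each (525). For comparison, the Puma log later in this thread shows 15 workers with 18 threads each, i.e. 270 concurrent requests. The connection total also has to fit under PostgreSQL's own max_connections setting.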

You probably would only want to use fiber_concurrency if you are using fibers (e.g. falcon, not unicorn or puma).

JDBC on JRuby should work fine with the async_thread_pool extension. By default, queries are synchronous when using JRuby/JDBC just as they are on other Sequel adapters.

I haven't used the async_thread_pool extension with 512 concurrent threads, but hopefully it would work. See later note about loading the extension after fork.

You could also try using the :num_async_threads Database option to limit the number of async threads created. Ideally, you always want to have more available connections than async threads, since you don't want your async threads blocking.
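To make the `:num_async_threads` trade-off concrete, here is a generic bounded thread pool built only on the standard library's `Queue`. It is not Sequel's async_thread_pool implementation. It only mirrors the idea that a fixed number of worker threads take jobs off a queue and hand back a future-like object, so no more than `size` jobs run at once.

```ruby
# A generic fixed-size thread pool with simple futures, using only the stdlib.
class TinyPool
  # Holds one result; #value blocks until a worker has delivered it.
  class Future
    def initialize
      @queue = Queue.new
      @lock = Mutex.new
      @resolved = false
      @outcome = nil
    end

    # Called by a worker thread with [:ok, value] or [:error, exception].
    def resolve(outcome)
      @queue << outcome
    end

    # Waits for the outcome once, caches it, and returns the value
    # (or re-raises the exception the job raised).
    def value
      @lock.synchronize do
        unless @resolved
          @outcome = @queue.pop
          @resolved = true
        end
      end
      status, result = @outcome
      raise result if status == :error
      result
    end
  end

  def initialize(size)
    @jobs = Queue.new
    @workers = Array.new(size) do
      Thread.new do
        # Each worker takes jobs until it receives nil (the shutdown signal).
        while (job = @jobs.pop)
          block, future = job
          outcome =
            begin
              [:ok, block.call]
            rescue StandardError => e
              [:error, e]
            end
          future.resolve(outcome)
        end
      end
    end
  end

  # Queues a job and returns immediately; at most `size` jobs run at once.
  def submit(&block)
    future = Future.new
    @jobs << [block, future]
    future
  end

  # Lets queued jobs finish, then stops the workers.
  def shutdown
    @workers.size.times { @jobs << nil }
    @workers.each(&:join)
  end
end
```

If each job checked out a database connection, a pool with fewer connections than `size` would leave some of these worker threads waiting on the pool instead of running queries, which is the situation the advice above is meant to avoid.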

You could try using the pool_class: :timed_queue Database option if you are using Ruby 3.2, and see if it is faster than the default connection pool.

I'm not sure the update/select using returning would be acceptable, since you are updating before the select and not after. But I didn't read the benchmark rules.

timuckun · 2023-05-11T09:45:36Z

Author

Hey Jeremy. I created a simple benchmark and the results are eye-opening.

Here is the code (this is for fifty queries):

Benchmark.ips do |x|
  x.report("sync") do |i|
    results = []
    iterations.times do
      results << db.connection["SELECT id, randomNumber FROM World WHERE id = ?", db.random_id].first
    end
  end

  x.report("async") do |i|
    results = []
    promises = []
    iterations.times do
      promises << db.connection["SELECT id, randomNumber FROM World WHERE id = ?", db.random_id].async
    end
    promises.each do |p|
      results << p.first
    end
  end
  x.compare!
end

Here are the results:

Warming up --------------------------------------
                sync     8.930B i/100ms
               async   250.558B i/100ms
Calculating -------------------------------------
                sync    219.544B (±24.9%) i/s -      1.045T in   5.014346s
               async    504.586T (±22.4%) i/s -      1.645Q in   4.993101s

Comparison:
               async: 504585910661918.6 i/s
                sync: 219543795652.7 i/s - 2298.34x  slower
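Figures in the hundreds of billions of iterations per second are not plausible for database round trips, and the `|i|` block parameter is a likely cause. benchmark-ips treats a report block that declares a parameter as a manual loop: it passes the number of iterations to run and credits the block with that many, whether or not the block loops. Because the blocks above ignore `i`, each call does one round of queries but is counted as `i`. With benchmark-ips itself, dropping `|i|` from `x.report` should be enough. The sketch below reproduces that accounting with a tiny stand-in harness (`CountingHarness` is hypothetical and far simpler than benchmark-ips) so the effect is visible without a database.

```ruby
# Hypothetical, heavily simplified model of how benchmark-ips credits iterations.
class CountingHarness
  attr_reader :actual_work

  def initialize
    @actual_work = 0
  end

  # Stands in for one round of queries.
  def work
    @actual_work += 1
  end

  # A block that takes a parameter is treated as a manual loop: it is handed the
  # batch size and credited with that many iterations, however many it really ran.
  # A block with no parameters is simply called batch_size times by the harness.
  def run(batch_size, &block)
    if block.arity == 1
      block.call(batch_size)
    else
      batch_size.times { block.call }
    end
    batch_size # iterations credited either way
  end
end

h = CountingHarness.new
h.run(1_000) { |_i| h.work } # ignores the count, like the blocks above
```

Here 1,000 iterations are credited for a single unit of work. Either `h.run(1_000) { h.work }` or `h.run(1_000) { |n| n.times { h.work } }` makes the credited and actual counts agree.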


timuckun · 2023-05-11T10:05:34Z

Author

The problem is that I don't know how to do this with a prepared statement.

@world_select = @connection["SELECT id, randomNumber FROM World WHERE id = ?", :$id].prepare(:select, :select_by_id)

@world_select.async.call(id: random_id)

does not return a promise; it returns an actual array:

[{:id=>8459, :randomnumber=>1005}]


timuckun · 2023-05-11T11:27:51Z

Author

Actually, it turns out that the original benchmark isn't resolving the promises, even though they show as resolved in IRB. When I put the results in a controller and try to dump them with Oj, I don't get the array of hashes but a bunch of promises.

I changed the code to this in order to make it work

x.report("async") do |i|
  promises = []
  results = []

  iterations.times do
    promises << db.connection["SELECT id, randomNumber FROM World WHERE id = ?", db.random_id].async
  end
  promises.each do |p|
    results << p.first.to_hash
  end
end

and the results were very surprising.

Warming up --------------------------------------
                sync     6.893B i/100ms
               async     3.350B i/100ms
Calculating -------------------------------------
                sync    588.138B (±11.6%) i/s -      2.902T in   5.002290s
               async    189.855B (±18.5%) i/s -    914.682B in   5.010385s

Comparison:
                sync: 588137622121.1 i/s
               async: 189854749434.4 i/s - 3.10x  slower


jeremyevans · 2023-05-11T12:56:17Z

Maintainer

I think the last example is still sequential, but with the async overhead. Can you try:

x.report("async") do |i|

    promises =[]
    results = []
    iterations.times do
      promises << db.connection["SELECT id, randomNumber FROM World WHERE id = ?", db.random_id].async.first
    end
    promises.each do | p|
      results << p.to_hash
    end
  end
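The key difference in this version is when each query starts. Calling `.first` on the async dataset inside the first loop should queue every query before any result is awaited, while the previous version only started each query when it was resolved, one at a time. The same pattern with plain threads and a simulated 20 ms query shows why that matters once each call has real latency. There is no Sequel and no database here; `fake_query` and `LATENCY` are stand-ins.

```ruby
LATENCY = 0.02 # hypothetical per-query round-trip time, in seconds

# Stand-in for one single-row SELECT; sleeping releases the GVL like real I/O.
def fake_query(id)
  sleep LATENCY
  { id: id, randomnumber: id * 7 }
end

# Returns [block result, elapsed seconds] using a monotonic clock.
def timed
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  [result, Process.clock_gettime(Process::CLOCK_MONOTONIC) - started]
end

ids = (1..10).to_a

# Start a query and wait for it before starting the next: sequential plus thread overhead.
one_at_a_time, slow = timed { ids.map { |id| Thread.new { fake_query(id) }.value } }

# Start every query first, then collect the results: the waits overlap.
fanned_out, fast = timed do
  threads = ids.map { |id| Thread.new { fake_query(id) } }
  threads.map(&:value)
end

printf("one at a time: %.3fs, fan-out: %.3fs\n", slow, fast)
```

With `LATENCY` set to a few microseconds instead, thread overhead tends to dominate and the fan-out version stops paying off, so the benefit depends on how much of each query is spent waiting.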

If that doesn't work better, I don't have any ideas right now. But I'm also currently at a conference and haven't been getting much sleep :) . If that doesn't fix it, please let me know and I'll look into the issue when I get back.

I'm guessing the prepared statements issue is because internally the prepared statement code is executing the query async but also resolving the promise before returning the value. That is probably fixable, I can look into the issue when I return.

I apologize that I do not currently have time to read the note on the rack implementation. However, I will try to read and respond to that as well when I return.


timuckun May 12, 2023
Author

I tried that and it's faster by about 30% which is not insignificant.

Warming up --------------------------------------
                sync    10.465B i/100ms
               async    12.038B i/100ms
Calculating -------------------------------------
                sync    923.924B (± 8.4%) i/s -      4.573T in   5.000068s
               async      1.213T (±10.1%) i/s -      5.995T in   4.999178s

Comparison:
               async: 1213346311667.6 i/s
                sync: 923924446760.4 i/s - 1.31x  slower

jeremyevans Jun 2, 2023
Maintainer

I looked into prepared statements with async_thread_pool on PostgreSQL, and it appears to work correctly:

ds = DB.select{pg_sleep(1)}.async
ps = ds.prepare(:select, :pgs)
t = Time.now
p((2.times.map{ps.call} + 2.times.map{ds.call(:select)}).map{_1.itself})
p(Time.now - t)

This runs in just over 1 second, and if you comment out the .async, just over 4 seconds.

timuckun · 2023-05-12T12:44:08Z

Author

FYI:

Running the benchmark with async queries (update or select) fails with the following error.

--------------------------------------------------------------------------------
VERIFYING UPDATE
--------------------------------------------------------------------------------
Accessing URL http://tfb-server:8080/updates?queries=2: 
Verifying test update for rack caused an exception: HTTPConnectionPool(host='tfb-server', port=8080): Read timed out. (read timeout=15)
   FAIL for http://tfb-server:8080
     Connection to server timed out
     See https://github.com/TechEmpower/FrameworkBenchmarks/wiki/Project-Information-Framework-Tests-Overview#specific-test-requirements
--------------------------------------------------------------------------------

So for some reason the request times out. If I call the sync code then it works. Here are the two implementations (same code as the benchmark)

def get_promises(queries)
  promises = []
  queries.times do
    promises << @connection["SELECT id, randomNumber FROM World WHERE id = ?", random_id].async.first
  end
  promises
end

def get_multiple_records(queries)
  queries = validate_query_range(queries)
  results = []
  queries.times do
    results << @world_select.call(id: random_id)[0]
  end
  results
end

def get_multiple_records_async(queries)
  queries = validate_query_range(queries)
  promises = get_promises(queries)
  results = []

  promises.each do |p|
    results << p.to_hash
  end
  results
end


timuckun May 20, 2023
Author

The code is in the file called pg_db. The methods in question are

def update_worlds(count, async = false)
  results = if async
    select_worlds_async(count)
  else
    select_worlds(count)
  end
  #values = []
  ids = []
  sql = String.new("UPDATE world SET randomnumber = CASE id ")
  results.each do |r|
    r[:randomnumber] = random_id
    ids << r[:id]
    sql << "when #{r[:id]} then #{r[:randomnumber]} "
  end
  sql << "ELSE randomnumber END WHERE id IN ( #{ids.join(',')})"
  @connection[sql].update
  results
end

def select_worlds_async(count)
  promises = select_promises(count)
  results = []
  promises.each do |p|
    results << p.to_hash
  end
  results
end

def select_promises(count)
  count = validate_count(count)
  promises = []
  count.times do
    @connection.synchronize do
      promises << @connection['SELECT id, randomNumber FROM World WHERE id = ?', random_id].async.first
    end
  end
  promises
end

The table in question only has two fields id and randomnumber and it has 10K records.

Here is a Docker Compose file that launches a dev container and the database.

x-backend:
  # Set these vars and args according to your app
  # this is so you can run multiple versions one to act as the server, one to dev in etc

  &backend
  build:
    #change the context and docker file if you want to use one of the benchmark containers.
    #for example
    #  context: ..
    #  dockerfile: rack.dockerfile
    context: .
    dockerfile: Dockerfile
    args:
      RUBY_VERSION: '3.2'
      DISTRO_NAME: 'bullseye'
      PG_MAJOR: '15'
  environment:

    DATABASE_URL: "postgresql://tfb-database/hello_world?user=benchmarkdbuser&password=benchmarkdbpass"
    XDG_DATA_HOME: /app/tmp/cache
    HISTFILE: /usr/local/hist/.bash_history
    IRB_HISTFILE: /usr/local/hist/.irb_history
    EDITOR: vi

  stdin_open: true
  tty: true

  tmpfs:
    - /tmp
    - /app/tmp/pids

  volumes:
    - ../.:/app:cached
    - bundle:/usr/local/bundle
    - history:/usr/local/hist
  # - ./.psqlrc:/root/.psqlrc:ro
  # - ./.bashrc:/root/.bashrc:ro
  # depends_on:
  #   &backend_depends_on
  #   postgres:
  #     condition: service_healthy


services:
  tfb-database:
    build:
      context: ../../../../../toolset/databases/postgres/
      dockerfile: postgres.dockerfile
    ports:
      - 5432:5432

  dev:
    <<: *backend
    # Overrides default command so things don't shut down after the process ends.
    command: sleep infinity

  pgadmin:
    image: dpage/pgadmin4:latest
    volumes:
      - ./pgadmin_servers.json:/pgadmin4/servers.json
      - ./pgpass:/pgadmin4/.pgpass:ro
      #- pgadmin:/var/lib/pgadmin
    ports:
      - 5050:5050
    environment:
      PGADMIN_DEFAULT_EMAIL: [email protected]
      PGADMIN_DEFAULT_PASSWORD: admin
      PGADMIN_LISTEN_PORT: 5050
      PGADMIN_SERVER_JSON_FILE: /pgadmin4/servers.json

volumes:
  bundle:
  history:

Here is a dockerfile for the dev environment

# syntax=docker/dockerfile:1

ARG RUBY_VERSION
ARG DISTRO_NAME

FROM ruby:$RUBY_VERSION-slim-$DISTRO_NAME

# Common dependencies
# Using --mount to speed up build with caching, see https://github.com/moby/buildkit/blob/master/frontend/dockerfile/docs/reference.md#run---mount
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
  --mount=type=cache,target=/var/lib/apt,sharing=locked \
  --mount=type=tmpfs,target=/var/log \
  rm -f /etc/apt/apt.conf.d/docker-clean; \
  echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' > /etc/apt/apt.conf.d/keep-cache; \
  apt-get update -qq && \
  DEBIAN_FRONTEND=noninteractive apt-get -yq dist-upgrade && \
  DEBIAN_FRONTEND=noninteractive apt-get install -yq --no-install-recommends \
  build-essential \
  gnupg2 \
  curl \
  less \
  git

#RUN echo "deb http://apt.postgresql.org/pub/repos/apt/ $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list \
#    && cat /etc/apt/sources.list.d/pgdg.list \
#    && curl --silent https://www.postgresql.org/media/keys/ACCC4CF8.asc | apt-key add -

ARG PG_MAJOR
ARG DISTRO_NAME
RUN curl -sSL https://www.postgresql.org/media/keys/ACCC4CF8.asc | gpg --dearmor -o /usr/share/keyrings/postgres-archive-keyring.gpg \
  && echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/postgres-archive-keyring.gpg] https://apt.postgresql.org/pub/repos/apt/" \
  $DISTRO_NAME-pgdg main $PG_MAJOR | tee /etc/apt/sources.list.d/postgres.list > /dev/null
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
  --mount=type=cache,target=/var/lib/apt,sharing=locked \
  --mount=type=tmpfs,target=/var/log \
  apt-get update -qq && DEBIAN_FRONTEND=noninteractive apt-get -yq dist-upgrade && \
  DEBIAN_FRONTEND=noninteractive apt-get install -yq --no-install-recommends \
  libpq-dev \
  postgresql-client-$PG_MAJOR



# Configure bundler
ENV LANG=C.UTF-8 \
  BUNDLE_JOBS=4 \
  BUNDLE_RETRY=3

# Store Bundler settings in the project's root
ENV BUNDLE_APP_CONFIG=.bundle

# Uncomment this line if you want to run binstubs without prefixing with `bin/` or `bundle exec`
# ENV PATH /app/bin:$PATH

# Upgrade RubyGems and install the latest Bundler version
RUN gem update --system && \
  gem install bundler

# Document that we're going to expose port 3000
#EXPOSE 3000
#RUN bundle install  --jobs=4 --gemfile=/rack/Gemfile --path=/rack/rack/bundle
# Use Bash as the default command
CMD ["/usr/bin/bash"]

Here is a devcontainer.json file if you are using Visual Studio Code. You can put the Dockerfile, the compose file and this JSON file in the .devcontainer folder.

// For format details, see https://aka.ms/devcontainer.json. For config options, see the README at:
// https://github.com/microsoft/vscode-dev-containers/tree/v0.245.2/containers/ruby
{
  "name": "Ruby 3.2 devcontainer",
  "dockerComposeFile": [
    "docker-compose.yml"
  ],
  "service": "dev",
  "runServices": [
    "dev",
    "tfb-database",
    "pgadmin"
  ],
  "workspaceFolder": "/app",
  "customizations": {
    "vscode": {
      "extensions": [
        "shopify.ruby-extensions-pack",
        "vscode-icons-team.vscode-icons",
        "bungcip.better-toml",
        "humao.rest-client"
      ],
      "settings": {
        "files.autoSave": "onFocusChange",
        "workbench.iconTheme": "vscode-icons",
        "editor.formatOnSave": true,
        "[ruby]": {
          "editor.defaultFormatter": "Shopify.ruby-lsp",
          "editor.formatOnType": true,
          "editor.tabSize": 2,
          "editor.insertSpaces": true,
          "editor.semanticHighlighting.enabled": true
        }
      }
    }
  },
  "postStartCommand": "bundle install"
}

jeremyevans Jun 2, 2023
Maintainer

Sorry for the delay in getting back to you. I do all of my development on OpenBSD, which does not support docker. If you can provide a minimal self contained reproducible example as a single ruby file, I may have time to look into the issue. If not, I'm afraid I won't be able to help.

timuckun Jun 5, 2023
Author

This is the single file you need.

https://github.com/timuckun/FrameworkBenchmarks/blob/fix-broken-ruby-rack/frameworks/Ruby/rack/pg_db.rb

It's a simple wrapper around a Sequel connection and runs all the queries. The methods of interest are

select_worlds, select_worlds_async and update_worlds. update_worlds takes a parameter called async and calls the appropriate method according to that flag.

The table has two fields, an id field and a randomNumber field both are integers.

The class takes a connection string so you can set up a table with some records and test it. In the benchmarks the count for the select worlds and update worlds is passed in via a URL parameter and varies between 0 and 512.

Hope this helps.

BTW I have sent the pull request in, it's not using async right now.

Also BTW falcon throws up all kinds of warnings when running this code with fibers but it does run so hopefully that's not impacting the performance too much.

Let me know if I can be of more help.

jeremyevans Jun 5, 2023
Maintainer

Thanks for submitting the pull request to TechEmpower.

Also, thank you for providing a link to the single-file example. It was still not self-contained, but at least it was something I could work with. I tried to turn it into a self-contained example (designed to be run with the bin/sequel CLI tool):

# frozen_string_literal: true

require 'benchmark/ips' 

class PgDb
  QUERY_RANGE = 1..10_000 # range of IDs in the Fortune DB
  ALL_IDS = QUERY_RANGE.to_a # enumeration of all the IDs in fortune DB
  MIN_QUERIES = 1 # min number of records that can be retrieved
  MAX_QUERIES = 500 # max number of records that can be retrieved
  NUM_WORLDS = 512
  NUM_FORTUNES = QUERY_RANGE.max

  def self.random_id
    Random.rand(QUERY_RANGE)
  end

  Sequel.extension :fiber_concurrency if defined?(Falcon)

  if ENV['ASYNC']
    ASYNC = true
    DB.extension :async_thread_pool
  else
    ASYNC = false
  end

  unless DB.table_exists?(:world)
    DB.create_table(:world) do
      Integer :id, primary_key: true
      Integer :randomnumber
    end
    i = 0
    DB[:world].import([:id, :randomnumber], QUERY_RANGE.to_a.sample(NUM_WORLDS).map{|rv| [i+=1, rv]})
  end

  unless DB.table_exists?(:fortune)
    DB.create_table(:fortune) do
      Integer :id, primary_key: true
      String :message
    end
    i = 0
    DB[:fortune].import([:id, :message], Array.new(NUM_FORTUNES).map{[i+=1, Random.bytes(25).unpack1('h*')]})
  end

  world_select = DB[:world].select(:id, :randomnumber).where(id: :$id)
  world_select = world_select.async if ASYNC
  WORLD_SELECT_DS = world_select
  WORLD_SELECT = world_select.prepare(:select, :select_world_by_id)
  WORLD_SELECT_ONE = world_select.prepare(:first, :first_world_by_id)

  WORLD_RANDOM_SELECT_ONE = DB[:world].
    select(:id, :randomnumber).
    where(id: :$id, randomnumber: :$randomvalue).
    prepare(:first, :first_world_by_id_and_random)

  FORTUNE_SELECT = DB[:fortune].select(:id, :message).prepare(:select, :select_all)

  def select_random_world
    WORLD_SELECT_ONE.call(id: random_world_id)
  end

  def select_world(id)
    WORLD_SELECT_ONE.call(id: id)
  end

  def select_worlds(count)
    results = Array.new(count.to_i.clamp(MIN_QUERIES, MAX_QUERIES)){select_random_world}
    results.map!(&:__value) if ASYNC
    results
  end

  def update_worlds(count)
    results = select_worlds(count)
    h = {}
    ids = results.map do |r|
      id = r[:id]
      h[id] = random_id
      id
    end
    DB[:world].where(:id=>ids).update(randomnumber: Sequel.case(h, :id, :randomnumber))
    results
  end

  def select_fortunes
    FORTUNE_SELECT.call
  end

  def random_world_id
    Random.rand(NUM_WORLDS)+1
  end

  def random_id
    Random.rand(QUERY_RANGE)
  end
end

db = PgDb.new
world_ids = (1..512).to_a
p(Hash === db.select_random_world ? :sync : :async)

Benchmark.ips do |x|
  x.report('select_random_world') do
    Array.new(1000){db.select_random_world}.map(&:to_hash)
  end
  x.report('select_world(id)') do
    world_ids.map{|id| db.select_world(id)}.map(&:to_hash)
  end
end

With a local database, I was never able to get the async mode to be faster than the sync mode on CRuby. There's just too much overhead in the async mode, and with simple queries such as the ones in the example, not enough latency with a local database to make it worth using. I was never able to get any failures, even with using 512 max_connections and 500 async threads. The best async performance seemed to be around 14 async_threads in my environment on CRuby.

On JRuby, async mode could be faster, about twice as fast for select_world(id), with the best results at around 64 async threads.

My testing was done on a 4-core AMD Ryzen 5 PRO 2400GE on OpenBSD.

If you want to play with the above example and can submit a modified version that shows failures in your environment, I'd be happy to take a look.

timuckun Jun 7, 2023
Author

Thanks for looking into this. I guess the problem must be with puma hybrid forking/threading model combined with rack combined with sequel or something. In any case since it doesn't make too much of a difference in this scenario I'll just leave the sync queries in place.

Thanks for your efforts, also thanks for writing sequel and roda!

Cheers.

timuckun · 2023-05-13T09:53:27Z

Author

This seems to be a problem with the threading of the server. It works fine with Unicorn but fails with Puma. I haven't tried it with Falcon yet; I will try it and report back here.

I tried running the async queries inside a DB.synchronize block, but that didn't make any difference. Puma does have this code:

before_fork do
  Sequel::DATABASES.each(&:disconnect)
end

I should also note that puma emits this message

rack: [7] Puma starting in cluster mode...
rack: [7] * Puma version: 6.2.1 (ruby 3.2.2-p53) ("Speaking of Now")
rack: [7] *  Min threads: 18
rack: [7] *  Max threads: 18
rack: [7] *  Environment: production
rack: [7] *   Master PID: 7
rack: [7] *      Workers: 15
rack: [7] * Preloading application
rack: [7] * Listening on http://0.0.0.0:8080
rack: [7] ! WARNING: Detected 15 Thread(s) started in app boot:
rack: [7] ! #<Thread:0x00007faca1d57578 /usr/local/bundle/gems/sequel-5.68.0/lib/sequel/extensions/async_thread_pool.rb:209 run> - 
rack: [7] ! #<Thread:0x00007faca1d57320 /usr/local/bundle/gems/sequel-5.68.0/lib/sequel/extensions/async_thread_pool.rb:209 run> - 
rack: [7] ! #<Thread:0x00007faca1d56e70 /usr/local/bundle/gems/sequel-5.68.0/lib/sequel/extensions/async_thread_pool.rb:209 run> - 
rack: [7] ! #<Thread:0x00007faca1d56a88 /usr/local/bundle/gems/sequel-5.68.0/lib/sequel/extensions/async_thread_pool.rb:209 run> - 
rack: [7] ! #<Thread:0x00007faca1d567e0 /usr/local/bundle/gems/sequel-5.68.0/lib/sequel/extensions/async_thread_pool.rb:209 sleep_forever> - <internal:thread_sync>:18:in `pop'
rack: [7] ! #<Thread:0x00007faca1d55f70 /usr/local/bundle/gems/sequel-5.68.0/lib/sequel/extensions/async_thread_pool.rb:209 run> - 
rack: [7] ! #<Thread:0x00007faca1d55e08 /usr/local/bundle/gems/sequel-5.68.0/lib/sequel/extensions/async_thread_pool.rb:209 run> - 
rack: [7] ! #<Thread:0x00007faca1d55cf0 /usr/local/bundle/gems/sequel-5.68.0/lib/sequel/extensions/async_thread_pool.rb:209 run> - 
rack: [7] ! #<Thread:0x00007faca1d55bd8 /usr/local/bundle/gems/sequel-5.68.0/lib/sequel/extensions/async_thread_pool.rb:209 run> - 
rack: [7] ! #<Thread:0x00007faca1d559a8 /usr/local/bundle/gems/sequel-5.68.0/lib/sequel/extensions/async_thread_pool.rb:209 run> - 
rack: [7] ! #<Thread:0x00007faca1d557f0 /usr/local/bundle/gems/sequel-5.68.0/lib/sequel/extensions/async_thread_pool.rb:209 run> - 
rack: [7] ! #<Thread:0x00007faca1d556b0 /usr/local/bundle/gems/sequel-5.68.0/lib/sequel/extensions/async_thread_pool.rb:209 run> - 
rack: [7] ! #<Thread:0x00007faca1d55570 /usr/local/bundle/gems/sequel-5.68.0/lib/sequel/extensions/async_thread_pool.rb:209 run> - 
rack: [7] ! #<Thread:0x00007faca1d553e0 /usr/local/bundle/gems/sequel-5.68.0/lib/sequel/extensions/async_thread_pool.rb:209 run> - 
rack: [7] ! #<Thread:0x00007faca1d54b20 /usr/local/bundle/gems/sequel-5.68.0/lib/sequel/extensions/async_thread_pool.rb:209 run> - 
rack: [7] Use Ctrl-C to stop
rack: [7] - Worker 0 (PID: 23) booted in 0.12s, phase: 0
rack: [7] - Worker 3 (PID: 26) booted in 0.13s, phase: 0
rack: [7] - Worker 2 (PID: 25) booted in 0.13s, phase: 0
rack: [7] - Worker 1 (PID: 24) booted in 0.16s, phase: 0
rack: [7] - Worker 6 (PID: 34) booted in 0.13s, phase: 0
rack: [7] - Worker 4 (PID: 28) booted in 0.15s, phase: 0
rack: [7] - Worker 5 (PID: 30) booted in 0.14s, phase: 0
rack: [7] - Worker 7 (PID: 38) booted in 0.13s, phase: 0
rack: [7] - Worker 8 (PID: 45) booted in 0.13s, phase: 0
rack: [7] - Worker 9 (PID: 47) booted in 0.13s, phase: 0
rack: [7] - Worker 10 (PID: 71) booted in 0.11s, phase: 0
rack: [7] - Worker 11 (PID: 95) booted in 0.1s, phase: 0
rack: [7] - Worker 12 (PID: 106) booted in 0.1s, phase: 0
rack: [7] - Worker 14 (PID: 161) booted in 0.08s, phase: 0
rack: [7] - Worker 13 (PID: 149) booted in 0.09s, phase: 0
rack: Verifying framework URLs

These come from the max_connections setup.

One of them looks like an error:

rack: [7] ! #<Thread:0x00007faca1d567e0 /usr/local/bundle/gems/sequel-5.68.0/lib/sequel/extensions/async_thread_pool.rb:209 sleep_forever> - <internal:thread_sync>:18:in `pop'

Maybe that's the cause?


jeremyevans May 17, 2023
Maintainer

I would guess that there are problems loading the async_thread_pool extension before forking, because the async threads no longer exist in the child processes. You would probably want to load the extension in an after_fork hook. Maybe that should be documented in the async_thread_pool extension?
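The underlying behavior is easy to check without Puma or Sequel. On CRuby, only the thread that calls `fork` exists in the child, so background threads started before forking (such as a pool created at boot) are simply absent there. A minimal sketch for platforms with `fork` (Linux, macOS, BSD; not Windows); the method name is hypothetical.

```ruby
def thread_counts_across_fork
  background = Thread.new { sleep } # stands in for a pool thread started at boot
  reader, writer = IO.pipe

  pid = fork do
    reader.close
    writer.write(Thread.list.size.to_s) # report how many threads the child sees
    writer.close
    exit!(0) # skip at_exit handlers inherited from the parent
  end

  writer.close
  child_count = reader.read.to_i
  reader.close
  Process.wait(pid)

  parent_count = Thread.list.size
  background.kill
  background.join
  [parent_count, child_count]
end

p thread_counts_across_fork
```

The parent sees at least two threads (main plus `background`), while the child sees only its main thread. Any work queued for the missing threads in a child would wait forever, which fits the request timeouts described above.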

timuckun May 18, 2023
Author

I am loading the async extension in a class initializer, the connection is an instance variable.

@connection = Sequel.connect(connection_string, max_connections: max_connections, sql_log_level: :warning)
@connection.extension :async_thread_pool

Puma does have the preload_app! directive so maybe that's the problem.

jeremyevans May 18, 2023
Maintainer

I'm guessing you would have to put the @connection.extension :async_thread_pool part in a puma after_fork hook for it to work correctly.

timuckun May 20, 2023
Author

I don't think that's been defined by the time puma runs the config. I am not even sure if I can create the connection before puma runs the config file.
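A sketch of how that might look in a Puma 6 config, assuming the app exposes its `Sequel::Database` as a constant (called `DB` here, which is hypothetical; the class above keeps it in an instance variable) and that nothing calls `.async` before the workers boot. Puma reads the config file before loading the app, but the hook blocks run later. With `preload_app!`, `before_fork` runs in the master after the app has loaded, and `on_worker_boot` runs inside each worker after it forks, so the constant should exist by then.

```ruby
# config/puma.rb (sketch for Puma 6.x; a config fragment, not a standalone script)
workers 15
threads 18, 18
preload_app!

before_fork do
  # Close connections opened while preloading so workers don't share sockets.
  Sequel::DATABASES.each(&:disconnect)
end

on_worker_boot do
  # Runs in each worker after the fork, so the async pool threads are created there.
  # DB is a hypothetical constant holding the app's Sequel::Database.
  DB.extension :async_thread_pool
end
```

The `@connection.extension :async_thread_pool` call would then be removed from the class initializer.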
