
delete_all_using_limit_in_batches generates slow SQL query #15

Open

smily-ivan opened this issue Nov 13, 2023 · 2 comments

smily-ivan commented Nov 13, 2023

This strategy generates the following SQL query:

DELETE FROM "tasks" WHERE "tasks"."id" IN (SELECT "tasks"."id" FROM "tasks" WHERE (end_at < '2023-05-13 08:28:12.447024') ORDER BY "tasks"."end_at" ASC LIMIT 500)

For big tables, sorting by a custom field can cause:

ActiveRecord::QueryCanceled: PG::QueryCanceled: ERROR: canceling statement due to statement timeout

So I think it's better to rewrite it to something like:

      def call(collection)
        loop do
          # Fetch the next batch of primary keys; no ORDER BY, so no sort on a custom column
          keys = collection.limit(batch_size).pluck(collection.primary_key)
          break if keys.blank?

          # Delete the batch by primary key, which hits the primary key index
          collection.where(collection.primary_key => keys).delete_all
        end
      end
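
For illustration, here is roughly what each iteration of that loop issues, as a self-contained sketch (the Task model, the end_at cutoff, and the batch size of 500 are assumptions for the example, not the gem's actual API):

    # Hypothetical stand-ins: Task and the retention cutoff are illustrative.
    collection = Task.where("end_at < ?", Time.current)
    batch_size = 500

    loop do
      # SELECT "tasks"."id" FROM "tasks" WHERE (end_at < '...') LIMIT 500
      keys = collection.limit(batch_size).pluck(collection.primary_key)
      break if keys.blank?

      # DELETE FROM "tasks" WHERE "tasks"."id" IN (1, 2, 3, ...)
      collection.where(collection.primary_key => keys).delete_all
    end

Neither statement carries an ORDER BY, so PostgreSQL never has to sort on end_at.
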
Azdaroth (Member) commented

@smily-ivan Are you sure this is caused by the ORDER BY and not the WHERE clause? Ordering by whatever you have in the filters is the most optimal approach here, so it looks like the issue is caused not by the ordering but by a missing index.
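
If a missing index is indeed the culprit, the fix could be a migration like this (a sketch assuming a standard Rails setup on PostgreSQL; the class name and migration version tag are illustrative):

    class AddIndexOnTasksEndAt < ActiveRecord::Migration[7.0]
      # Build the index concurrently so the tasks table stays writable.
      disable_ddl_transaction!

      def change
        add_index :tasks, :end_at, algorithm: :concurrently
      end
    end

With that index in place, the ordered subquery can walk the index on end_at instead of sorting the whole filtered set.
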

smily-ivan (Author) commented

Yes, I checked.

Here is the EXPLAIN output with the ORDER BY:
[screenshot: EXPLAIN of the query with ORDER BY "tasks"."end_at"]

And the same query without the ORDER BY:
[screenshot: EXPLAIN of the query without the ORDER BY]

We switched to the delete_all strategy; it sorts by id (using the index).
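
For reference, an id-ordered batch delete presumably takes a shape like this (an assumption based on the original query, not the gem's verbatim code):

    # Sketch: order the subquery by the primary key so PostgreSQL can walk
    # the pkey index instead of sorting on end_at. Generates roughly:
    #   DELETE FROM "tasks" WHERE "tasks"."id" IN (
    #     SELECT "tasks"."id" FROM "tasks"
    #     WHERE (end_at < '...') ORDER BY "tasks"."id" ASC LIMIT 500
    #   )
    collection
      .where(collection.primary_key => collection.order(collection.primary_key).limit(500).select(collection.primary_key))
      .delete_all
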
