Skip to content
This repository has been archived by the owner on Sep 28, 2022. It is now read-only.

Multiple CountOfs can sometime collide and result in a heinously slow query #62

Open
pmg103 opened this issue Sep 17, 2020 · 4 comments

Comments

@pmg103
Copy link
Contributor

pmg103 commented Sep 17, 2020

I think this is our basic underlying problem https://stackoverflow.com/questions/29195299/why-is-this-django-1-6-annotate-count-so-slow

It's almost like a sort of django/postgres mismatch rather than an SSM issue per se -- however SSM allows you to very easily end up with this situation with no obvious way out.

Possible solutions could be:

  • warn when multiple CountOfs are combined
  • error when multiple CountOfs are combined
  • implement countof by fetching a values_list of related IDs and calling len() on that list if there's more than one

(Something like a generic version of this:

class CountOfUsers(SerializationSpecPlugin):
    def modify_queryset(self, queryset):
        return queryset.prefetch_related(Prefetch(
            'users',
            queryset=ActivityUser.objects.only('id', 'activity_id'),
            to_attr='user_ids'
        ))

    def get_value(self, instance):
        return len(instance.user_ids)

# ... 

    serialization_spec = [
        # ...
        {'num_users': CountOfUsers()},

)

  • Or some combination of the above
@j4mie
Copy link
Member

j4mie commented Sep 17, 2020

What about using the subquery approach in the answer to that SO question? That should work on any currently-supported version of Django.

I think the problem with trying to figure out a generic way to do this is that it's going to depend on the size of your N. If you have a small number of things in your main queryset you might even be quicker doing

    @zen_queries.queries_dangerously_enabled()
    def get_value(self, instance):
       return instance.whatever.count()

If your number of related items is big, I suspect your prefetch-then-len() is going to be pretty rubbish.

@pmg103
Copy link
Contributor Author

pmg103 commented Sep 17, 2020

I agree. Could the subquery approach be implemented as a SerializationSpecPlugin? 🤔

@j4mie
Copy link
Member

j4mie commented Sep 17, 2020

I don't see why not. If it isn't currently possible, it definitely should be.

@RealOrangeOne
Copy link
Contributor

Subquery like that is reasonably simple to implement, however gets harder if you needed to count a nested relation, especially if you need the DISTINCT count

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants