Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: optimise queryset of ES #4265

Merged
merged 2 commits into from
Apr 1, 2024
Merged

perf: optimise queryset of ES #4265

merged 2 commits into from
Apr 1, 2024

Conversation

uzairr
Copy link
Contributor

@uzairr uzairr commented Feb 9, 2024

After making the above changes query count is reduced to its half i.e. Before this change query count was around ~85 queries and after this change it is dropped to ~45 queries.
Following are the screenshots:

Before this change:
Screenshot 2024-02-15 at 4 00 50 PM

After this change:
Screenshot 2024-02-15 at 3 57 20 PM

The above statistics can be observed by hitting the rest api of the course runs with query param so that it will go the Elasticsearch.

Steps to Reproduce:

Follow the above steps first on master branch and then on this PR to notice the drop in query count.

Note: One can also observe a drop in the response time of upto ~400ms. Also set the edit_mode to False directly in the codebase so that rest api with ES query can be loaded efficiently.

  • file path: course_discovery/apps/api/v1/views/course_runs.py
  • after line 85, explicitly set edit_mode=False

Comment on lines 156 to 186
def __init__(self, queryset, model):
# This is necessary to act like Django ORM Queryset
self.model = model

self.queryset = queryset
self._select_related_lookups = ()
self._prefetch_related_lookups = ()

def prefetch_related(self, *lookups):
"""Same as QuerySet.prefetch_related()"""
clone = self._chain()
if lookups == (None,):
clone._prefetch_related_lookups = ()
else:
clone._prefetch_related_lookups += lookups
return clone

def select_related(self, *lookups):
"""Will work same as .prefetch_related()"""
clone = self._chain()
if lookups == (None,):
clone._select_related_lookups = ()
else:
clone._select_related_lookups += lookups
return clone

def _chain(self):
clone = self.__class__(queryset=self.queryset, model=self.model)
clone._select_related_lookups = self._select_related_lookups
clone._prefetch_related_lookups = self._prefetch_related_lookups
return clone
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these additions needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make SearchQuerySetWrapper coherent so that it can be used in all the use cases(in case of ES query and without it).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please provide some links/references as this is a specific use-case and the changes need some further explanation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idea of the implemented logic here is taken from the django's source code i.e. from its ORM Queryset class.By following this approach select_related and prefetch_related logic is not only encapsulated inside the SearchQuerySetWrapper class but also equally applied on the search results from Elasticsearch.

https://github.com/django/django/blob/977d25416954a72ad100b01762078bf1ceb89a63/django/db/models/query.py#L1599

yield result.object
results = [r.object for r in self.queryset]

# Both select_related & prefetch_related will act as prefetch_related
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why should both select and prefetch be treated as prefetch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case of select_related query is being needed to execute on the db but in this particular case(search results are already gathered from ES). So, the query is not needed to be executed that is why prefetch_related and select_related should act as prefetch_related.

https://github.com/django/django/blob/977d25416954a72ad100b01762078bf1ceb89a63/django/db/models/query.py#L523

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this means every relation that can be fetch via join is now fetched by IN and joined internally by Django -- this will increase DB calls overall.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think so, it will target only SearchQuerySetWrapper queryset which would be created only in case of ES query.

@uzairr uzairr force-pushed the optimise-courserun-listing branch from 9536f2c to 0d83c2d Compare February 15, 2024 11:44
@uzairr uzairr force-pushed the optimise-courserun-listing branch 2 times, most recently from 976fa23 to ba5ace0 Compare February 26, 2024 11:50
@@ -2002,6 +2002,7 @@ def prefetch_queryset(cls, partner, queryset=None):
'degree__costs',
'degree__deadlines',
'curricula',
'labels',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added that in #4276, alongside other N+1 fixups.


def __iter__(self):
for result in self.qs:
yield result.object
results = [r.object for r in self.queryset]
Copy link
Contributor

@DawoudSheraz DawoudSheraz Feb 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changes queryset to be not an iterator, thus losing the perf benefit. The object will be evaluated and will result in hitting the DB.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the subsequent lines below, we need to prefetch_related_objects that demands to pass the results ultimately gives the performance benefit.

https://github.com/django/django/blob/977d25416954a72ad100b01762078bf1ceb89a63/django/db/models/query.py#L524


def __getattr__(self, item):
try:
return super().__getattr__(item)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was the above check, which gets attribute value from the class instead of the queryset, removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think return getattr(self.queryset, item) will serve the purpose so thats why, removed,

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we certain about this, just in case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes it will not break, as model object is treated separately now so no need to call super()

return clone

def select_related(self, *lookups):
"""Will work same as .prefetch_related()"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the overridden method of select_related working similarly to prefetch_related and having the same logic?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case of select_related query is being needed to execute on the db but in this particular case(search results are already gathered from ES). So, the query is not needed to be executed that is why prefetch_related and select_related should act as prefetch_related.

https://github.com/django/django/blob/977d25416954a72ad100b01762078bf1ceb89a63/django/db/models/query.py#L523

def _chain(self):
clone = self.__class__(queryset=self.queryset, model=self.model)
clone._select_related_lookups = self._select_related_lookups
clone._prefetch_related_lookups = self._prefetch_related_lookups
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these lookup variables different from those above in the __init__ method? And how they are working?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, they are the same, here a clone object is created with prefetch_queryset, nothing special.

@uzairr uzairr force-pushed the optimise-courserun-listing branch from ba5ace0 to 0c86ca7 Compare February 28, 2024 08:31
@uzairr uzairr force-pushed the optimise-courserun-listing branch 3 times, most recently from 175ccd2 to 736bdd5 Compare March 18, 2024 08:24
Comment on lines +204 to +212
single_value = isinstance(key, int)

clone = self._chain()
clone.queryset = self.queryset[slice(key, key + 1) if single_value else key]

if single_value:
return list(clone)[0]

return clone
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's going on here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

retrieval of single val and slice object, similar to the python list.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it so different from prior implementation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it has a minor difference in terms of retrieval of a single value.Previously it was returning an object and now it is returning just a single value.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it single value of an object or is it still an object?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously __getitem__ was returning either an object from a queryset or an object of SearchQuerySetWrapper. but now there is no need to cater both of them separately as this refactoring has introduced a coherence. Now, queryset is taking care of it independently.

Comment on lines +100 to +103
queryset = SearchQuerySetWrapper(
CourseRun.search(q).filter('term', partner=partner.short_code),
model=queryset.model
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, SearchQuerySetWrapper was being returned but now, it is serializer prefetched_queryset. Any unexpected consequences for this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not foreseeing any consequence(s) about this implementation.Technically, it is more semantically related to the Django ORM.
By incorporating this approach, additional benefits like prefetch_related and select_related can be implemented for ES queries.

@uzairr uzairr force-pushed the optimise-courserun-listing branch from 736bdd5 to 6310054 Compare March 20, 2024 11:13
uzairr added 2 commits April 1, 2024 15:00
It will optimise the queryset created against the query of ES.

PROD-3667
@uzairr uzairr force-pushed the optimise-courserun-listing branch from 6310054 to 50d4cf1 Compare April 1, 2024 10:00
@uzairr uzairr merged commit 164a6f5 into master Apr 1, 2024
14 checks passed
@uzairr uzairr deleted the optimise-courserun-listing branch April 1, 2024 10:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants