perf: optimise queryset of ES #4265

uzairr · 2024-02-09T18:52:30Z

These changes cut the query count roughly in half: from about 85 queries before the change to about 45 after it.
[Screenshots: before this change; after this change]

These numbers can be observed by calling the course runs REST API with a query parameter, so that the request goes to Elasticsearch.

Steps to Reproduce:

Go to http://localhost:18381/api/v1/course_runs/?q=*
Refresh 2 to 3 times so that the caches are populated and the response is stable.
Observe the Django Debug Toolbar, specifically the Time and SQL panels.

Follow the steps above first on the master branch and then on this PR to see the drop in query count.

Note: A drop in response time of up to ~400 ms can also be observed. Also, set edit_mode to False directly in the codebase so that the REST API with an ES query loads efficiently.

file path: course_discovery/apps/api/v1/views/course_runs.py
after line 85, explicitly set edit_mode=False

DawoudSheraz · 2024-02-14T12:19:12Z

course_discovery/apps/core/utils.py

+    def __init__(self, queryset, model):
+        # This is necessary to act like Django ORM Queryset
+        self.model = model
+
+        self.queryset = queryset
+        self._select_related_lookups = ()
+        self._prefetch_related_lookups = ()
+
+    def prefetch_related(self, *lookups):
+        """Same as QuerySet.prefetch_related()"""
+        clone = self._chain()
+        if lookups == (None,):
+            clone._prefetch_related_lookups = ()
+        else:
+            clone._prefetch_related_lookups += lookups
+        return clone
+
+    def select_related(self, *lookups):
+        """Will work same as .prefetch_related()"""
+        clone = self._chain()
+        if lookups == (None,):
+            clone._select_related_lookups = ()
+        else:
+            clone._select_related_lookups += lookups
+        return clone
+
+    def _chain(self):
+        clone = self.__class__(queryset=self.queryset, model=self.model)
+        clone._select_related_lookups = self._select_related_lookups
+        clone._prefetch_related_lookups = self._prefetch_related_lookups
+        return clone


Why are these additions needed?

To make SearchQuerySetWrapper coherent so that it can be used in all use cases (with an ES query and without one).

Please provide some links/references as this is a specific use-case and the changes need some further explanation.

The idea behind this logic is taken from Django's source code, specifically its ORM QuerySet class. With this approach, the select_related and prefetch_related logic is not only encapsulated inside the SearchQuerySetWrapper class but also applied equally to the search results from Elasticsearch.

https://github.com/django/django/blob/977d25416954a72ad100b01762078bf1ceb89a63/django/db/models/query.py#L1599
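The clone-on-chain pattern discussed in this thread can be sketched in plain Python. This is a minimal illustration, not the code from the PR: `ChainableWrapper` is a hypothetical stand-in for SearchQuerySetWrapper, the lookup names are made up, and no Django or Elasticsearch objects are involved.

```python
class ChainableWrapper:
    """Hypothetical stand-in for SearchQuerySetWrapper; only chaining is modelled."""

    def __init__(self, queryset, model=None):
        self.queryset = queryset
        self.model = model
        self._prefetch_related_lookups = ()

    def _chain(self):
        # Build a fresh wrapper around the same results and copy the lookups across.
        clone = self.__class__(queryset=self.queryset, model=self.model)
        clone._prefetch_related_lookups = self._prefetch_related_lookups
        return clone

    def prefetch_related(self, *lookups):
        clone = self._chain()
        if lookups == (None,):
            # Passing None clears the lookups, as QuerySet.prefetch_related(None) does.
            clone._prefetch_related_lookups = ()
        else:
            # Tuples are immutable, so this rebinds the clone's attribute only.
            clone._prefetch_related_lookups += lookups
        return clone


base = ChainableWrapper(queryset=["hit-1", "hit-2"])
with_lookups = base.prefetch_related("course", "seats")  # illustrative lookup names
cleared = with_lookups.prefetch_related(None)
```

Because every call returns a new wrapper, `base` keeps its empty lookups while `with_lookups` carries the two added ones, which is the behaviour callers expect from an ORM-like chainable API.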

DawoudSheraz · 2024-02-14T12:19:38Z

course_discovery/apps/core/utils.py

-            yield result.object
+        results = [r.object for r in self.queryset]
+
+        # Both select_related & prefetch_related will act as prefetch_related


Why should both select and prefetch be treated as prefetch?

select_related works as part of the query that is executed on the DB, but in this case the search results have already been gathered from ES, so that query is never executed. That is why both prefetch_related and select_related should act as prefetch_related.

https://github.com/django/django/blob/977d25416954a72ad100b01762078bf1ceb89a63/django/db/models/query.py#L523

But this means every relation that could be fetched via a JOIN is now fetched with an IN query and joined internally by Django, which will increase DB calls overall.

I don't think so. It only affects SearchQuerySetWrapper querysets, which are created only for ES queries.
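The trade-off in this exchange can be illustrated with a toy model. The sketch below is an assumption-laden illustration, not Django ORM code: `FakeDB`, its methods, and the sample data are hypothetical, and each method call stands for one SQL query. It compares per-object lookups with a single batched IN-style lookup over objects that have already been loaded.

```python
class FakeDB:
    """Hypothetical in-memory 'database' that counts the queries it receives."""

    def __init__(self, rows):
        self.rows = rows  # {course_id: course_title}
        self.query_count = 0

    def get(self, pk):
        # Models: SELECT ... WHERE id = pk
        self.query_count += 1
        return self.rows[pk]

    def get_many(self, pks):
        # Models: SELECT ... WHERE id IN (pks)
        self.query_count += 1
        return {pk: self.rows[pk] for pk in pks}


# Objects already loaded from the search index; each points at a course by id.
runs = [
    {"key": "run-a", "course_id": 1},
    {"key": "run-b", "course_id": 2},
    {"key": "run-c", "course_id": 1},
]

# Without prefetching: one query per object.
naive_db = FakeDB({1: "Algebra", 2: "Biology"})
naive_titles = [naive_db.get(run["course_id"]) for run in runs]

# Prefetch style: collect the ids, issue one IN query, then join in Python.
batched_db = FakeDB({1: "Algebra", 2: "Biology"})
courses = batched_db.get_many({run["course_id"] for run in runs})
batched_titles = [courses[run["course_id"]] for run in runs]
```

In this model the batched path costs one query per relation regardless of how many objects were loaded, whereas a JOIN would cost none beyond the main query. A JOIN is not available here because the main query ran against the search index rather than the database.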

DawoudSheraz · 2024-02-26T12:24:09Z

course_discovery/apps/api/serializers.py

@@ -2002,6 +2002,7 @@ def prefetch_queryset(cls, partner, queryset=None):
            'degree__costs',
            'degree__deadlines',
            'curricula',
+            'labels',


I added that in #4276, alongside other N+1 fixups.

DawoudSheraz · 2024-02-26T12:31:08Z

course_discovery/apps/core/utils.py


    def __iter__(self):
-        for result in self.qs:
-            yield result.object
+        results = [r.object for r in self.queryset]


This makes the queryset no longer an iterator, thus losing the perf benefit. The objects will be evaluated, which results in hitting the DB.

The lines that follow call prefetch_related_objects, which requires the full list of results to be passed in; that call is what ultimately delivers the performance benefit.

https://github.com/django/django/blob/977d25416954a72ad100b01762078bf1ceb89a63/django/db/models/query.py#L524
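The point about needing the full list can be sketched as follows. This is a simplified model under stated assumptions: `MaterialisingWrapper` and `batch_hook` are hypothetical names, and the hook merely stands in for a batch call such as `django.db.models.prefetch_related_objects(results, *lookups)`.

```python
class MaterialisingWrapper:
    """Hypothetical wrapper; `batch_hook` stands in for a batch prefetch call."""

    def __init__(self, hits, batch_hook):
        self.hits = hits
        self.batch_hook = batch_hook

    def __iter__(self):
        # The hook needs every object at once, so the hits are collected first.
        results = [hit["object"] for hit in self.hits]
        if results:
            self.batch_hook(results)
        return iter(results)


batch_sizes = []
wrapper = MaterialisingWrapper(
    hits=[{"object": "run-a"}, {"object": "run-b"}, {"object": "run-c"}],
    batch_hook=lambda objs: batch_sizes.append(len(objs)),
)
objects = list(wrapper)
```

A generator that yields one object at a time would hand the hook a single object per step, so the related rows could not be fetched in one batch. The cost of materialising is that all hits in the current slice are loaded before the first one is returned.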

AfaqShuaib09 · 2024-02-26T12:48:52Z

course_discovery/apps/core/utils.py


    def __getattr__(self, item):
-        try:
-            return super().__getattr__(item)


Why was the above check, which gets attribute value from the class instead of the queryset, removed?

I think return getattr(self.queryset, item) serves the purpose, which is why it was removed.

Are we certain about this, just in case?

Yes, it will not break. The model object is now handled separately, so there is no need to call super().
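The delegation described in this thread relies on how Python resolves attributes. The sketch below uses a hypothetical `DelegatingWrapper` backed by a plain list and assumes no base class defines `__getattr__`; it is an illustration rather than the PR's implementation.

```python
class DelegatingWrapper:
    """Hypothetical wrapper showing attribute delegation via __getattr__."""

    def __init__(self, queryset, model):
        self.queryset = queryset
        self.model = model

    def __getattr__(self, item):
        # Python calls __getattr__ only after normal attribute lookup has failed,
        # so `queryset` and `model` set in __init__ never reach this method.
        return getattr(self.queryset, item)


wrapper = DelegatingWrapper(queryset=[3, 1, 2], model="CourseRun")
own_attribute = wrapper.model  # found on the wrapper itself
delegated = wrapper.count(1)   # list.count, reached through __getattr__
```

Attributes stored on the wrapper take precedence, and anything else falls through to the wrapped object. An attribute missing from both still raises AttributeError, raised by the inner `getattr` call.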

AfaqShuaib09 · 2024-02-26T12:49:15Z

course_discovery/apps/core/utils.py

+        return clone
+
+    def select_related(self, *lookups):
+        """Will work same as .prefetch_related()"""


Why is the overridden method of select_related working similarly to prefetch_related and having the same logic?

select_related works as part of the query that is executed on the DB, but in this case the search results have already been gathered from ES, so that query is never executed. That is why both prefetch_related and select_related should act as prefetch_related.

https://github.com/django/django/blob/977d25416954a72ad100b01762078bf1ceb89a63/django/db/models/query.py#L523

AfaqShuaib09 · 2024-02-26T12:50:22Z

course_discovery/apps/core/utils.py

+    def _chain(self):
+        clone = self.__class__(queryset=self.queryset, model=self.model)
+        clone._select_related_lookups = self._select_related_lookups
+        clone._prefetch_related_lookups = self._prefetch_related_lookups


Are these lookup variables different from those in the __init__ method above? And how do they work?

No, they are the same. Here a clone object is created with prefetch_queryset, nothing special.

DawoudSheraz · 2024-03-18T09:39:05Z

course_discovery/apps/core/utils.py

+        single_value = isinstance(key, int)
+
+        clone = self._chain()
+        clone.queryset = self.queryset[slice(key, key + 1) if single_value else key]
+
+        if single_value:
+            return list(clone)[0]
+
+        return clone


What's going on here?

Retrieval of a single value or a slice, similar to a Python list.

Why is it so different from the prior implementation?

There is a minor difference in how a single value is retrieved. Previously it returned an object and now it returns just a single value.

Is it a single value of an object, or is it still an object?

Previously, __getitem__ returned either an object from a queryset or a SearchQuerySetWrapper object. Now there is no need to handle the two cases separately, as this refactoring makes the wrapper coherent; the queryset takes care of it independently.
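The indexing behaviour described in this thread can be sketched with a list-backed wrapper. `SliceableWrapper` is a hypothetical name, and the real class wraps a search object rather than a list, so this only illustrates the control flow shown in the diff.

```python
class SliceableWrapper:
    """Hypothetical list-backed wrapper mirroring the __getitem__ logic above."""

    def __init__(self, queryset):
        self.queryset = queryset

    def __iter__(self):
        return iter(self.queryset)

    def __getitem__(self, key):
        single_value = isinstance(key, int)
        # An integer index becomes a one-element slice, so both cases
        # go through the same slicing path.
        window = slice(key, key + 1) if single_value else key
        clone = self.__class__(self.queryset[window])
        if single_value:
            return list(clone)[0]
        return clone


wrapper = SliceableWrapper(["run-a", "run-b", "run-c", "run-d"])
one = wrapper[1]     # a single element
page = wrapper[1:3]  # another wrapper over the sliced results
```

An integer index returns one element, while a slice returns a new wrapper that can still be iterated. In this list-backed sketch a negative integer index produces an empty slice and raises IndexError; whether that matters for the real wrapper depends on the underlying search object.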

DawoudSheraz · 2024-03-18T09:40:18Z

course_discovery/apps/api/v1/views/course_runs.py

+            queryset = SearchQuerySetWrapper(
+                CourseRun.search(q).filter('term', partner=partner.short_code),
+                model=queryset.model
+            )


Previously, a SearchQuerySetWrapper was returned, but now it is the serializer's prefetched queryset. Are there any unexpected consequences of this?

I do not foresee any consequences of this implementation. Technically, it is semantically closer to the Django ORM.
With this approach, additional benefits such as prefetch_related and select_related can be applied to ES queries.

It will optimise the queryset created against the query of ES. PROD-3667

DawoudSheraz reviewed Feb 14, 2024

uzairr force-pushed the optimise-courserun-listing branch from 9536f2c to 0d83c2d Compare February 15, 2024 11:44

uzairr force-pushed the optimise-courserun-listing branch 2 times, most recently from 976fa23 to ba5ace0 Compare February 26, 2024 11:50

DawoudSheraz reviewed Feb 26, 2024

AfaqShuaib09 reviewed Feb 26, 2024

uzairr force-pushed the optimise-courserun-listing branch from ba5ace0 to 0c86ca7 Compare February 28, 2024 08:31

uzairr force-pushed the optimise-courserun-listing branch 3 times, most recently from 175ccd2 to 736bdd5 Compare March 18, 2024 08:24

DawoudSheraz reviewed Mar 18, 2024

uzairr force-pushed the optimise-courserun-listing branch from 736bdd5 to 6310054 Compare March 20, 2024 11:13

DawoudSheraz approved these changes Mar 28, 2024

AfaqShuaib09 approved these changes Apr 1, 2024

uzairr added 2 commits April 1, 2024 15:00

perf: optimise queryset of ES

786e245

It will optimise the queryset created against the query of ES. PROD-3667

test: add tests for prefetch and select related qs

50d4cf1

uzairr force-pushed the optimise-courserun-listing branch from 6310054 to 50d4cf1 Compare April 1, 2024 10:00

uzairr merged commit 164a6f5 into master Apr 1, 2024
14 checks passed

uzairr deleted the optimise-courserun-listing branch April 1, 2024 10:26
