-
Notifications
You must be signed in to change notification settings - Fork 171
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf: optimise queryset of ES #4265
Conversation
course_discovery/apps/core/utils.py
Outdated
def __init__(self, queryset, model): | ||
# This is necessary to act like Django ORM Queryset | ||
self.model = model | ||
|
||
self.queryset = queryset | ||
self._select_related_lookups = () | ||
self._prefetch_related_lookups = () | ||
|
||
def prefetch_related(self, *lookups): | ||
"""Same as QuerySet.prefetch_related()""" | ||
clone = self._chain() | ||
if lookups == (None,): | ||
clone._prefetch_related_lookups = () | ||
else: | ||
clone._prefetch_related_lookups += lookups | ||
return clone | ||
|
||
def select_related(self, *lookups): | ||
"""Will work same as .prefetch_related()""" | ||
clone = self._chain() | ||
if lookups == (None,): | ||
clone._select_related_lookups = () | ||
else: | ||
clone._select_related_lookups += lookups | ||
return clone | ||
|
||
def _chain(self): | ||
clone = self.__class__(queryset=self.queryset, model=self.model) | ||
clone._select_related_lookups = self._select_related_lookups | ||
clone._prefetch_related_lookups = self._prefetch_related_lookups | ||
return clone |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are these additions needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make SearchQuerySetWrapper
coherent so that it can be used in all the use cases(in case of ES query and without it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please provide some links/references as this is a specific use-case and the changes need some further explanation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Idea of the implemented logic here is taken from the django's source code i.e. from its ORM Queryset class.By following this approach select_related
and prefetch_related
logic is not only encapsulated inside the SearchQuerySetWrapper
class but also equally applied on the search results from Elasticsearch.
yield result.object | ||
results = [r.object for r in self.queryset] | ||
|
||
# Both select_related & prefetch_related will act as prefetch_related |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why should both select and prefetch be treated as prefetch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case of select_related
query is being needed to execute on the db but in this particular case(search results are already gathered from ES). So, the query is not needed to be executed that is why prefetch_related
and select_related
should act as prefetch_related
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But this means every relation that can be fetch via join is now fetched by IN and joined internally by Django -- this will increase DB calls overall.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't think so, it will target only SearchQuerySetWrapper
queryset which would be created only in case of ES query.
9536f2c
to
0d83c2d
Compare
976fa23
to
ba5ace0
Compare
@@ -2002,6 +2002,7 @@ def prefetch_queryset(cls, partner, queryset=None): | |||
'degree__costs', | |||
'degree__deadlines', | |||
'curricula', | |||
'labels', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added that in #4276, alongside other N+1 fixups.
|
||
def __iter__(self): | ||
for result in self.qs: | ||
yield result.object | ||
results = [r.object for r in self.queryset] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This changes queryset to be not an iterator, thus losing the perf benefit. The object will be evaluated and will result in hitting the DB.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the subsequent lines below, we need to prefetch_related_objects
that demands to pass the results ultimately gives the performance benefit.
|
||
def __getattr__(self, item): | ||
try: | ||
return super().__getattr__(item) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was the above check, which gets attribute value from the class instead of the queryset, removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think return getattr(self.queryset, item)
will serve the purpose so thats why, removed,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we certain about this, just in case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes it will not break, as model object is treated separately now so no need to call super()
return clone | ||
|
||
def select_related(self, *lookups): | ||
"""Will work same as .prefetch_related()""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the overridden method of select_related
working similarly to prefetch_related
and having the same logic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case of select_related query is being needed to execute on the db but in this particular case(search results are already gathered from ES). So, the query is not needed to be executed that is why prefetch_related and select_related should act as prefetch_related.
course_discovery/apps/core/utils.py
Outdated
def _chain(self): | ||
clone = self.__class__(queryset=self.queryset, model=self.model) | ||
clone._select_related_lookups = self._select_related_lookups | ||
clone._prefetch_related_lookups = self._prefetch_related_lookups |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these lookup variables different from those above in the __init__
method? And how they are working?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, they are the same, here a clone object is created with prefetch_queryset
, nothing special.
ba5ace0
to
0c86ca7
Compare
175ccd2
to
736bdd5
Compare
single_value = isinstance(key, int) | ||
|
||
clone = self._chain() | ||
clone.queryset = self.queryset[slice(key, key + 1) if single_value else key] | ||
|
||
if single_value: | ||
return list(clone)[0] | ||
|
||
return clone |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's going on here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
retrieval of single val and slice object, similar to the python list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it so different from prior implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it has a minor difference in terms of retrieval of a single value.Previously it was returning an object and now it is returning just a single value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it single value of an object or is it still an object?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously __getitem__
was returning either an object from a queryset or an object of SearchQuerySetWrapper
. but now there is no need to cater both of them separately as this refactoring has introduced a coherence. Now, queryset is taking care of it independently.
queryset = SearchQuerySetWrapper( | ||
CourseRun.search(q).filter('term', partner=partner.short_code), | ||
model=queryset.model | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously, SearchQuerySetWrapper was being returned but now, it is serializer prefetched_queryset. Any unexpected consequences for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not foreseeing any consequence(s) about this implementation.Technically, it is more semantically related to the Django ORM.
By incorporating this approach, additional benefits like prefetch_related
and select_related
can be implemented for ES queries.
736bdd5
to
6310054
Compare
It will optimise the queryset created against the query of ES. PROD-3667
6310054
to
50d4cf1
Compare
After making the above changes query count is reduced to its half i.e. Before this change query count was around ~85 queries and after this change it is dropped to ~45 queries.
Following are the screenshots:
Before this change:
After this change:
The above statistics can be observed by hitting the rest api of the course runs with query param so that it will go the Elasticsearch.
Steps to Reproduce:
Follow the above steps first on master branch and then on this PR to notice the drop in query count.
Note: One can also observe a drop in the response time of upto ~400ms. Also set the
edit_mode
toFalse
directly in the codebase so that rest api with ES query can be loaded efficiently.edit_mode=False