identify and prefetch N+1 queries in search/all for learner pathways #4488
- commit type should be perf, not chore
- Add a relevant unit test to showcase the decrease in query count
@@ -87,8 +82,7 @@ def get_card_image_url(self, step):
        return program.card_image_url

    def get_courses(self, obj):
        excluded_restriction_types = get_excluded_restriction_types(self.context['request'])
        return obj.get_linked_courses_and_course_runs(excluded_restriction_types=excluded_restriction_types)
Why are these removed from the serializer? This should be independent of the document changes.
router.register(r'learner-pathway-course', views.LearnerPathwayCourseViewSet, basename='learner-pathway-course')
router.register(r'learner-pathway-program', views.LearnerPathwayProgramViewSet, basename='learner-pathway-program')
why is this needed?
The basename is automatically generated from the queryset attribute of the viewset if it exists. However, in our case, we removed the queryset attribute and added a get_queryset method instead. Therefore, we need to explicitly set the basename to avoid any errors.
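To illustrate the point about basename, here is a minimal sketch in plain Python that imitates how a Django REST Framework router falls back to the viewset's queryset attribute when no basename is passed. It is a simplified imitation rather than the framework's code: the Fake* classes, the default_basename helper, and both viewset classes are hypothetical stand-ins.

```python
class FakeMeta:
    # Hypothetical model metadata; only the attribute the fallback reads.
    object_name = "LearnerPathwayCourse"


class FakeModel:
    _meta = FakeMeta


class FakeQuerySet:
    model = FakeModel


def default_basename(viewset):
    """Imitate the router fallback: derive a basename from `viewset.queryset`."""
    queryset = getattr(viewset, "queryset", None)
    # Without a class-level `queryset`, there is nothing to derive the name from.
    assert queryset is not None, (
        "`basename` argument not specified, and could not be derived "
        "because the viewset has no `queryset` attribute"
    )
    return queryset.model._meta.object_name.lower()


class ViewSetWithQueryset:
    # Class-level attribute: the fallback can inspect it.
    queryset = FakeQuerySet()


class ViewSetWithGetQuerysetOnly:
    # Only a method: the fallback has no `queryset` attribute to inspect.
    def get_queryset(self):
        return FakeQuerySet()
```

Under these assumptions, the first class yields a derived name, while the second raises an AssertionError unless a basename is supplied explicitly, which is why the router.register calls above pass one.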
Prefetch(
    'course__course_runs',
    queryset=CourseRun.objects.filter(
        status=CourseRunStatus.Published
why are we using Published status as the only check here?
This is the same filter that was being used in get_linked_courses_and_course_runs and get_course_runs:
https://github.com/openedx/course-discovery/blob/master/course_discovery/apps/learner_pathway/models.py#L313
@@ -299,23 +299,15 @@ def get_skills(self) -> [str]:

        return program_skills

-    def get_linked_courses_and_course_runs(self, excluded_restriction_types=None) -> [dict]:
+    def get_linked_courses_and_course_runs(self):
Same question: why is this being removed from here, considering this is a model method?
This model method is used exclusively in the LearnerPathwayProgramSerializer.
The LearnerPathwayProgramSerializer is used by the LearnerPathwayProgramViewSet and the LearnerPathwayStepSerializer, which is then used in the LearnerPathwaySearchDocumentSerializer.
In this model method, the use of .filter and .exclude bypasses the prefetch cache, causing additional database queries and leading to an N+1 query problem. To address this, the filtering logic has been moved to the get_queryset method of both LearnerPathwayProgramViewSet and LearnerPathwayDocument.
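The difference between filtering per object and filtering once up front can be sketched with a toy model in plain Python. This is not Django code and not the PR's implementation: FakeDB, both serialize_* functions, and the sample data are hypothetical, and the query counter only mimics how per-object filtering issues one query per course.

```python
class FakeDB:
    """Toy stand-in for a database that counts how many queries it serves."""

    def __init__(self, runs_by_course):
        self.runs_by_course = runs_by_course
        self.query_count = 0

    def runs_for_course(self, course_key, status):
        # One query per call, like calling .filter() on a related manager.
        self.query_count += 1
        return [r for r in self.runs_by_course[course_key] if r["status"] == status]

    def runs_for_courses(self, course_keys, status):
        # One query for all courses, like a filtered prefetch.
        self.query_count += 1
        return {
            key: [r for r in self.runs_by_course[key] if r["status"] == status]
            for key in course_keys
        }


def serialize_filtering_per_course(db, course_keys):
    # N+1 shape: the filter runs inside the loop, once per course.
    return [
        {"key": key,
         "course_runs": [{"key": r["key"]} for r in db.runs_for_course(key, "published")]}
        for key in course_keys
    ]


def serialize_from_prefetch(db, course_keys):
    # Filter once up front, then only read the cached result in the loop.
    prefetched = db.runs_for_courses(course_keys, "published")
    return [
        {"key": key, "course_runs": [{"key": r["key"]} for r in prefetched[key]]}
        for key in course_keys
    ]


# Hypothetical sample data: three courses with mixed run statuses.
SAMPLE_RUNS = {
    "course-a": [{"key": "a1", "status": "published"}, {"key": "a2", "status": "unpublished"}],
    "course-b": [{"key": "b1", "status": "published"}],
    "course-c": [{"key": "c1", "status": "unpublished"}],
}
```

Both functions produce the same payload, but the first serves one query per course while the second serves a single query. In Django terms, the second shape corresponds to a filtered Prefetch in get_queryset followed by reading course.course_runs.all() in the model method or serializer.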
The changes look good, but are quite dense. I'll have another look at them before approval. Can you please satisfy codecov in the meantime? Also, ref
Does the current implementation not take care of this? I'd expect the filtering at the serializer level (in
@zawan-ila You're correct: the current implementation already handles this because we're using these serializers in /search/all. I've updated the PR description.
Great work on this 🎉
if include_learner_pathways:
    expected_result_count = pathways.count()
    expected_query_count = 8
else:
    expected_result_count = 0
    expected_query_count = 4
nit: instead of if-else, you can move the counts to ddt as well. As for pathways.count(), it would be better to have static explicit values instead of comparing against DB count.
Updated ✅
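For reference, the table-driven shape suggested in the nit could look roughly like the sketch below. It uses the standard library's unittest with subTest as a stand-in for ddt (with ddt, the same tuples would go into @ddt.data and be spread with @ddt.unpack). The query counts 8 and 4 come from the diff above; the result count of 3 and the fake_search_all helper are hypothetical.

```python
import unittest

# (include_learner_pathways, expected_result_count, expected_query_count)
# 8 and 4 are taken from the diff; 3 is a hypothetical static fixture size.
CASES = [
    (True, 3, 8),
    (False, 0, 4),
]


def fake_search_all(include_learner_pathways):
    """Hypothetical stand-in for the endpoint: returns (result_count, query_count)."""
    return (3, 8) if include_learner_pathways else (0, 4)


class SearchAllQueryCountTest(unittest.TestCase):
    def test_counts(self):
        # Each tuple carries its own expectations, so no if/else is needed.
        for include, expected_results, expected_queries in CASES:
            with self.subTest(include_learner_pathways=include):
                results, queries = fake_search_all(include)
                self.assertEqual(results, expected_results)
                self.assertEqual(queries, expected_queries)
```

Keeping the expected counts as explicit literals in the data table also covers the second half of the nit, since nothing is compared against a live database count.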
@@ -245,3 +266,283 @@ def test_learner_pathway_uuids_endpoint(self, query_params, response):
        learner_pathway_uuids_url = f'/api/v1/learner-pathway/uuids/?{urlencode(query_params)}'
        api_response = self.client.get(learner_pathway_uuids_url)
        assert api_response.json() == response


@mark.django_db
nit: wondering if this is still needed as the test suite is using Django's TestCase, not unittest TestCase
Updated ✅
).values('key')
)
courses.append({"key": course.key, "course_runs": course_runs})
course_runs = [{'key': course_run.key} for course_run in course.course_runs.all()]
nit: why can't we use .values() here like before?
PROD-3888
TL;DR
This PR addresses the need for optimized query fetching in the /api/v1/search/all/?include_learner_pathways=true endpoint by implementing prefetch_related for LearnerPathway data.
LearnerPathwayViewSet
LearnerPathwayStepViewSet
LearnerPathwayCourseViewSet
LearnerPathwayProgramViewSet
learnerpathwayblock_set
Details
- The get_linked_courses_and_course_runs model method is only used in LearnerPathwayProgramSerializer.
- The LearnerPathwayProgramSerializer is used by the LearnerPathwayProgramViewSet and the LearnerPathwayStepSerializer, which is then used in the LearnerPathwaySearchDocumentSerializer.
- Even if we apply prefetch_related in the LearnerPathwayDocument's get_queryset, the filtering in get_linked_courses_and_course_runs bypasses the prefetched results and runs additional queries.
- To solve this issue: I have moved the filtering logic to LearnerPathwayProgramViewSet's get_queryset and LearnerPathwayDocument's get_queryset method.
- Since we are already overriding the LearnerPathwayProgramViewSet's get_queryset, I have also fixed the existing N+1 issue in the viewset.
- Similarly for courses, LearnerPathwayCourseMinimalSerializer's get_course_runs method was causing additional queries because of filters.
- LearnerPathwayCourseMinimalSerializer is used by LearnerPathwayCourseSerializer, which is used by LearnerPathwayCourseViewSet and LearnerPathwayStepSerializer (used in the LearnerPathwaySearchDocumentSerializer).
- To solve this issue: I have moved the filtering logic to LearnerPathwayCourseViewSet's get_queryset and LearnerPathwayDocument's get_queryset method.
- Since we are already overriding the LearnerPathwayCourseViewSet's get_queryset, I have also fixed the existing N+1 issue in the viewset.