Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

identify and prefetch N+1 queries in search/all for learner pathways #4488

Merged
merged 5 commits into from
Nov 21, 2024

Conversation

hamza-56
Copy link
Contributor

@hamza-56 hamza-56 commented Nov 12, 2024

PROD-3888

TL;DR

This PR addresses the need for optimized query fetching in the /api/v1/search/all/?include_learner_pathways=true endpoint by implementing prefetch_related for LearnerPathway data.

  1. Fixes N+1 issues in LearnerPathwayViewSet
  2. Fixes N+1 issues in LearnerPathwayStepViewSet
  3. Fixes N+1 issues in LearnerPathwayCourseViewSet
  4. Fixed N+1 issues in LearnerPathwayProgramViewSet
  5. Removes unnecessary eager loading of learnerpathwayblock_set

Details

The get_linked_courses_and_course_runs model method is only used in LearnerPathwayProgramSerializer.
The LearnerPathwayProgramSerializer is used by the LearnerPathwayProgramViewSet and the LearnerPathwayStepSerializer, which is then used in the LearnerPathwaySearchDocumentSerializer.
Even if we apply prefetch_related in the LearnerPathwayDocument's get_queryset, the filtering in get_linked_courses_and_course_runs bypasses the prefetched results and runs additional queries.

To solve this issue: I have moved the filtering logic to LearnerPathwayProgramViewSet's get_queryset and LearnerPathwayDocument's``get_queryset method

Since we are already overriding the LearnerPathwayProgramViewSet's get_queryset, I have also fixed the existing N+1 issue in the viewset.

Similarly for courses, LearnerPathwayCourseMinimalSerializer's get_course_runs method was causing additional queries because of filters.
LearnerPathwayCourseMinimalSerializer is used by LearnerPathwayCourseSerializer
which is used by LearnerPathwayCourseViewSet and LearnerPathwayStepSerializer (used in the LearnerPathwaySearchDocumentSerializer)
To solve this issue: I have moved the filtering logic to LearnerPathwayCourseViewSet's get_queryset and LearnerPathwayDocument's get_queryset method
Since we are already overriding the LearnerPathwayCourseViewSet's get_queryset, I have also fixed the existing N+1 issue in the viewset.

@hamza-56 hamza-56 self-assigned this Nov 12, 2024
@hamza-56 hamza-56 marked this pull request as ready for review November 12, 2024 15:06
@hamza-56 hamza-56 requested review from AfaqShuaib09, zawan-ila, DawoudSheraz and Ali-D-Akbar and removed request for AfaqShuaib09 November 12, 2024 15:06
Copy link
Contributor

@DawoudSheraz DawoudSheraz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • commit type should be perf, not chore
  • Add a relevant unit test to showcase the decrease in query count

@hamza-56 hamza-56 force-pushed the hamza/PROD-3888 branch 2 times, most recently from 0fb9ea2 to 16f9714 Compare November 15, 2024 00:56
@@ -87,8 +82,7 @@ def get_card_image_url(self, step):
return program.card_image_url

def get_courses(self, obj):
excluded_restriction_types = get_excluded_restriction_types(self.context['request'])
return obj.get_linked_courses_and_course_runs(excluded_restriction_types=excluded_restriction_types)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are these removed from serializer? This should be independent from document changes.

Comment on lines +9 to +10
router.register(r'learner-pathway-course', views.LearnerPathwayCourseViewSet, basename='learner-pathway-course')
router.register(r'learner-pathway-program', views.LearnerPathwayProgramViewSet, basename='learner-pathway-program')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The basename is automatically generated from the queryset attribute of the viewset if it exists. However, in our case, we removed the queryset attribute and added a get_queryset method instead. Therefore, we need to explicitly set the basename to avoid any errors.

Prefetch(
'course__course_runs',
queryset=CourseRun.objects.filter(
status=CourseRunStatus.Published
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we using Published status as the only check here?

Copy link
Contributor Author

@hamza-56 hamza-56 Nov 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -299,23 +299,15 @@ def get_skills(self) -> [str]:

return program_skills

def get_linked_courses_and_course_runs(self, excluded_restriction_types=None) -> [dict]:
def get_linked_courses_and_course_runs(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same question, why is the being removed from here considering this is a model method?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This model method is used exclusively in the LearnerPathwayProgramSerializer.

The LearnerPathwayProgramSerializer is used by the LearnerPathwayProgramViewSet and the LearnerPathwayStepSerializer, which is then used in the LearnerPathwaySearchDocumentSerializer.

In this model method, the use of .filter and .exclude bypasses the prefetch cache, causing additional database queries and leading to an N+1 query problem. To address this, the filtering logic has been moved to the get_queryset method of both LearnerPathwayProgramViewSet and LearnerPathwayDocument.

@zawan-ila
Copy link
Contributor

The changes look good, but are quite dense. I'll have another look at them before approval. Can you please satisfy codecov in the meanwhile? Also, ref

Excludes Learnerpathway course_runs based on excluded_restriction_types in /api/v1/search/all

Does the current implementation not take care of this? I'd expect the filtering at the serializer level (in get_courses and get_course_runs) to have taken care of this. I believe the search/all endpoint uses these serializers too.

@hamza-56
Copy link
Contributor Author

The changes look good, but are quite dense. I'll have another look at them before approval. Can you please satisfy codecov in the meanwhile? Also, ref

Excludes Learnerpathway course_runs based on excluded_restriction_types in /api/v1/search/all

Does the current implementation not take care of this? I'd expect the filtering at the serializer level (in get_courses and get_course_runs) to have taken care of this. I believe the search/all endpoint uses these serializers too.

@zawan-ila You're correct—the current implementation already handles this because we're using these serializers in /search/all. I've updated the PR description.

@hamza-56 hamza-56 force-pushed the hamza/PROD-3888 branch 2 times, most recently from 520cdad to a6b9a63 Compare November 20, 2024 01:39
Copy link
Contributor

@zawan-ila zawan-ila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work on this 🎉

Comment on lines 776 to 782
if include_learner_pathways:
expected_result_count = pathways.count()
expected_query_count = 8
else:
expected_result_count = 0
expected_query_count = 4

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: instead of if-else, you can move the counts to ddt as well. As for pathways.count(), it would be better to have static explicit values instead of comparing against DB count.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated ✅

@@ -245,3 +266,283 @@ def test_learner_pathway_uuids_endpoint(self, query_params, response):
learner_pathway_uuids_url = f'/api/v1/learner-pathway/uuids/?{urlencode(query_params)}'
api_response = self.client.get(learner_pathway_uuids_url)
assert api_response.json() == response


@mark.django_db
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: wondering if this is still needed as the test suite is using Django's TestCase, not unittest TestCase

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated ✅

).values('key')
)
courses.append({"key": course.key, "course_runs": course_runs})
course_runs = [{'key': course_run.key} for course_run in course.course_runs.all()]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: why can't we use .values() here like before?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.values() causes additional queries

Screenshot 2024-11-21 at 1 00 35 AM Screenshot 2024-11-21 at 12 59 57 AM

@hamza-56 hamza-56 merged commit 0bf2139 into master Nov 21, 2024
14 checks passed
@hamza-56 hamza-56 deleted the hamza/PROD-3888 branch November 21, 2024 12:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants