Add FastAPI Redis Caching #4195
Conversation
Passing run #4616 ↗︎ - Review all test suite changes for PR #4195 ↗︎
Codecov Report: All modified lines are covered by tests ✅

Additional details and impacted files:

@@            Coverage Diff             @@
##             main    #4195      +/-   ##
==========================================
+ Coverage   87.71%   87.73%   +0.01%
==========================================
  Files         331      331
  Lines       20870    20878       +8
  Branches     2708     2708
==========================================
+ Hits        18307    18317      +10
+ Misses       2096     2094       -2
  Partials      467      467

☔ View full report in Codecov by Sentry.
At this point I've got the cache wired up, but it isn't making a difference. It seems like something is misconfigured or the endpoint isn't set up in a way that is cache-friendly. Investigating...
The latest commit features a stripped-down privacy-experience endpoint, but with proper caching that works as expected. Now the challenge is to keep that endpoint at feature parity while also adding the custom caching.
As it currently stands, the caching is working but failing some tests because the tests add new data and the cache is not being busted. @pattisdr how often do we expect these tables to be updated? I'm not sure of the context here, but maybe it isn't actually very feasible to use a simple caching method here, since it could be hard to know when to actually bust the cache.
@ThomasLaPiana we ran into a similar issue in the original draft for TCF Experiences trying to use LRU caching, and realized it would take some more care to get the arguments right to bust the cache. It wasn't a quickly solved thing at the time when we were trying to get the POC out the door, so we pulled it out.

I would think these tables could get updated frequently: we'll have automated pipelines that update systems and privacy declarations at least on a weekly cadence. Big picture, additions, deletions, and updates to certain fields on systems or privacy declarations can potentially cause big changes to what is returned in the Experience. For non-TCF Experiences, in addition to the above, changes to Privacy Notices or Experience Config (Experience copy/language) can cause the Experience to need to be re-built.
If we really want caching here, I think we could do the following:
1. Put a time limit on the cache so that entries automatically expire after a set interval.
2. Add a request header that forces the endpoint to bypass the cache and rebuild the response.
What do you think about adding these two solutions? Caching is basically the hardest thing to get right, so if we decide to forego this and instead focus on async database calls and potentially parallel calls, we can go that route as well. Caching is a nice-to-have feature but also risks giving us major headaches, so it's worth weighing the pros and cons.
@ThomasLaPiana I really like the addition of that second idea as part of the overall solution: a request header that forces a rebuild would work great for unit tests. A thirty-minute limit on the cache would probably catch most things too. In theory I'd like to have caching, because while this is something that can change frequently, it's likely not changing daily, and that's a lot of us re-querying the same thing. Caching was something we'd talked about having originally - it seems preferable compared to the other extreme, where we're doing complex change tracking to rebuild the experiences over time as multiple other resources change. But if you think you could get similar orders-of-magnitude performance improvements with things like async database calls, that might be worth exploring first, given the risk here of showing a customer an outdated experience.
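To make the two proposals concrete, here is a minimal sketch of what a thirty-minute TTL plus a force-rebuild request header could look like on a FastAPI endpoint. The in-memory dict, the _build_privacy_experience placeholder, and the exact header handling below are illustrative assumptions only, not the PR's actual implementation:

import time
from typing import Any, Dict, Tuple

from fastapi import APIRouter, Header, Response

router = APIRouter()

# Illustrative in-memory cache: key -> (expiry timestamp, cached payload).
_CACHE: Dict[str, Tuple[float, Any]] = {}
CACHE_TTL_SECONDS = 30 * 60  # idea 1: entries expire after thirty minutes


def _build_privacy_experience() -> Any:
    """Stand-in for the slow query/assembly work the endpoint normally does."""
    ...


@router.get("/privacy-experience")
def get_privacy_experience(
    response: Response,
    # idea 2: sending "bust-endpoint-cache: true" skips the cache and forces a rebuild
    bust_endpoint_cache: bool = Header(default=False),
) -> Any:
    cache_key = "privacy-experience"  # a real key would also account for query params

    if not bust_endpoint_cache:
        cached = _CACHE.get(cache_key)
        if cached and cached[0] > time.time():
            response.headers["X-Endpoint-Cache"] = "HIT"
            return cached[1]

    result = _build_privacy_experience()
    _CACHE[cache_key] = (time.time() + CACHE_TTL_SECONDS, result)
    response.headers["X-Endpoint-Cache"] = "MISS"
    return result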
If you think these are good solutions, it's worth continuing with! Even with async database calls it won't cut down on processing time: a long/complex query is still just as long/complex; it would only mean less blocking time. In that case, I'll continue with the work here and clean it up to get it production-ready 🙂 Thank you for your input! I'll tag you again when it's ready for a review.
…g whether or not the cache was a hit or miss
# This is structured to look like a paginated result to minimize impact from
# the caching changes
api_result = {"items": results, "total": len(results)}
PRIVACY_EXPERIENCE_CACHE[cache_hash] = api_result
Just want to comment early here. I see you're caching the output of the entire endpoint: now the performance gains make more sense. This is a neat strategy, I'm just more concerned about accuracy now. I had been assuming we'd instead cache some time-consuming individual pieces that go into the experience, like the output of get_tcf_contents.

For example, fides_user_device_id is a unique user identifier. If present, we supplement the experience with previously-saved preferences for that user. So if the user saves new preferences, this experience would still return their outdated preferences.
That makes sense! Without knowledge of what is actually going on here, I can't make those fine-grained caching decisions.
So we can either handle this on the front-end using the header that resets the cache, or we can use a more fine-grained caching method, for which I'd need some more guidance on what is safe to cache.
Real quick: something like the output of get_tcf_contents, which is the bulk of TCF experiences, and PrivacyExperience.get_related_privacy_notices, which is for non-TCF experiences - but perhaps with the latter refactored into a separate method that doesn't take fides_user_device_id in, so that piece can be added separately.
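As an illustration of that fine-grained approach, the sketch below caches only the user-agnostic bulk (a stand-in for get_tcf_contents) and layers the fides_user_device_id-specific saved preferences on afterwards, uncached. The helper names, signatures, TTL, and dict cache here are assumptions for illustration, not the actual fides code:

import time
from typing import Any, Dict, Optional, Tuple

# Illustrative module-level cache for the expensive, user-agnostic TCF contents.
_TCF_CONTENTS_CACHE: Dict[str, Tuple[float, Any]] = {}
TCF_CONTENTS_TTL_SECONDS = 30 * 60


def get_tcf_contents(db: Any, region: str) -> Any:
    """Stand-in for the real, expensive TCF-contents build (signature assumed)."""
    ...


def supplement_with_saved_preferences(db: Any, experience: Any, fides_user_device_id: str) -> Any:
    """Stand-in for layering a user's previously-saved preferences onto the experience."""
    ...


def get_cached_tcf_contents(db: Any, region: str) -> Any:
    """Cache only the user-agnostic bulk, keyed by inputs that don't identify a user."""
    cache_key = f"tcf_contents:{region}"
    cached = _TCF_CONTENTS_CACHE.get(cache_key)
    if cached and cached[0] > time.time():
        return cached[1]

    contents = get_tcf_contents(db, region)
    _TCF_CONTENTS_CACHE[cache_key] = (time.time() + TCF_CONTENTS_TTL_SECONDS, contents)
    return contents


def build_tcf_experience(db: Any, region: str, fides_user_device_id: Optional[str]) -> Any:
    """Assemble the experience: cached bulk contents plus uncached per-user preferences."""
    experience = get_cached_tcf_contents(db, region)
    if fides_user_device_id:
        # Never cached: saved preferences belong to this device id and can change
        # at any time, so they are looked up fresh on every request.
        experience = supplement_with_saved_preferences(db, experience, fides_user_device_id)
    return experience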
Just a note here, I'm moving a great deal of privacy preferences / privacy experiences endpoints, supporting methods, TCF experiences, etc. to Fidesplus -
https://ethyca.atlassian.net/browse/PROD-1258
Thank you for the heads up! In that case I'll move this work over there. I haven't been able to circle back to this yet, but Adam already figured it out for some other endpoints we have :)
This is being worked on elsewhere by @adamsachs!
Closes #4171
Description Of Changes
This adds caching for FastAPI endpoints via our existing Redis instance.
Docs for the package here: https://github.com/long2ice/fastapi-cache
Code Changes

- redis version
- privacy-experience endpoint
- X-Endpoint-Cache that contains either HIT or MISS
- bust-endpoint-cache that clears the privacy-experience endpoint cache

Steps to Confirm

- nox -s load_tests should show significantly improved results for the privacy-experience endpoints (a header-check sketch follows the checklist below)

Pre-Merge Checklist

- CHANGELOG.md
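As a quick complement to the load tests above, here is a hedged sketch of how the X-Endpoint-Cache and bust-endpoint-cache behaviour could be confirmed with FastAPI's TestClient. The app import path and endpoint URL are placeholders, not the repo's actual ones:

from fastapi.testclient import TestClient

from my_app.main import app  # placeholder import path; point this at the real app

client = TestClient(app)


def test_privacy_experience_cache_headers() -> None:
    # First request should miss the cache, second should hit it.
    first = client.get("/api/v1/privacy-experience")  # placeholder URL
    assert first.headers.get("X-Endpoint-Cache") == "MISS"

    second = client.get("/api/v1/privacy-experience")
    assert second.headers.get("X-Endpoint-Cache") == "HIT"

    # Sending the bust header should clear the cached entry and force a rebuild.
    busted = client.get(
        "/api/v1/privacy-experience",
        headers={"bust-endpoint-cache": "true"},
    )
    assert busted.headers.get("X-Endpoint-Cache") == "MISS"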