-
Notifications
You must be signed in to change notification settings - Fork 74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improving datamap performance #5601
base: main
Are you sure you want to change the base?
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎ 1 Skipped Deployment
|
src/fides/api/models/sql_models.py
Outdated
[ | ||
func.jsonb_array_elements_text( | ||
text( | ||
"jsonb_path_query(collections::jsonb, '$.** ? (@.data_categories != null).data_categories')" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the key change, the JSONB path returns all the data_categories
at all levels of nesting. The create_data_categories_property
function is used to create the dataset_data_categories
column property for System
and PrivacyDeclaration
datasets = relationship( | ||
"Dataset", | ||
primaryjoin="foreign(Dataset.fides_key)==any_(System.dataset_references)", | ||
lazy="selectin", | ||
uselist=True, | ||
viewonly=True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No more datasets! This was only being used for the undeclared_data_categories
so we can get rid of it
src/fides/api/models/sql_models.py
Outdated
system_dataset_data_categories = set() | ||
for dataset in self.datasets: | ||
system_dataset_data_categories.update(dataset.field_data_categories) | ||
system_dataset_data_categories = set(self.dataset_data_categories) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the optimization in use, no more iterating over datasets
assert set( | ||
privacy_declaration_with_multiple_dataset_references.dataset_data_categories | ||
) == { | ||
"user.behavior", | ||
"user.unique_id", | ||
"user.contact.address.street", | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check out the privacy_declaration_with_multiple_dataset_references
fixture to see the different levels of nesting for the data categories
): | ||
assert ( | ||
privacy_declaration_with_dataset_references.undeclared_data_categories | ||
privacy_declaration_with_single_dataset_reference.undeclared_data_categories |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The rest of the tests for undeclared_data_categories
were already here
fides Run #11487
Run Properties:
|
Project |
fides
|
Branch Review |
refs/pull/5601/merge
|
Run status |
Passed #11487
|
Run duration | 00m 47s |
Commit |
50ccad590a ℹ️: Merge 5d6126f26f6d1403078b5429c6b82638884adec6 into 0030db7816dfbdbadf22843f6cf8...
|
Committer | Adrian Galvan |
View all properties for this run ↗︎ |
Test results | |
---|---|
Failures |
0
|
Flaky |
0
|
Pending |
0
|
Skipped |
0
|
Passing |
4
|
Upgrade your plan to view test results. | |
View all changes introduced in this branch ↗︎ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks clean, I'll leave my approve, I'm waiting for the fidesplus PR to come and test it by myself.
Some tests are still failing: https://github.com/ethyca/fides/actions/runs/12304095531/job/34341403381?pr=5601#step:8:926 |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #5601 +/- ##
=======================================
Coverage 87.11% 87.11%
=======================================
Files 388 388
Lines 23906 23902 -4
Branches 2585 2583 -2
=======================================
- Hits 20826 20823 -3
Misses 2522 2522
+ Partials 558 557 -1 ☔ View full report in Codecov by Sentry. |
Closes LA-207
Description Of Changes
Removes the
dataset
andundeclared_data_categories
properties from theSystem
andPrivacyDeclaration
models. The new approach loads the data categories up-front using Postgres JSON operators. This approach is considerably faster since it avoids loading large datasets. In addition to improving the performance of the data map report, any API calls that return Systems or PrivacyDeclarations will also benefit from the removal of thedataset
property.Code Changes
dataset
property fromSystem
andPrivacyDeclaration
undeclared_data_categories
to use the pre-loaded data categories mapSteps to Confirm
Pre-Merge Checklist
CHANGELOG.md
updatedmain
downgrade()
migration is correct and works