
Validate a standalone dataset #5549

Merged — 25 commits merged into main from LA-157-validate-a-standalone-dataset on Dec 4, 2024

Conversation

galvana (Contributor) commented Nov 30, 2024

Closes LA-157

Description Of Changes

Adds three new API endpoints to assist with the testing of individual datasets

  • GET /connection/{connection_key}/dataset/{dataset_key}/inputs
    • Returns the immediate inputs required to run the given dataset
  • GET /connection/{connection_key}/dataset/{dataset_key}/reachability
    • Returns the reachability status of the dataset (true or false), along with error details if the dataset isn't reachable
  • POST /connection/{connection_key}/dataset/{dataset_key}/test
    • Creates a standalone privacy request for the given dataset using DSR 3.0. These privacy requests are created with a source of "Dataset test" and are omitted from the Request manager page
  • GET /privacy_request/{privacy_request_id}/filtered-results
    • Returns the privacy_request_id, status, and results for the privacy request. This endpoint only returns results for test privacy requests (requests with a source of "Dataset test")
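For illustration, the four endpoint paths above can be sketched with a small helper. The helper names and return shape below are hypothetical (not part of any Fides client); only the path templates come from the PR description.

```python
def dataset_test_endpoints(connection_key: str, dataset_key: str) -> dict:
    """Relative paths for the three new dataset-test endpoints (illustrative helper)."""
    base = f"/connection/{connection_key}/dataset/{dataset_key}"
    return {
        "inputs": f"{base}/inputs",              # GET: identities needed to run the dataset
        "reachability": f"{base}/reachability",  # GET: true/false plus error details
        "test": f"{base}/test",                  # POST: kicks off a standalone test privacy request
    }


def filtered_results_endpoint(privacy_request_id: str) -> str:
    """Relative path for fetching results of a test privacy request (illustrative helper)."""
    return f"/privacy_request/{privacy_request_id}/filtered-results"
```

A typical flow would be: fetch `inputs`, check `reachability`, POST to `test` with identity data, then poll `filtered-results` with the returned privacy request ID.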

Also includes a test dataset page that uses the new dataset test endpoints
(Screenshot: dataset-test page)

Code Changes

  • New endpoints as mentioned above
  • New test datasets page
    • DatasetEditorSection.tsx and TestRunnerSection.tsx components using the dataset-test.slice.ts

Steps to Confirm

  1. Start Fidesplus with nox -s demo -- dev so we have access to the "Cookie House PostgreSQL Database" system
  2. Once Fides starts up, navigate to Data inventory > System inventory and go to the Integrations tab of the "Cookie House PostgreSQL Database" system
  3. Click on the Test datasets button
  4. The test page should show the required inputs for the dataset. Enter known good values ({"email": "[email protected]"}) and hit Run
  5. The results should appear in the bottom-right Test results section

Pre-Merge Checklist

  • Issue requirements met
  • All CI pipelines succeeded
  • CHANGELOG.md updated
  • Followup issues:
    • Followup issues created (include link)
    • No followup issues
  • Database migrations:
    • Ensure that your downrev is up to date with the latest revision on main
    • Ensure that your downgrade() migration is correct and works
      • If a downgrade migration is not possible for this change, please call this out in the PR description!
    • No migrations
  • Documentation:
    • Documentation complete, PR opened in fidesdocs
    • Documentation issue created in fidesdocs
    • If there are any new client scopes created as part of the pull request, remember to update public-facing documentation that references our scope registry
    • No documentation updates required



vercel bot commented Nov 30, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

fides-plus-nightly — ✅ Ready — updated Dec 4, 2024 5:30pm (UTC)


cypress bot commented Nov 30, 2024

fides — Run #11262

Run status: Passed
Run duration: 00m 51s
Commit: 76272a0ece (Merge b25bf2f45a41da1c44a0f03d21fdfb836c7f7c93 into a28ae2f9de42c2e93246c1201be9...)
Committer: Adrian Galvan
Project: fides
Branch: refs/pull/5549/merge

Test results
Failures: 0
Flaky: 0
Pending (skipped with .skip): 0
Skipped (mocha hook failure): 0
Passing: 4

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
@galvana galvana marked this pull request as ready for review December 3, 2024 01:49
Contributor

@eastandwestwind eastandwestwind left a comment


Really nice work on this @galvana !

I'm impressed with the detail you put into the BE tests, as well as the little features you added along the way that really add to the "enterprise" user experience.

As we discussed in our pair CR session, in addition to the comments I left, I'd like to see:

  1. In the UI, add some tooltips: a) noting that results are raw results, not filtered by policy, and b) noting that Run will execute a test access request using the given identity data
  2. Write docs, or defer to separate PR
  3. New ticket to write FE tests
  4. Put the ability to test datasets / privacy requests behind a plus flag for now, but let's add a new ticket to move these new endpoints to plus
  5. In the YAML editor, if there is a generic error in yaml format, let's try to surface a user-friendly error

Comment on lines 134 to 140
graph_dataset = dataset_config.get_graph()
for collection in graph_dataset.collections:
    for field in collection.fields:
        for ref, edge_direction in field.references[:]:
            if edge_direction == "from" and ref.dataset != dataset_config.fides_key:
                field.identity = f"{ref.dataset}_{ref.collection}_{'_'.join(ref.field_path.levels)}"
                field.references.remove((ref, "from"))
Contributor


can we extract this into a separate function? Since we're mutating the graph_dataset var, we can just have the function return that

Contributor Author


Like this?

def _replace_references_with_identities(dataset_key: str, graph_dataset: GraphDataset):
    """
    Replace external field references with identity values for testing.

    Creates a copy of the graph dataset and replaces dataset references with
    equivalent identity references that can be seeded directly. This allows
    testing a single dataset in isolation without needing to load data from
    referenced external datasets.
    """

    modified_graph_dataset = deepcopy(graph_dataset)

    for collection in modified_graph_dataset.collections:
        for field in collection.fields:
            for ref, edge_direction in field.references[:]:
                if edge_direction == "from" and ref.dataset != dataset_key:
                    field.identity = f"{ref.dataset}_{ref.collection}_{'_'.join(ref.field_path.levels)}"
                    field.references.remove((ref, "from"))

    return modified_graph_dataset
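The extracted function above can be exercised in isolation with minimal stand-in classes. The dataclasses below are illustrative stubs, not the real Fides `GraphDataset` types (which carry more fields); they are just enough to show an upstream reference being rewritten into a seedable identity.

```python
from copy import deepcopy
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class FieldPath:
    levels: Tuple[str, ...]


@dataclass
class Ref:
    dataset: str
    collection: str
    field_path: FieldPath


@dataclass
class Field:
    name: str
    references: List[Tuple[Ref, str]] = field(default_factory=list)
    identity: Optional[str] = None


@dataclass
class Collection:
    fields: List[Field]


@dataclass
class GraphDataset:
    collections: List[Collection]


def _replace_references_with_identities(dataset_key: str, graph_dataset: GraphDataset) -> GraphDataset:
    # Work on a copy so the caller's graph is left untouched (non-destructive)
    modified = deepcopy(graph_dataset)
    for collection in modified.collections:
        for fld in collection.fields:
            for ref, edge_direction in fld.references[:]:
                if edge_direction == "from" and ref.dataset != dataset_key:
                    fld.identity = f"{ref.dataset}_{ref.collection}_{'_'.join(ref.field_path.levels)}"
                    fld.references.remove((ref, "from"))
    return modified


# A field in "orders_db" that references "users_db.users.email" upstream
ref = Ref("users_db", "users", FieldPath(("email",)))
graph = GraphDataset([Collection([Field("customer_email", references=[(ref, "from")])])])
result = _replace_references_with_identities("orders_db", graph)
```

After the call, the field's external reference is gone and its `identity` is `users_db_users_email`, which can be seeded directly when testing `orders_db` on its own.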


try {
  datasetValues = yaml.load(editorContent) as Dataset;
} catch (error) {
  toast(errorToastParams(getErrorMessage(error as FetchBaseQueryError)));
}
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put this in a try/catch so that we can catch any parsing exceptions and show a toast
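The same catch-and-surface pattern in backend form: wrap the YAML parse so a malformed editor payload yields a readable message instead of a traceback. The frontend uses js-yaml; this sketch uses PyYAML purely for illustration, and the function name and return shape are hypothetical.

```python
import yaml  # PyYAML, assumed available for this sketch


def parse_dataset_yaml(editor_content: str):
    """Return (dataset_dict, error_message); exactly one of the pair is None."""
    try:
        return yaml.safe_load(editor_content), None
    except yaml.YAMLError as exc:
        # Surface a user-friendly error rather than letting the exception escape
        return None, f"Invalid YAML: {exc}"
```

The caller can then show the error string in a toast while keeping the editor state intact.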

Comment on lines 111 to 117
const updatedDatasetConfig: DatasetConfigSchema = {
  fides_key: currentDataset.fides_key,
  ctl_dataset: {
    ...currentDataset.ctl_dataset,
    ...datasetValues,
  },
};
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was causing some weird consolidation issues; it was better to just pass the datasetValues directly to updateDataset

@@ -169,7 +184,7 @@ const EditorSection = ({ connectionKey }: EditorSectionProps) => {
<Button
htmlType="submit"
size="small"
data-testid="save-btn"
data-testid="refresh-btn"
Contributor Author


Adding test IDs for later

Run
</Button>
<HStack>
<QuestionTooltip label="Run a test access request using the provided test input data" />
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a tooltip for an extra bit of context

@@ -153,3 +151,25 @@ def run_test_access_request(
privacy_request_proceed=False,
)
return privacy_request


def _replace_references_with_identities(dataset_key: str, graph_dataset: GraphDataset):
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Broke this out into its own function, and made it non-destructive

Contributor


I like this a lot better, thanks!

@@ -2632,7 +2632,7 @@ def get_access_results_urls(
status_code=HTTP_200_OK,
response_model=FilteredPrivacyRequestResults,
)
def get_filtered_results(
def get_test_privacy_request_results(
Contributor


💯


@galvana galvana merged commit 966a0e3 into main Dec 4, 2024
19 of 20 checks passed
@galvana galvana deleted the LA-157-validate-a-standalone-dataset branch December 4, 2024 17:29

cypress bot commented Dec 4, 2024

fides — Run #11263

Run status: Passed
Run duration: 00m 46s
Commit: 966a0e3279 (Validate a standalone dataset (#5549))
Committer: Adrian Galvan
Project: fides
Branch: main

Test results
Failures: 0
Flaky: 0
Pending (skipped with .skip): 0
Skipped (mocha hook failure): 0
Passing: 4
