
Validate a standalone dataset #5549

Merged — 25 commits merged into main from LA-157-validate-a-standalone-dataset on Dec 4, 2024

Conversation

galvana (Contributor) commented Nov 30, 2024

Closes LA-157

Description Of Changes

Adds three new API endpoints to assist with the testing of individual datasets

  • GET /connection/{connection_key}/dataset/{dataset_key}/inputs
    • Returns the immediate inputs required to run the given dataset
  • GET /connection/{connection_key}/dataset/{dataset_key}/reachability
    • Returns the reachability status of the dataset (true or false), along with error details if the dataset isn't reachable
  • POST /connection/{connection_key}/dataset/{dataset_key}/test
    • Creates a standalone privacy request for the given dataset using DSR 3.0. These privacy requests are created with a source of "Dataset test" and are omitted from the Request manager page
  • GET /privacy_request/{privacy_request_id}/filtered-results
    • Returns the privacy_request_id, status, and results for the privacy request. This endpoint only returns results for test privacy requests (requests with a source of "Dataset test")
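For illustration, the four endpoint paths above can be sketched with a small helper. The helper names and return shape below are hypothetical (not part of any Fides client); only the path templates come from the PR description.

```python
def dataset_test_endpoints(connection_key: str, dataset_key: str) -> dict:
    """Relative paths for the three new dataset-test endpoints (illustrative helper)."""
    base = f"/connection/{connection_key}/dataset/{dataset_key}"
    return {
        "inputs": f"{base}/inputs",              # GET: identities needed to run the dataset
        "reachability": f"{base}/reachability",  # GET: true/false plus error details
        "test": f"{base}/test",                  # POST: kicks off a standalone test privacy request
    }


def filtered_results_endpoint(privacy_request_id: str) -> str:
    """Relative path for fetching results of a test privacy request (illustrative helper)."""
    return f"/privacy_request/{privacy_request_id}/filtered-results"
```

A typical flow would be: fetch `inputs`, check `reachability`, POST to `test` with identity data, then poll `filtered-results` with the returned privacy request ID.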

Also includes a test dataset page that uses the new dataset test endpoints
(Screenshot: dataset-test page)

Code Changes

  • New endpoints as mentioned above
  • New test datasets page
    • DatasetEditorSection.tsx and TestRunnerSection.tsx components using the dataset-test.slice.ts

Steps to Confirm

  1. Start Fidesplus with nox -s demo -- dev so we have access to the "Cookie House PostgreSQL Database" system
  2. Once Fides starts up, navigate to Data inventory > System inventory and go to the Integrations tab of the "Cookie House PostgreSQL Database" system
  3. Click on the Test datasets button
  4. The test page should show the required inputs for the dataset. Enter known good values ({"email": "[email protected]"}) and hit Run
  5. The results should appear in the bottom-right Test results section

Pre-Merge Checklist

  • Issue requirements met
  • All CI pipelines succeeded
  • CHANGELOG.md updated
  • Followup issues:
    • Followup issues created (include link)
    • No followup issues
  • Database migrations:
    • Ensure that your downrev is up to date with the latest revision on main
    • Ensure that your downgrade() migration is correct and works
      • If a downgrade migration is not possible for this change, please call this out in the PR description!
    • No migrations
  • Documentation:
    • Documentation complete, PR opened in fidesdocs
    • Documentation issue created in fidesdocs
    • If there are any new client scopes created as part of the pull request, remember to update public-facing documentation that references our scope registry
    • No documentation updates required



vercel bot commented Nov 30, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

fides-plus-nightly — ✅ Ready — updated Dec 4, 2024 5:30pm (UTC)


cypress bot commented Nov 30, 2024

fides — Run #11262

Run status: Passed
Run duration: 00m 51s
Commit: 76272a0ece (Merge b25bf2f45a41da1c44a0f03d21fdfb836c7f7c93 into a28ae2f9de42c2e93246c1201be9...)
Committer: Adrian Galvan
Project: fides
Branch: refs/pull/5549/merge

Test results
Failures: 0
Flaky: 0
Pending (skipped with .skip): 0
Skipped (mocha hook failure): 0
Passing: 4

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
@galvana galvana marked this pull request as ready for review December 3, 2024 01:49
Contributor

@eastandwestwind eastandwestwind left a comment


Really nice work on this @galvana !

I'm impressed with the detail you put into the BE tests, as well as the little features you added along the way that really add to the "enterprise" user experience.

As we discussed in our pair CR session, in addition to the comments I left, I'd like to see:

  1. In the UI, add some tooltips: a) noting that results are raw results, not filtered by policy, and b) noting that Run will execute a test access request using the given identity data
  2. Write docs, or defer to separate PR
  3. New ticket to write FE tests
  4. Put the ability to test datasets / privacy requests behind a plus flag for now, but let's add a new ticket to move these new endpoints to plus
  5. In the YAML editor, if there is a generic error in yaml format, let's try to surface a user-friendly error

Comment on lines 134 to 140
graph_dataset = dataset_config.get_graph()
for collection in graph_dataset.collections:
    for field in collection.fields:
        for ref, edge_direction in field.references[:]:
            if edge_direction == "from" and ref.dataset != dataset_config.fides_key:
                field.identity = f"{ref.dataset}_{ref.collection}_{'_'.join(ref.field_path.levels)}"
                field.references.remove((ref, "from"))
Contributor


can we extract this into a separate function? Since we're mutating the graph_dataset var, we can just have the function return that

Contributor Author


Like this?

def _replace_references_with_identities(dataset_key: str, graph_dataset: GraphDataset):
    """
    Replace external field references with identity values for testing.

    Creates a copy of the graph dataset and replaces dataset references with
    equivalent identity references that can be seeded directly. This allows
    testing a single dataset in isolation without needing to load data from
    referenced external datasets.
    """

    modified_graph_dataset = deepcopy(graph_dataset)

    for collection in modified_graph_dataset.collections:
        for field in collection.fields:
            for ref, edge_direction in field.references[:]:
                if edge_direction == "from" and ref.dataset != dataset_key:
                    field.identity = f"{ref.dataset}_{ref.collection}_{'_'.join(ref.field_path.levels)}"
                    field.references.remove((ref, "from"))

    return modified_graph_dataset
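The extracted function above can be exercised in isolation with minimal stand-in classes. The dataclasses below are illustrative stubs, not the real Fides `GraphDataset` types (which carry more fields); they are just enough to show an upstream reference being rewritten into a seedable identity.

```python
from copy import deepcopy
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class FieldPath:
    levels: Tuple[str, ...]


@dataclass
class Ref:
    dataset: str
    collection: str
    field_path: FieldPath


@dataclass
class Field:
    name: str
    references: List[Tuple[Ref, str]] = field(default_factory=list)
    identity: Optional[str] = None


@dataclass
class Collection:
    fields: List[Field]


@dataclass
class GraphDataset:
    collections: List[Collection]


def _replace_references_with_identities(dataset_key: str, graph_dataset: GraphDataset) -> GraphDataset:
    # Work on a copy so the caller's graph is left untouched (non-destructive)
    modified = deepcopy(graph_dataset)
    for collection in modified.collections:
        for fld in collection.fields:
            for ref, edge_direction in fld.references[:]:
                if edge_direction == "from" and ref.dataset != dataset_key:
                    fld.identity = f"{ref.dataset}_{ref.collection}_{'_'.join(ref.field_path.levels)}"
                    fld.references.remove((ref, "from"))
    return modified


# A field in "orders_db" that references "users_db.users.email" upstream
ref = Ref("users_db", "users", FieldPath(("email",)))
graph = GraphDataset([Collection([Field("customer_email", references=[(ref, "from")])])])
result = _replace_references_with_identities("orders_db", graph)
```

After the call, the field's external reference is gone and its `identity` is `users_db_users_email`, which can be seeded directly when testing `orders_db` on its own.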


try {
  datasetValues = yaml.load(editorContent) as Dataset;
} catch (error) {
  toast(errorToastParams(getErrorMessage(error as FetchBaseQueryError)));
}
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put this in a try/catch so that we can catch any parsing exceptions and show a toast
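The same catch-and-surface pattern in backend form: wrap the YAML parse so a malformed editor payload yields a readable message instead of a traceback. The frontend uses js-yaml; this sketch uses PyYAML purely for illustration, and the function name and return shape are hypothetical.

```python
import yaml  # PyYAML, assumed available for this sketch


def parse_dataset_yaml(editor_content: str):
    """Return (dataset_dict, error_message); exactly one of the pair is None."""
    try:
        return yaml.safe_load(editor_content), None
    except yaml.YAMLError as exc:
        # Surface a user-friendly error rather than letting the exception escape
        return None, f"Invalid YAML: {exc}"
```

The caller can then show the error string in a toast while keeping the editor state intact.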

Comment on lines 111 to 117
const updatedDatasetConfig: DatasetConfigSchema = {
  fides_key: currentDataset.fides_key,
  ctl_dataset: {
    ...currentDataset.ctl_dataset,
    ...datasetValues,
  },
};
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was causing some weird consolidation issues; it was better to just pass the datasetValues directly to updateDataset

@@ -169,7 +184,7 @@ const EditorSection = ({ connectionKey }: EditorSectionProps) => {
<Button
htmlType="submit"
size="small"
data-testid="save-btn"
data-testid="refresh-btn"
Contributor Author


Adding test IDs for later

Run
</Button>
<HStack>
<QuestionTooltip label="Run a test access request using the provided test input data" />
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a tooltip for an extra bit of context

@@ -153,3 +151,25 @@ def run_test_access_request(
privacy_request_proceed=False,
)
return privacy_request


def _replace_references_with_identities(dataset_key: str, graph_dataset: GraphDataset):
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Broke this out into its own function, and made it non-destructive

Contributor


I like this a lot better, thanks!

@@ -2632,7 +2632,7 @@ def get_access_results_urls(
status_code=HTTP_200_OK,
response_model=FilteredPrivacyRequestResults,
)
def get_filtered_results(
def get_test_privacy_request_results(
Contributor


💯


@galvana galvana merged commit 966a0e3 into main Dec 4, 2024
19 of 20 checks passed
@galvana galvana deleted the LA-157-validate-a-standalone-dataset branch December 4, 2024 17:29

cypress bot commented Dec 4, 2024

fides — Run #11263

Run status: Passed
Run duration: 00m 46s
Commit: 966a0e3279 (Validate a standalone dataset (#5549))
Committer: Adrian Galvan
Project: fides
Branch: main

Test results
Failures: 0
Flaky: 0
Pending (skipped with .skip): 0
Skipped (mocha hook failure): 0
Passing: 4
