Added masking_strategy_override at field level (#5446)
Co-authored-by: Adrian Galvan <[email protected]>
Linker44 and galvana authored Nov 18, 2024
1 parent 523c1ab commit 861201b
Showing 23 changed files with 993 additions and 112 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -19,6 +19,7 @@ The types of changes are:

 ### Added
 - Added namespace support for Snowflake [#5486](https://github.com/ethyca/fides/pull/5486)
+- Added support for field-level masking overrides [#5446](https://github.com/ethyca/fides/pull/5446)

 ### Developer Experience
 - Migrated several instances of Chakra's Select component to use Ant's Select component [#5475](https://github.com/ethyca/fides/pull/5475)
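The changelog entry says that a dataset field can now carry its own masking strategy. The sketch below shows the precedence this implies. It assumes that a field-level `masking_strategy_override` replaces whatever strategy the policy's erasure rule would otherwise apply to that field. `resolve_masking_strategy` and the plain-dict shapes are hypothetical illustrations, not fides' internal API.

```python
from typing import Any, Dict, Optional


def resolve_masking_strategy(
    field_meta: Optional[Dict[str, Any]],
    policy_strategy: Dict[str, Any],
) -> Dict[str, Any]:
    """Pick the masking strategy for one field (hypothetical helper).

    Assumption: a field-level `masking_strategy_override` wins over the
    strategy configured on the policy's erasure rule; otherwise the policy
    strategy is used unchanged.
    """
    if field_meta and field_meta.get("masking_strategy_override"):
        return field_meta["masking_strategy_override"]
    return policy_strategy


# Hypothetical policy-level strategy and the `street` field's fides_meta.
policy = {"strategy": "null_rewrite", "configuration": {}}
street_meta = {
    "data_type": "string",
    "masking_strategy_override": {
        "strategy": "string_rewrite",
        "configuration": {"rewrite_value": "REDACTED"},
    },
}

print(resolve_masking_strategy(street_meta, policy)["strategy"])  # string_rewrite
print(resolve_masking_strategy(None, policy)["strategy"])  # null_rewrite
```

Fields without an override fall through to the policy strategy. Datasets that predate this change should therefore behave as before.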
21 changes: 14 additions & 7 deletions data/dataset/bigquery_example_test_dataset.yml
@@ -5,7 +5,7 @@ dataset:
     collections:
       - name: address
         fides_meta:
-          erase_after: [ bigquery_example_test_dataset.employee ]
+          erase_after: [bigquery_example_test_dataset.employee]
         fields:
           - name: city
             data_categories: [user.contact.address.city]
@@ -19,12 +19,18 @@ dataset:
             data_categories: [user.contact.address.state]
           - name: street
             data_categories: [user.contact.address.street]
+            fides_meta:
+              data_type: string
+              masking_strategy_override:
+                strategy: string_rewrite
+                configuration:
+                  rewrite_value: REDACTED
           - name: zip
             data_categories: [user.contact.address.postal_code]

       - name: customer
         fides_meta:
-          erase_after: [ bigquery_example_test_dataset.address ]
+          erase_after: [bigquery_example_test_dataset.address]
         fields:
           - name: address_id
             data_categories: [system.operations]
@@ -238,11 +244,12 @@ dataset:
       - name: visit_partitioned
         fides_meta:
           partitioning:
-            where_clauses: [
-              "`last_visit` > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 500 DAY) AND `last_visit` <= CURRENT_TIMESTAMP()",
-              "`last_visit` > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1000 DAY) AND `last_visit` <= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 500 DAY)",
-              "`last_visit` <= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1000 DAY)",
-            ]
+            where_clauses:
+              [
+                "`last_visit` > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 500 DAY) AND `last_visit` <= CURRENT_TIMESTAMP()",
+                "`last_visit` > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1000 DAY) AND `last_visit` <= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 500 DAY)",
+                "`last_visit` <= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1000 DAY)",
+              ]
         fields:
           - name: email
             data_categories: [user.contact.email]
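In the BigQuery dataset, the new override sits under the `street` field's `fides_meta`. It has a `strategy` name and a `configuration` mapping. The sketch below parses that block into a typed object. It assumes these two keys are the whole shape, with `configuration` optional. `MaskingOverride` and `parse_override` are hypothetical names, not fideslang's actual models.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class MaskingOverride:
    """Hypothetical typed view of a `masking_strategy_override` block."""

    strategy: str
    configuration: Dict[str, Any] = field(default_factory=dict)


def parse_override(fides_meta: Dict[str, Any]) -> Optional[MaskingOverride]:
    """Return the field's override, or None when the field has none.

    Assumption: `strategy` is required and `configuration` may be omitted,
    as with the `null_rewrite` override elsewhere in this commit.
    """
    raw = fides_meta.get("masking_strategy_override")
    if raw is None:
        return None
    if "strategy" not in raw:
        raise ValueError("masking_strategy_override requires a 'strategy' key")
    return MaskingOverride(
        strategy=raw["strategy"],
        configuration=dict(raw.get("configuration") or {}),
    )


# The `street` field's fides_meta from the BigQuery example, as a dict.
street_meta = {
    "data_type": "string",
    "masking_strategy_override": {
        "strategy": "string_rewrite",
        "configuration": {"rewrite_value": "REDACTED"},
    },
}
print(parse_override(street_meta))
# MaskingOverride(strategy='string_rewrite', configuration={'rewrite_value': 'REDACTED'})
```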
252 changes: 252 additions & 0 deletions data/dataset/example_field_masking_override_test_dataset.yml
@@ -0,0 +1,252 @@
+dataset:
+  - fides_key: field_masking_override_test_dataset
+    name: Field Masking Override Test Dataset
+    description: Example of a dataset containing masking strategy override at the field-level.
+    collections:
+      - name: address
+        fields:
+          - name: city
+            data_categories: [user.contact.address.city]
+          - name: house
+            data_categories: [user.contact.address.street]
+          - name: id
+            data_categories: [system.operations]
+            fides_meta:
+              primary_key: True
+          - name: state
+            data_categories: [user.contact.address.state]
+          - name: street
+            data_categories: [user.contact.address.street]
+          - name: zip
+            data_categories: [user.contact.address.postal_code]
+
+      - name: customer
+        fields:
+          - name: address_id
+            data_categories: [system.operations]
+            fides_meta:
+              references:
+                - dataset: field_masking_override_test_dataset
+                  field: address.id
+                  direction: to
+          - name: created
+            data_categories: [system.operations]
+          - name: email
+            data_categories: [user.contact.email]
+            fides_meta:
+              identity: email
+              data_type: string
+          - name: id
+            data_categories: [user.unique_id]
+            fides_meta:
+              primary_key: True
+          - name: name
+            data_categories: [user.name]
+            fides_meta:
+              data_type: string
+              length: 40
+              masking_strategy_override:
+                strategy: random_string_rewrite
+                configuration:
+                  length: 5
+                  format_preservation:
+                    suffix: "@example.com"
+          - name: address
+            fields:
+              - name: city
+                data_categories: [user.contact.address.city]
+              - name: house
+                data_categories: [user.contact.address.street]
+                fides_meta:
+                  data_type: string
+                  masking_strategy_override:
+                    strategy: string_rewrite
+                    configuration:
+                      rewrite_value: "1234"
+                      format_preservation:
+                        suffix: "-test"
+              - name: state
+                data_categories: [user.contact.address.state]
+                masking_strategy_override:
+                  strategy: null_rewrite
+              - name: street
+                data_categories: [user.contact.address.street]
+              - name: zip
+                data_categories: [user.contact.address.postal_code]
+
+      - name: employee
+        fields:
+          - name: address_id
+            data_categories: [system.operations]
+            fides_meta:
+              references:
+                - dataset: field_masking_override_test_dataset
+                  field: address.id
+                  direction: to
+          - name: email
+            data_categories: [user.contact.email]
+            fides_meta:
+              identity: email
+              data_type: string
+          - name: id
+            data_categories: [user.unique_id]
+            fides_meta:
+              primary_key: True
+          - name: name
+            data_categories: [user.name]
+            fides_meta:
+              data_type: string
+
+      - name: login
+        fields:
+          - name: customer_id
+            data_categories: [user.unique_id]
+            fides_meta:
+              references:
+                - dataset: field_masking_override_test_dataset
+                  field: customer.id
+                  direction: from
+          - name: id
+            data_categories: [system.operations]
+            fides_meta:
+              primary_key: True
+          - name: time
+            data_categories: [user.sensor]
+
+      - name: orders
+        fields:
+          - name: customer_id
+            data_categories: [user.unique_id]
+            fides_meta:
+              references:
+                - dataset: field_masking_override_test_dataset
+                  field: customer.id
+                  direction: from
+          - name: id
+            data_categories: [system.operations]
+            fides_meta:
+              primary_key: True
+          - name: shipping_address_id
+            data_categories: [system.operations]
+            fides_meta:
+              references:
+                - dataset: field_masking_override_test_dataset
+                  field: address.id
+                  direction: to
+
+      # order_item
+      - name: order_item
+        fields:
+          - name: order_id
+            data_categories: [system.operations]
+            fides_meta:
+              references:
+                - dataset: field_masking_override_test_dataset
+                  field: orders.id
+                  direction: from
+          - name: product_id
+            data_categories: [system.operations]
+            fides_meta:
+              references:
+                - dataset: field_masking_override_test_dataset
+                  field: product.id
+                  direction: to
+          - name: quantity
+            data_categories: [system.operations]
+
+      - name: payment_card
+        fields:
+          - name: billing_address_id
+            data_categories: [system.operations]
+            fides_meta:
+              references:
+                - dataset: field_masking_override_test_dataset
+                  field: address.id
+                  direction: to
+          - name: ccn
+            data_categories: [user.financial.bank_account]
+          - name: code
+            data_categories: [user.financial]
+          - name: customer_id
+            data_categories: [user.unique_id]
+            fides_meta:
+              references:
+                - dataset: field_masking_override_test_dataset
+                  field: customer.id
+                  direction: from
+          - name: id
+            data_categories: [system.operations]
+            fides_meta:
+              primary_key: True
+          - name: name
+            data_categories: [user.financial]
+          - name: preferred
+            data_categories: [user]
+
+      - name: product
+        fields:
+          - name: id
+            data_categories: [system.operations]
+            fides_meta:
+              primary_key: True
+          - name: name
+            data_categories: [system.operations]
+          - name: price
+            data_categories: [system.operations]
+
+      - name: report
+        fields:
+          - name: email
+            data_categories: [user.contact.email]
+            fides_meta:
+              identity: email
+              data_type: string
+          - name: id
+            data_categories: [system.operations]
+            fides_meta:
+              primary_key: True
+          - name: month
+            data_categories: [system.operations]
+          - name: name
+            data_categories: [system.operations]
+          - name: total_visits
+            data_categories: [system.operations]
+          - name: year
+            data_categories: [system.operations]
+
+      - name: service_request
+        fields:
+          - name: alt_email
+            data_categories: [user.contact.email]
+            fides_meta:
+              identity: email
+              data_type: string
+          - name: closed
+            data_categories: [system.operations]
+          - name: email
+            data_categories: [system.operations]
+            fides_meta:
+              identity: email
+              data_type: string
+          - name: employee_id
+            data_categories: [user.unique_id]
+            fides_meta:
+              references:
+                - dataset: field_masking_override_test_dataset
+                  field: employee.id
+                  direction: from
+          - name: id
+            data_categories: [system.operations]
+            fides_meta:
+              primary_key: True
+          - name: opened
+            data_categories: [system.operations]
+      - name: visit
+        fields:
+          - name: email
+            data_categories: [user.contact.email]
+            fides_meta:
+              identity: email
+              data_type: string
+          - name: last_visit
+            data_categories: [system.operations]
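This test dataset places overrides on `customer.name` and also inside the `address` entry that follows it, which has its own `fields` list. To apply overrides, a consumer has to find them at every nesting level. The sketch below walks a field list recursively and yields dotted paths. Its assumptions:

- The second `address` entry is treated as a nested field of `customer`.
- Overrides are read only from each field's `fides_meta`.
- Sub-fields live under a `fields` key.

`iter_overrides` is a hypothetical helper, not part of fides.

```python
from typing import Any, Dict, Iterator, List, Tuple


def iter_overrides(
    fields: List[Dict[str, Any]], prefix: str = ""
) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield (dotted field path, override) pairs, descending into nested fields.

    Hypothetical helper: looks for overrides only under each field's
    `fides_meta`, and for sub-fields only under a `fields` key.
    """
    for f in fields:
        path = f"{prefix}{f['name']}"
        meta = f.get("fides_meta") or {}
        override = meta.get("masking_strategy_override")
        if override:
            yield path, override
        # Recurse into fields that declare their own sub-fields.
        yield from iter_overrides(f.get("fields") or [], prefix=f"{path}.")


# Excerpt of the `customer` collection above, as Python dicts.
customer_fields = [
    {"name": "email", "fides_meta": {"identity": "email", "data_type": "string"}},
    {
        "name": "name",
        "fides_meta": {
            "data_type": "string",
            "length": 40,
            "masking_strategy_override": {
                "strategy": "random_string_rewrite",
                "configuration": {
                    "length": 5,
                    "format_preservation": {"suffix": "@example.com"},
                },
            },
        },
    },
    {
        "name": "address",
        "fields": [
            {"name": "city"},
            {
                "name": "house",
                "fides_meta": {
                    "data_type": "string",
                    "masking_strategy_override": {
                        "strategy": "string_rewrite",
                        "configuration": {
                            "rewrite_value": "1234",
                            "format_preservation": {"suffix": "-test"},
                        },
                    },
                },
            },
        ],
    },
]

for path, override in iter_overrides(customer_fields):
    print(path, override["strategy"])
# name random_string_rewrite
# address.house string_rewrite
```

Returning dotted paths keeps the result independent of the YAML layout. A caller could match these paths against row keys when masking.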
@@ -0,0 +1,43 @@
+dataset:
+  - fides_key: postgres_example_invalid_masking_strategy_override
+    name: Postgres Example Invalid Masking Strategy Override Test Dataset
+    description: Example of a Postgres dataset containing an invalid masking startegy override
+    collections:
+      - name: customer
+        fields:
+          - name: created
+            data_categories: [system.operations]
+          - name: email
+            data_categories: [user.contact.email]
+            fides_meta:
+              identity: email
+              data_type: string
+          - name: id
+            data_categories: [user.unique_id]
+            fides_meta:
+              primary_key: True
+          - name: name
+            data_categories: [user.name]
+            fides_meta:
+              data_type: string
+              length: 40
+
+      - name: employee
+        fields:
+          - name: email
+            data_categories: [user.contact.email]
+            fides_meta:
+              identity: email
+              data_type: string
+          - name: id
+            data_categories: [user.unique_id]
+            fides_meta:
+              primary_key: True
+          - name: name
+            data_categories: [user.name]
+            fides_meta:
+              data_type: string
+              masking_strategy_override:
+                strategy: hash
+                configuration:
+                  algorithm: SHA-256
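This dataset is labeled as containing an invalid override, and its only override uses `hash`. The diff does not show the rule that rejects it. The sketch below illustrates one plausible rule: strategies that depend on secret keys or salts at runtime may not be used as field-level overrides. That rule is an assumption here. `SECRET_REQUIRING_STRATEGIES`, `InvalidOverrideError`, and `validate_overrides` are hypothetical and are not the body of fides' `validate_masking_strategy_override`.

```python
from typing import Any, Dict

# Assumption for this sketch: strategies treated as needing secret keys/salts.
SECRET_REQUIRING_STRATEGIES = {"hash", "hmac", "aes_encrypt"}


class InvalidOverrideError(ValueError):
    """Raised when a field-level override uses a disallowed strategy."""


def validate_overrides(dataset: Dict[str, Any]) -> None:
    """Reject overrides whose strategy is in SECRET_REQUIRING_STRATEGIES.

    Hypothetical rule; checks top-level fields only, for brevity.
    """
    for collection in dataset["collections"]:
        for f in collection["fields"]:
            override = (f.get("fides_meta") or {}).get("masking_strategy_override")
            if override and override["strategy"] in SECRET_REQUIRING_STRATEGIES:
                raise InvalidOverrideError(
                    f"{collection['name']}.{f['name']}: strategy "
                    f"'{override['strategy']}' is not allowed as an override"
                )


# The `employee.name` field from the invalid dataset above, as a dict.
invalid = {
    "fides_key": "postgres_example_invalid_masking_strategy_override",
    "collections": [
        {
            "name": "employee",
            "fields": [
                {
                    "name": "name",
                    "fides_meta": {
                        "data_type": "string",
                        "masking_strategy_override": {
                            "strategy": "hash",
                            "configuration": {"algorithm": "SHA-256"},
                        },
                    },
                }
            ],
        }
    ],
}

try:
    validate_overrides(invalid)
except InvalidOverrideError as exc:
    print(exc)
# employee.name: strategy 'hash' is not allowed as an override
```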
6 changes: 6 additions & 0 deletions data/dataset/postgres_example_test_dataset.yml
@@ -68,6 +68,12 @@ dataset:
             data_categories: [user.name]
             fides_meta:
               data_type: string
+              masking_strategy_override:
+                strategy: string_rewrite
+                configuration:
+                  rewrite_value: testing
+                  format_preservation:
+                    suffix: "-test"

       - name: login
         fields:
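The Postgres `name` override combines `string_rewrite` with a `format_preservation` suffix. The sketch below assumes the suffix is appended verbatim to the rewrite value and that `None` values pass through unchanged; this diff confirms neither. Under those assumptions every masked name becomes `testing-test`. `apply_string_rewrite` is a hypothetical stand-in for the real strategy class.

```python
from typing import Any, Dict, List, Optional


def apply_string_rewrite(
    values: List[Optional[str]], configuration: Dict[str, Any]
) -> List[Optional[str]]:
    """Replace each value with rewrite_value plus an optional suffix.

    Hypothetical helper. Assumptions: the format_preservation suffix is
    appended verbatim, and None values are left untouched.
    """
    suffix = (configuration.get("format_preservation") or {}).get("suffix", "")
    replacement = f"{configuration['rewrite_value']}{suffix}"
    return [None if v is None else replacement for v in values]


config = {"rewrite_value": "testing", "format_preservation": {"suffix": "-test"}}
print(apply_string_rewrite(["Alice", None, "Bob"], config))
# ['testing-test', None, 'testing-test']
```

Without `format_preservation` the suffix defaults to an empty string. The BigQuery override would then yield plain `REDACTED`.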
2 changes: 1 addition & 1 deletion requirements.txt
@@ -71,4 +71,4 @@ twilio==7.15.0
 typing-extensions==4.12.2
 validators==0.20.0
 versioneer==0.19
-fideslang==3.0.8
+fideslang==3.0.9
2 changes: 2 additions & 0 deletions src/fides/api/api/v1/endpoints/dataset_endpoints.py
@@ -39,6 +39,7 @@
     DatasetConfig,
     convert_dataset_to_graph,
     to_graph_field,
+    validate_masking_strategy_override,
 )
 from fides.api.oauth.utils import verify_oauth_client
 from fides.api.schemas.api import BulkUpdateFailed
@@ -417,6 +418,7 @@ def create_or_update_dataset(
             # when a ctl_dataset is being linked to a Saas Connector.
             _validate_saas_dataset(connection_config, dataset) # type: ignore
         # Try to find an existing DatasetConfig matching the given connection & key
+        validate_masking_strategy_override(dataset)
         dataset_config = create_method(db, data=data)
         created_or_updated.append(dataset_config.ctl_dataset)
     except (
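In `create_or_update_dataset`, the new `validate_masking_strategy_override(dataset)` call runs inside the `try` block, before `create_method(db, data=data)`. A rejected override should therefore surface before anything is persisted. The `BulkUpdateFailed` import suggests that failures are recorded per dataset rather than aborting the whole request. The sketch below imitates that ordering with an in-memory store. `bulk_upsert` and `reject_hash` are hypothetical, and so is the flattened `override_strategy` key. Catching `ValueError` is also an assumption, since the `except` clause is cut off in this diff.

```python
from typing import Any, Callable, Dict, List, Tuple


def bulk_upsert(
    datasets: List[Dict[str, Any]],
    validate: Callable[[Dict[str, Any]], None],
    store: Dict[str, Dict[str, Any]],
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Validate each dataset before saving it; collect failures instead of aborting.

    Hypothetical helper; `store` stands in for the database write.
    """
    succeeded: List[str] = []
    failed: List[Tuple[str, str]] = []
    for dataset in datasets:
        key = dataset["fides_key"]
        try:
            validate(dataset)  # raises before anything is written
            store[key] = dataset  # stand-in for create_method(db, data=...)
            succeeded.append(key)
        except ValueError as exc:
            failed.append((key, str(exc)))
    return succeeded, failed


def reject_hash(dataset: Dict[str, Any]) -> None:
    """Hypothetical validator using a flattened `override_strategy` key."""
    if dataset.get("override_strategy") == "hash":
        raise ValueError("hash is not allowed as an override")


store: Dict[str, Dict[str, Any]] = {}
ok, bad = bulk_upsert(
    [
        {"fides_key": "good", "override_strategy": "string_rewrite"},
        {"fides_key": "bad", "override_strategy": "hash"},
    ],
    reject_hash,
    store,
)
print(ok, bad, sorted(store))
# ['good'] [('bad', 'hash is not allowed as an override')] ['good']
```

Validating before the write means a bad override never reaches storage. The other datasets in the same request still succeed.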