Allow for disconnected tables #1979

Merged
lajohn4747 merged 10 commits into main from issue_549_disconnected_schema
Jun 11, 2024

Conversation

lajohn4747
Contributor

@lajohn4747 lajohn4747 commented May 2, 2024

resolves #2047
CU-86b092r53
Issue

Allow for disconnected schemas by removing the check.

Metadata visualization still works.
All other synthesizers work fine with no relationships defined in the metadata (HMA and the SDV Enterprise multi-table synthesizers).
Evaluations and the quality report still work (scores look fine on an eye test, though the cardinality and intertable trends scores drop).

No other fixes are required to run a multi-table synthesizer.
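
For readers unfamiliar with the term, here is a minimal sketch of what a "disconnected" schema looks like. It is not SDV's implementation: the helper `connected_components` and the example table names are hypothetical, and it only assumes that each relationship is a dict with `parent_table_name` and `child_table_name` keys. It groups tables by treating every relationship as an undirected edge; a schema is connected when there is exactly one group.

```python
from collections import defaultdict, deque


def connected_components(table_names, relationships):
    """Group tables into components, treating relationships as undirected edges.

    Hypothetical helper for illustration only; not part of SDV.
    """
    neighbors = defaultdict(set)
    for rel in relationships:
        parent = rel['parent_table_name']
        child = rel['child_table_name']
        neighbors[parent].add(child)
        neighbors[child].add(parent)

    seen = set()
    components = []
    for start in table_names:
        if start in seen:
            continue
        # Breadth-first search from each table that has not been visited yet.
        group = []
        queue = deque([start])
        seen.add(start)
        while queue:
            table = queue.popleft()
            group.append(table)
            for other in sorted(neighbors[table]):
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
        components.append(sorted(group))
    return components


# Hypothetical schema: 'users' and 'sessions' are linked, 'products' stands alone.
tables = ['users', 'sessions', 'products']
relationships = [
    {'parent_table_name': 'users', 'child_table_name': 'sessions'},
]
components = connected_components(tables, relationships)
is_connected = len(components) == 1
```

With this example, `products` ends up in its own group, so `is_connected` is `False`. That is the kind of schema this PR stops rejecting.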

@lajohn4747 lajohn4747 requested a review from a team as a code owner May 2, 2024 15:49
@sdv-team
Contributor

sdv-team commented May 2, 2024

@lajohn4747 lajohn4747 marked this pull request as draft May 2, 2024 15:54
@lajohn4747 lajohn4747 marked this pull request as ready for review May 2, 2024 16:22
@lajohn4747 lajohn4747 marked this pull request as draft May 8, 2024 16:55
@frances-h
Contributor

Now that datacebo/sdv-enterprise#585, datacebo/sdv-enterprise#578, and datacebo/sdv-enterprise#579 are in, can we re-run the experiments to make sure everything works as expected? @lajohn4747

Assuming everything works and no other issues arise, are we good to mark this as ready @npatki?

@lajohn4747 lajohn4747 marked this pull request as ready for review June 5, 2024 19:29
@lajohn4747
Contributor Author

> Now that https://github.com/datacebo/SDV-Enterprise/issues/585, https://github.com/datacebo/SDV-Enterprise/issues/578, and https://github.com/datacebo/SDV-Enterprise/issues/579 are in can we re-run the experiments to make sure everything works as expected? @lajohn4747

Validated that it works on the demo datasets for all our multi-table synthesizers.

@lajohn4747 lajohn4747 requested review from amontanez24 and removed request for a team and pvk-developer June 5, 2024 19:30
Contributor

@amontanez24 amontanez24 left a comment

So if you run the tests on enterprise (including the benchmarking) with these updates everything passes?

@lajohn4747
Contributor Author

> So if you run the tests on enterprise (including the benchmarking) with these updates everything passes?

Yep, all the tests and benchmark tests pass.

@lajohn4747 lajohn4747 requested a review from amontanez24 June 7, 2024 15:18
Contributor

@frances-h frances-h left a comment

Can we add an integration test?

Comment on lines -516 to -523
try:
    self._validate_all_tables_connected(self._get_parent_map(), self._get_child_map())
except InvalidMetadataError as invalid_error:
    warning_msg = (
        f'Could not automatically add relationships for all tables. {str(invalid_error)}'
    )
    warnings.warn(warning_msg)

Contributor

Do we want to get rid of this warning? While it's not a requirement anymore, it might be nice to still warn that the metadata detection wasn't able to connect all of the tables. @npatki, thoughts?
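
As a rough illustration of the suggestion above, a detection step could still emit a non-fatal warning without enforcing connectivity. The sketch below is an assumption, not SDV code: `warn_about_unlinked_tables` is a hypothetical helper, the second half of the message is invented for the example, and relationships are assumed to be dicts with `parent_table_name` and `child_table_name` keys. It only flags tables that appear in no relationship at all, which is a weaker condition than the removed `_validate_all_tables_connected` check.

```python
import warnings


def warn_about_unlinked_tables(table_names, relationships):
    """Warn (rather than raise) when some tables take part in no relationship.

    Hypothetical helper for illustration only; not part of SDV.
    """
    linked = set()
    for rel in relationships:
        linked.add(rel['parent_table_name'])
        linked.add(rel['child_table_name'])

    unlinked = [name for name in table_names if name not in linked]
    # A single-table schema has nothing to link, so it is not worth a warning.
    if unlinked and len(table_names) > 1:
        warnings.warn(
            'Could not automatically add relationships for all tables. '
            f'Tables without relationships: {unlinked}'
        )
    return unlinked
```

Because this uses `warnings.warn` instead of raising, callers can continue with a disjointed schema and still see which tables were left unlinked.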

Contributor Author

This might be an interesting one. Checking that all tables are connected seems to be how we confirm that we detected all the relationships, but that no longer holds now that disjointed tables can exist. I'm not sure how we would then confirm that all relationships are captured. @frances-h

tests/unit/metadata/test_multi_table.py (resolved)
@lajohn4747
Contributor Author

> Can we add an integration test?

Added one for HMA. I also plan to add more tests for our other synthesizers as well: https://github.com/datacebo/SDV-Enterprise/pull/621

@lajohn4747 lajohn4747 requested a review from frances-h June 7, 2024 17:00
@lajohn4747
Contributor Author

> So if you run the tests on enterprise (including the benchmarking) with these updates everything passes?

> Yep all the tests and benchmark tests pass

Just wanted to note that the HMA benchmark leaves out a lot of datasets because the larger ones take a very long time to run, and the diagnostic reports do not cover HMA at all.

@amontanez24 @npatki

Contributor

@amontanez24 amontanez24 left a comment

LGTM!

@lajohn4747 lajohn4747 merged commit 1286211 into main Jun 11, 2024
39 checks passed
@lajohn4747 lajohn4747 deleted the issue_549_disconnected_schema branch June 11, 2024 21:54
Development

Successfully merging this pull request may close these issues.

Enable the ability to run multi table synthesizers on disjointed table schemas
4 participants