-
Notifications
You must be signed in to change notification settings - Fork 322
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow for disconnected tables #1979
Conversation
Now that datacebo/sdv-enterprise#585, datacebo/sdv-enterprise#578, and datacebo/sdv-enterprise#579 are in can we re-run the experiments to make sure everything works as expected? @lajohn4747 Assuming everything works and no other issues arise, are we good to mark this as ready @npatki? |
Validated it works for demo datasets for all our multi-synthesizers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if you run the tests on enterprise (including the benchmarking) with these updates everything passes?
Yep all the tests and benchmark tests pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add an integration test?
try: | ||
self._validate_all_tables_connected(self._get_parent_map(), self._get_child_map()) | ||
except InvalidMetadataError as invalid_error: | ||
warning_msg = ( | ||
f'Could not automatically add relationships for all tables. {str(invalid_error)}' | ||
) | ||
warnings.warn(warning_msg) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to get rid of this warning? While it's not a requirement anymore, it might be nice to still warn that the metadata detection wasn't able to connect all of the tables. @npatki, thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be an interesting one, it seems that checking all tables are connected is the way we confirm we got all the relationships but now this wouldn't be the case anymore as disjointed tables exist. Not sure how we then go about confirming all relationships are captured. @frances-h
Added one for HMA. I also plan to add more tests in our other synthesizers as well as well: https://github.com/datacebo/SDV-Enterprise/pull/621 |
Just wanted to note that HMA benchmark leaves out a lot of datasets because they take a very long time for the larger datasets and diagnostic reports do not go over HMA at all. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
resolves #2047
CU-86b092r53
Issue
Allow for disconnected schemas by removing the check.
Metadata visualization still works
All other synthesizers work fine with no relationships defined in metadata (HMA and SDV Enterprise multi-table synthesizers)
Evaluations and quality report still work (Eye test scores look fine, cardinality and intertable trends drop in score)
Does not require any other fixes for running multi-table synthesizer