Always create a sample size of 1 no matter how small the scale. #2062
sdv/sampling/hierarchical_sampler.py
synthesizer = self._table_synthesizers[table]
LOGGER.info(f'Sampling {num_rows} rows from table {table}')
sampled_data[table] = self._sample_rows(synthesizer, num_rows)
self._sample_children(table_name=table, sampled_data=sampled_data, scale=scale)

if send_min_sample_warning:
    warn_msg = (
        "The 'scale' parameter it too small. Some tables may have 1 row."
typo: "The 'scale' parameter it too small." -> "The 'scale' parameter is too small."
Can you verify that this doesn't break the _enforce_table_size method?
Added a unit test to show
resolves #2045
CU-86b0reety
Make sure that root tables are sampled with a minimum size of 1 row when the scale is small, to avoid errors. Emit a warning that the sample size should be larger for better-quality data, since cardinality will likely not be accurate.
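
For reviewers, the intended behavior can be sketched roughly as below. This is an illustration only, not the code in this PR: `plan_root_sample_sizes` and its `table_sizes` argument are hypothetical names, and the use of `round` to scale the row count is an assumption. The warning text follows the message in the diff, with the typo fixed.

```python
import warnings


def plan_root_sample_sizes(table_sizes, scale):
    """Scale each root table's row count, never going below 1 row.

    Hypothetical helper for illustration; not part of the SDV API.
    """
    send_min_sample_warning = False
    num_rows_by_table = {}

    for table, original_rows in table_sizes.items():
        # Assumption: row counts are scaled with round().
        num_rows = round(original_rows * scale)
        if num_rows < 1:
            # Clamp to a single row so sampling does not fail on an empty table.
            num_rows = 1
            send_min_sample_warning = True
        num_rows_by_table[table] = num_rows

    # Warn once, after all tables are processed, rather than once per table.
    if send_min_sample_warning:
        warnings.warn(
            "The 'scale' parameter is too small. Some tables may have 1 row."
        )

    return num_rows_by_table
```

With `table_sizes={'users': 10, 'stores': 1000}` and `scale=0.01`, the `users` table would be clamped to 1 row and a single warning would be emitted, while `stores` would scale to 10 rows as usual.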