-
Notifications
You must be signed in to change notification settings - Fork 321
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make HMA likelihood match respect cardinality #1864
Conversation
9662bef
to
cf5154e
Compare
Codecov ReportAttention: Patch coverage is
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. Additional details and impacted files@@ Coverage Diff @@
## main #1864 +/- ##
==========================================
- Coverage 97.51% 97.44% -0.08%
==========================================
Files 51 51
Lines 5119 5011 -108
==========================================
- Hits 4992 4883 -109
- Misses 127 128 +1 ☔ View full report in Codecov by Sentry. |
sdv/sampling/hierarchical_sampler.py
Outdated
@@ -123,7 +117,33 @@ def _add_child_rows(self, child_name, parent_name, parent_row, sampled_data, num | |||
sampled_data[child_name] = pd.concat( | |||
[previous, sampled_rows]).reset_index(drop=True) | |||
|
|||
def _sample_children(self, table_name, sampled_data): | |||
def _enforce_table_sizes(self, child_name, table_name, scale, sampled_data): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could add a docstring to explain the logic here.
sdv/sampling/hierarchical_sampler.py
Outdated
for i in num_rows_column[::-1]: | ||
if sampled_data[table_name][num_rows_key][i] <= min_rows: | ||
break | ||
sampled_data[table_name][num_rows_key][i] -= 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line seems to be emitting a warning:
[/Users/frances/Documents/SDV-clean/sdv/sampling/hierarchical_sampler.py:142]: FutureWarning: ChainedAssignmentError: behaviour will change in pandas 3.0!
You are setting values through chained assignment. Currently this works in certain cases, but when using Copy-on-Write (which will become the default behaviour in pandas 3.0) this will never work to update the original DataFrame or Series, because the intermediate object on which we are setting values will behave as a copy.
A typical example is when you are setting values in a column of a DataFrame, like:
df["col"][row_indexer] = value
Use `df.loc[row_indexer, "col"] = values` instead, to perform the assignment in a single step and ensure this keeps updating the original `df`.
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
sampled_data[table_name][num_rows_key][i] -= 1
sdv/sampling/hierarchical_sampler.py
Outdated
total_num_rows = int(self._table_sizes[child_name] * scale) | ||
for foreign_key in self.metadata._get_foreign_keys(table_name, child_name): | ||
num_rows_key = f'__{child_name}__{foreign_key}__num_rows' | ||
min_rows = self._min_child_rows[num_rows_key] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll need to change this to getattr
and provide a sensible default or alternate codepath for backwards compatibility.
cf5154e
to
66bedbb
Compare
8b79999
to
bfa2145
Compare
bfa2145
to
f60d850
Compare
CU-86azkhebm, Resolve #1834.