Make HMA likelihood match respect cardinality #1864

fealho · 2024-03-22T03:47:36Z

CU-86azkhebm, Resolve #1834.

sdv-team · 2024-04-02T17:02:19Z

Task linked: CU-86azkhebm SDV - HMA likelihood match should respect cardinality #1834

codecov-commenter · 2024-04-03T18:35:39Z

Codecov Report

Attention: Patch coverage is 95.12195%, with 2 lines in your changes missing coverage. Please review.

Project coverage is 97.44%. Comparing base (df7da68) to head (cf5154e).
Report is 4 commits behind head on main.

❗ Current head cf5154e differs from pull request most recent head f60d850. Consider uploading reports for the commit f60d850 to get more accurate results

Files	Patch %	Lines
sdv/sampling/hierarchical_sampler.py	92.85%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1864      +/-   ##
==========================================
- Coverage   97.51%   97.44%   -0.08%     
==========================================
  Files          51       51              
  Lines        5119     5011     -108     
==========================================
- Hits         4992     4883     -109     
- Misses        127      128       +1

R-Palazzo · 2024-04-05T13:30:51Z

sdv/sampling/hierarchical_sampler.py

@@ -123,7 +117,33 @@ def _add_child_rows(self, child_name, parent_name, parent_row, sampled_data, num
                sampled_data[child_name] = pd.concat(
                    [previous, sampled_rows]).reset_index(drop=True)

-    def _sample_children(self, table_name, sampled_data):
+    def _enforce_table_sizes(self, child_name, table_name, scale, sampled_data):


You could add a docstring to explain the logic here.
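
One possible shape for such a docstring, shown on a simplified standalone function rather than the real method. This is only a sketch: `enforce_table_size`, its list-based arguments, and the largest-first trimming loop are assumptions for illustration, loosely based on the excerpts quoted in this review (`int(self._table_sizes[child_name] * scale)` and the `min_rows` floor). It only covers the case where too many child rows were requested.

```python
def enforce_table_size(num_rows, original_size, scale, min_rows):
    """Trim per-parent child counts so they sum to the scaled table size.

    Hypothetical, simplified stand-in for ``_enforce_table_sizes``.

    Args:
        num_rows (list[int]):
            Number of child rows requested for each parent row.
        original_size (int):
            Number of rows the child table had in the real data.
        scale (float):
            Factor applied to ``original_size`` to get the target size.
        min_rows (int):
            Lowest value any single count may be reduced to.

    Returns:
        list[int]:
            Adjusted counts. Their sum is at most
            ``int(original_size * scale)`` unless the largest remaining
            count has already reached ``min_rows``.
    """
    total_num_rows = int(original_size * scale)
    adjusted = list(num_rows)
    while sum(adjusted) > total_num_rows:
        # Assumption: take one row away from the parent with the most children.
        largest = max(range(len(adjusted)), key=adjusted.__getitem__)
        if adjusted[largest] <= min_rows:
            break  # Nothing can be lowered further without breaking the floor.
        adjusted[largest] -= 1
    return adjusted


trimmed = enforce_table_size([3, 2, 4], original_size=6, scale=1.0, min_rows=1)
at_floor = enforce_table_size([2, 2], original_size=1, scale=1.0, min_rows=2)
```

Here `trimmed` sums to the target of 6, while `at_floor` is returned unchanged because both counts already sit at the minimum.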

frances-h · 2024-04-05T18:01:01Z

sdv/sampling/hierarchical_sampler.py

+                    for i in num_rows_column[::-1]:
+                        if sampled_data[table_name][num_rows_key][i] <= min_rows:
+                            break
+                        sampled_data[table_name][num_rows_key][i] -= 1


This line seems to be emitting a warning:

[/Users/frances/Documents/SDV-clean/sdv/sampling/hierarchical_sampler.py:142]: FutureWarning: ChainedAssignmentError: behaviour will change in pandas 3.0!
You are setting values through chained assignment. Currently this works in certain cases, but when using Copy-on-Write (which will become the default behaviour in pandas 3.0) this will never work to update the original DataFrame or Series, because the intermediate object on which we are setting values will behave as a copy.
A typical example is when you are setting values in a column of a DataFrame, like:

df["col"][row_indexer] = value

Use `df.loc[row_indexer, "col"] = values` instead, to perform the assignment in a single step and ensure this keeps updating the original `df`.

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

  sampled_data[table_name][num_rows_key][i] -= 1
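
A minimal sketch of the single-step assignment the warning recommends, applied to a cell update like the one flagged here. The table name, column key, values, and the helper `decrement_num_rows` are hypothetical and only illustrate the `.loc` pattern; this is not the code that was merged.

```python
import pandas as pd


def decrement_num_rows(table, num_rows_key, row_index, min_rows):
    """Lower one count by 1 with a single ``.loc`` assignment.

    Returns False without changing anything if the count is already at
    or below ``min_rows``.
    """
    if table.loc[row_index, num_rows_key] <= min_rows:
        return False
    # One indexing step on the DataFrame itself, so no intermediate copy
    # is involved and the original table is updated.
    table.loc[row_index, num_rows_key] -= 1
    return True


# Hypothetical stand-in for sampled_data[table_name].
num_rows_key = '__child__parent_id__num_rows'
sampled_data = {'parent': pd.DataFrame({num_rows_key: [3, 2, 4]})}

changed = decrement_num_rows(sampled_data['parent'], num_rows_key, 2, min_rows=1)
blocked = decrement_num_rows(sampled_data['parent'], num_rows_key, 1, min_rows=2)
```

Because the row label and the column name are passed to `.loc` together, the write goes to the DataFrame stored in `sampled_data` rather than to a temporary Series.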

frances-h · 2024-04-05T18:04:44Z

sdv/sampling/hierarchical_sampler.py

+        total_num_rows = int(self._table_sizes[child_name] * scale)
+        for foreign_key in self.metadata._get_foreign_keys(table_name, child_name):
+            num_rows_key = f'__{child_name}__{foreign_key}__num_rows'
+            min_rows = self._min_child_rows[num_rows_key]


We'll need to change this to use `getattr` and provide a sensible default or an alternate code path for backwards compatibility.
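
A rough sketch of the `getattr` fallback suggested here, assuming the concern is sampler objects created before `_min_child_rows` existed. The two classes, the helper `get_min_rows`, and the default of `0` are hypothetical; the actual default or alternate code path is a design decision for the PR.

```python
class LegacySampler:
    """Hypothetical object created before ``_min_child_rows`` was added."""


class CurrentSampler:
    """Hypothetical object that stores the per-key minimums."""

    def __init__(self):
        self._min_child_rows = {'__child__parent_id__num_rows': 2}


def get_min_rows(sampler, num_rows_key, default=0):
    """Look up the minimum row count without assuming the attribute exists."""
    # getattr with a default avoids an AttributeError on older objects.
    min_child_rows = getattr(sampler, '_min_child_rows', {})
    return min_child_rows.get(num_rows_key, default)


key = '__child__parent_id__num_rows'
legacy_min = get_min_rows(LegacySampler(), key)
current_min = get_min_rows(CurrentSampler(), key)
```

With this pattern the older object falls back to the default instead of raising, while newer objects keep using their stored minimums.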

fealho changed the title ~~Draft~~ Make HMA likelihood match respect cardinality Apr 2, 2024

fealho force-pushed the issue-1834-hma-likelihood branch from 9662bef to cf5154e Compare April 3, 2024 16:48

fealho marked this pull request as ready for review April 3, 2024 16:56

fealho requested a review from a team as a code owner April 3, 2024 16:56

fealho requested review from frances-h and R-Palazzo and removed request for a team April 3, 2024 16:56

R-Palazzo approved these changes Apr 5, 2024

frances-h requested changes Apr 5, 2024

fealho force-pushed the issue-1834-hma-likelihood branch from cf5154e to 66bedbb Compare April 8, 2024 15:58

fealho requested a review from frances-h April 8, 2024 16:17

fealho force-pushed the issue-1834-hma-likelihood branch 2 times, most recently from 8b79999 to bfa2145 Compare April 9, 2024 19:04

frances-h approved these changes Apr 10, 2024

fealho added 7 commits April 11, 2024 09:59

Draft

4b90c27

Remove some artifacts

ef70a24

Remove artifacts

317a43c

Working version

1e2581e

Fix bug

dab0542

Address feedback + fix table sizes

0147fe9

Code cleanup

f60d850

fealho force-pushed the issue-1834-hma-likelihood branch from bfa2145 to f60d850 Compare April 11, 2024 15:03

fealho merged commit 4691fab into main Apr 12, 2024
37 checks passed

fealho deleted the issue-1834-hma-likelihood branch April 12, 2024 02:30
