Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make HMA likelihood match respect cardinality #1864

Merged
merged 7 commits into from
Apr 12, 2024
Merged

Conversation

fealho
Copy link
Member

@fealho fealho commented Mar 22, 2024

CU-86azkhebm, Resolve #1834.

@sdv-team
Copy link
Contributor

sdv-team commented Apr 2, 2024

@fealho fealho changed the title Draft Make HMA likelihood match respect cardinality Apr 2, 2024
@fealho fealho force-pushed the issue-1834-hma-likelihood branch from 9662bef to cf5154e Compare April 3, 2024 16:48
@fealho fealho marked this pull request as ready for review April 3, 2024 16:56
@fealho fealho requested a review from a team as a code owner April 3, 2024 16:56
@fealho fealho requested review from frances-h and R-Palazzo and removed request for a team April 3, 2024 16:56
@codecov-commenter
Copy link

codecov-commenter commented Apr 3, 2024

Codecov Report

Attention: Patch coverage is 95.12195% with 2 lines in your changes are missing coverage. Please review.

Project coverage is 97.44%. Comparing base (df7da68) to head (cf5154e).
Report is 4 commits behind head on main.

❗ Current head cf5154e differs from pull request most recent head f60d850. Consider uploading reports for the commit f60d850 to get more accurate results

Files Patch % Lines
sdv/sampling/hierarchical_sampler.py 92.85% 2 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1864      +/-   ##
==========================================
- Coverage   97.51%   97.44%   -0.08%     
==========================================
  Files          51       51              
  Lines        5119     5011     -108     
==========================================
- Hits         4992     4883     -109     
- Misses        127      128       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@@ -123,7 +117,33 @@ def _add_child_rows(self, child_name, parent_name, parent_row, sampled_data, num
sampled_data[child_name] = pd.concat(
[previous, sampled_rows]).reset_index(drop=True)

def _sample_children(self, table_name, sampled_data):
def _enforce_table_sizes(self, child_name, table_name, scale, sampled_data):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could add a docstring to explain the logic here.

for i in num_rows_column[::-1]:
if sampled_data[table_name][num_rows_key][i] <= min_rows:
break
sampled_data[table_name][num_rows_key][i] -= 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line seems to be emitting a warning:

[/Users/frances/Documents/SDV-clean/sdv/sampling/hierarchical_sampler.py:142]: FutureWarning: ChainedAssignmentError: behaviour will change in pandas 3.0!
You are setting values through chained assignment. Currently this works in certain cases, but when using Copy-on-Write (which will become the default behaviour in pandas 3.0) this will never work to update the original DataFrame or Series, because the intermediate object on which we are setting values will behave as a copy.
A typical example is when you are setting values in a column of a DataFrame, like:

df["col"][row_indexer] = value

Use `df.loc[row_indexer, "col"] = values` instead, to perform the assignment in a single step and ensure this keeps updating the original `df`.

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

  sampled_data[table_name][num_rows_key][i] -= 1

total_num_rows = int(self._table_sizes[child_name] * scale)
for foreign_key in self.metadata._get_foreign_keys(table_name, child_name):
num_rows_key = f'__{child_name}__{foreign_key}__num_rows'
min_rows = self._min_child_rows[num_rows_key]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll need to change this to getattr and provide a sensible default or alternate codepath for backwards compatibility.

@fealho fealho force-pushed the issue-1834-hma-likelihood branch from cf5154e to 66bedbb Compare April 8, 2024 15:58
@fealho fealho requested a review from frances-h April 8, 2024 16:17
@fealho fealho force-pushed the issue-1834-hma-likelihood branch 2 times, most recently from 8b79999 to bfa2145 Compare April 9, 2024 19:04
@fealho fealho force-pushed the issue-1834-hma-likelihood branch from bfa2145 to f60d850 Compare April 11, 2024 15:03
@fealho fealho merged commit 4691fab into main Apr 12, 2024
37 checks passed
@fealho fealho deleted the issue-1834-hma-likelihood branch April 12, 2024 02:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

HMA likelihood match should respect cardinality
5 participants