-
Notifications
You must be signed in to change notification settings - Fork 321
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixing tests for rdt 1.13.2 #2320
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #2320 +/- ##
==========================================
- Coverage 98.58% 98.57% -0.02%
==========================================
Files 58 58
Lines 6085 6085
==========================================
- Hits 5999 5998 -1
- Misses 86 87 +1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The failing test can be fixed by sorting the order of the rows for both dataframes.
@@ -299,7 +299,7 @@ def _fit(self, table_data): | |||
self._uuids_to_combinations = {} | |||
for combination in self._combinations.itertuples(index=False, name=None): | |||
mappable_combination = get_mappable_combination(combination) | |||
uuid_str = str(uuid.uuid4()) | |||
uuid_str = str(uuid.uuid5(uuid.NAMESPACE_DNS, str(mappable_combination))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This works but do we have to worry about the fact that mappable_combination
is not actually a valid DNS? In the docs for uuid, it says
When this namespace is specified, the name string is a fully qualified domain name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The namespace is any uuid and name is any string, which are then concatenated and hashed. This is a good discussion on the topic.
If you prefer we can also hash it ourselves, like so:
hex_string = hashlib.md5(str(mappable_combination).encode("UTF-8")).hexdigest()
uuid_str = uuid.UUID(hex=hex_string)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The link to the discussion is broken, but honestly either one is fine. We just need to make sure that each combination is mapped to a unique value.
The more I think about it, we only promise that sampling can be reset to get the same values. Since this happens during fitting, it might be ok to just leave as is and sort in the test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, from a user perspective both solutions work. I'd still lean towards getting rid of the randomness, but either way is fine.
No description provided.