Add small integer representation #4204

Draft
wants to merge 1 commit into
base: master


DRMacIver
Member

Long-standing Hypothesis shrinking advice is that when you write one_of(x, y) then you should make sure that x is simpler than y so as to get good shrinking behaviour.

But, uh, what does "simpler" mean? Well it means "has smaller representations in the Hypothesis internal shrink order". Fair enough.

Well it turns out that this secretly means two different things:

  1. Which of these is typically smaller?
  2. Which of these is smaller once shrunk?

Which of these do we mean?

Well, uh, unfortunately we mean both. The former is important to get good shrinking performance/behaviour, the latter is important to get good results once fully shrunk.

Sure would be a shame if there were common pairs of strategies where these gave different answers, huh?

Anyway one_of(integers(), text()) is such an example. integers() are typically (and sort of logically should be) smaller than text(), but 0 is actually larger than '' in both the new and old representations.
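To make the mismatch concrete, here is a toy sketch of a shortlex shrink order (shorter buffers first, ties broken lexicographically). The byte layouts are invented purely for illustration and are not Hypothesis's real encodings; the branch constants and example buffers are hypothetical names.

```python
# Toy model only: every byte layout here is an assumption made for
# illustration, not Hypothesis's actual buffer format.

def sort_key(buffer: bytes) -> tuple:
    """Shortlex order: compare by length first, then lexicographically."""
    return (len(buffer), buffer)

INTEGERS_BRANCH = 0  # hypothetical marker for the first argument to one_of
TEXT_BRANCH = 1      # hypothetical marker for the second argument to one_of

# Hypothetical: an integer costs a size byte plus a value byte.
typical_int = bytes([INTEGERS_BRANCH, 0, 42])
zero = bytes([INTEGERS_BRANCH, 0, 0])

# Hypothetical: each character costs a "continue" byte plus a value byte,
# and the string ends with a single "stop" byte.
typical_text = bytes([TEXT_BRANCH, 1, 97, 1, 98, 1, 99, 0])
empty_text = bytes([TEXT_BRANCH, 0])

# Question 1: a typical integer is smaller than a typical string.
assert sort_key(typical_int) < sort_key(typical_text)

# Question 2: once fully shrunk, the empty string beats zero, because
# length is compared before the branch marker is ever looked at.
assert sort_key(empty_text) < sort_key(zero)
```

Under these assumed layouts the two questions give opposite answers, which is the situation described above.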

This PR adds a small-integer optimisation that fixes both. It gives us a single-byte representation of small integers in the old buffer-based implementation, and adds a special case for zero in the serialisation format of the new IR representation. Non-zero integers don't need special casing here, because in the new representation this is only a problem for 0: any string of length > 0 will be at least two bytes, so the IR already handles the sizing of small non-zero integers correctly.
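As a rough illustration of why a single-byte form helps, the following toy sketch gives small integers a one-byte fast path. This is not the encoding used in this PR: `encode_int_toy`, the 128 threshold, and the branch and stop bytes are all hypothetical choices made for the example.

```python
# Toy model only: not the PR's actual encoding.

def sort_key(buffer: bytes) -> tuple:
    """Shortlex order: compare by length first, then lexicographically."""
    return (len(buffer), buffer)

def encode_int_toy(n: int) -> bytes:
    """Hypothetical encoding of a non-negative integer.

    Values below 128 take exactly one byte. Larger values take a marker
    byte (128 + payload length) followed by the big-endian payload.
    """
    if n < 0:
        raise ValueError("toy encoding only covers non-negative integers")
    if n < 128:
        return bytes([n])
    payload = n.to_bytes((n.bit_length() + 7) // 8, "big")
    if len(payload) > 127:
        raise ValueError("integer too large for this toy encoding")
    return bytes([128 + len(payload)]) + payload

INTEGERS_BRANCH = 0  # hypothetical marker for the first argument to one_of
TEXT_BRANCH = 1      # hypothetical marker for the second argument to one_of

zero = bytes([INTEGERS_BRANCH]) + encode_int_toy(0)
empty_text = bytes([TEXT_BRANCH, 0])  # hypothetical single "stop" byte

# Both buffers are now two bytes long, so the tie is broken
# lexicographically and the lower branch marker wins.
assert len(zero) == len(empty_text) == 2
assert sort_key(zero) < sort_key(empty_text)
```

With the lengths tied, the ordering falls back to the branch marker, so the strategy listed first in one_of is also the one that wins once fully shrunk.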

@DRMacIver DRMacIver requested a review from Zac-HD as a code owner December 16, 2024 12:59
@DRMacIver DRMacIver force-pushed the DRMacIver/smol-integers branch from 730ac18 to 5bcce8b Compare December 16, 2024 13:00
@DRMacIver DRMacIver force-pushed the DRMacIver/smol-integers branch from 5bcce8b to b066401 Compare December 16, 2024 13:02
@DRMacIver DRMacIver marked this pull request as draft December 16, 2024 23:13
Member

@tybug tybug left a comment


without looking too closely yet at the _draw_unbounded_integer changes - while this makes sense in principle for the bytestring, I'd just forewarn that a lot of this work will be redundant/overwritten/solved in a different way on the typed choice sequence, so don't be surprised to see this code go away in the near future! (e.g., shrink ordering is independent of buffer size on the TCS).

data.mark_interesting()

shrinker.fixate_shrink_passes(["minimize_individual_nodes"])
assert shrinker.shrink_target.ir_nodes[0].value == boundary
Member

@tybug tybug Dec 18, 2024


shrink_target.choices[0] is a nice concise alternative here! I envision .buffer[i] -> .choices[i] being the default migration path for bytestring tests, though of course with tweaked indices.

(or better yet shrinker.choices, using the implicit forwarding to .shrink_target).

2 participants