Add a fuzzer for sliding window schedules #8144
Interestingly, this broke a few tests because they now do fewer evaluations than expected, so it seems the old sliding window was better in those cases. However, it also broke camera pipe because it can't fold some storage as aggressively, so there must also be cases where it's worse (or maybe it just makes the bounds expressions more complex, so storage folding gets more conservative).
I think both of those changes are expected, and neither is necessarily better. The old way of sliding window always does the entire warmup in one production, while the new way does it one line at a time, just like the steady state. For an N-point stencil, that is N times as many calls during warmup, so I wouldn't consider the difference an improvement. It's also expected that the old way will fail to fold storage as tightly, which is why we need the new way for devices with a constrained amount of memory for intermediate buffers. Consider two 3-point stencils in a row: the old method wants to do the warmup entirely on the first iteration, which requires computing 5 elements at once, so we can't fold into fewer than 5 elements. The new method can warm up one line at a time, as required by each line of the second stencil, so we can fold into 3 elements. The point being, it's not because the bounds expressions are worse or some unrelated thing that could be fixed; it's an inherent property of the two warmup strategies.
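To make the 5-versus-3 claim concrete, here is a toy model in plain Python rather than Halide. It assumes a pipeline f -> g -> h where g and h are centered 3-point stencils, and it tracks which rows of f must be live at the same time under each warmup strategy. The function names, the schedule shapes, and the choice to start the rewound loop four iterations early are all assumptions made for illustration; this is a sketch of the argument, not of what SlidingWindow.cpp actually generates.

```python
def f_rows_for_g(g_row):
    """Rows of f read by one row of g (centered 3-point stencil)."""
    return list(range(g_row - 1, g_row + 2))


def old_schedule(height):
    """Per-iteration (f rows produced, f rows consumed), old strategy.

    The whole warmup happens in iteration 0; later iterations add one row.
    """
    steps = []
    for y in range(height):
        if y == 0:
            g_rows = [-1, 0, 1]  # h(0) reads g(-1..1)
            consumed = sorted({r for g in g_rows for r in f_rows_for_g(g)})
            produced = list(consumed)  # all 5 rows of f at once
        else:
            consumed = f_rows_for_g(y + 1)  # the one new row of g
            produced = [y + 2]
        steps.append((produced, consumed))
    return steps


def new_schedule(height):
    """Same pipeline with the loop rewound (hypothetical start at t = -4).

    Every iteration, warmup included, produces exactly one row of f.
    """
    steps = []
    for t in range(-4, height):
        produced = [t + 2]
        consumed = f_rows_for_g(t + 1) if t >= -2 else []
        steps.append((produced, consumed))
    return steps


def min_fold(steps):
    """Smallest circular-buffer extent that never clobbers a live row of f."""
    last_use = {}
    for i, (_, consumed) in enumerate(steps):
        for r in consumed:
            last_use[r] = i
    live, worst = set(), 0
    for i, (produced, _) in enumerate(steps):
        live |= set(produced)
        # Drop rows whose last consumer has already run.
        live = {r for r in live if last_use.get(r, -1) >= i}
        if live:
            worst = max(worst, max(live) - min(live) + 1)
    return worst


print("old:", min_fold(old_schedule(8)), "new:", min_fold(new_schedule(8)))
```

In this model the old schedule needs a fold extent of 5 because iteration 0 produces rows -2..2 of f together, while the rewound schedule never has more than 3 rows live.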
Oh no, the sliding_reduction test is now doing too few evaluations and producing the wrong output. Sliding window was broken before, too 😭 Edit: Actually, it's doing the right number of evaluations of the stage with the call counter in it, so that number should indeed be 42, not 48. However, it's running the prior update stage (which doesn't have a call counter attached) too few times.
I understand your storage folding point, but not the other one. Why is it N times as many evaluations of the producer stage? I'm not counting loop iterations here. Shouldn't it compute each required value of the producer just once in either case? Oh, you're saying it's N times as many productions. I was counting stores, not productions.
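The stores-versus-productions distinction can be shown with a small count. The sketch below assumes a single N-point stencil over `height` output rows and models each production as the number of producer rows it stores; the function names are hypothetical and nothing here is Halide API.

```python
def old_productions(height, n):
    """Old strategy: one warmup production of n rows, then one row each."""
    return [n] + [1] * (height - 1)


def new_productions(height, n):
    """New strategy: every production, warmup included, stores one row."""
    return [1] * (height + n - 1)


def summarize(productions):
    """Count productions (calls) and stores (rows written) separately."""
    return {"productions": len(productions), "stores": sum(productions)}


old = summarize(old_productions(10, 3))
new = summarize(new_productions(10, 3))
print(old, new)
```

Under these assumptions both strategies store each required producer row exactly once (height + n - 1 rows), and only the number of productions differs: the warmup takes 1 production the old way and n the new way.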
I have confirmed that this is broken even when reverting all of SlidingWindow.cpp back to before the change that added loop rewinding. FML
Oh, I think this is actually an instance of #7819. A row of one of the first update stages is being skipped because it's not used by the output (according to the trimmed .min and .max), but an earlier row was clobbered with garbage by a ShiftInwards on the pure definition. Edit because I was asked about it: This comment only refers to the bug in sliding_reduction. The fuzzer doesn't do any overcompute that would trigger #7819.
Overcompute on sliding window stages is problematic if garbage could be produced in the overcomputed region.
What's the status on this PR?
Blocked by #7819
Works around issue #8140
It would test hoist storage too, but that's disabled due to issue #8141