fix(api): Error instead of infinite-looping when disposal_volume is too high to make progress #6754
Conversation
Codecov Report
@@           Coverage Diff            @@
##              edge    #6754   +/-  ##
========================================
  Coverage         ?   93.55%
========================================
  Files            ?      110
  Lines            ?     4812
  Branches         ?        0
========================================
  Hits             ?     4502
  Misses           ?      310
  Partials         ?        0
========================================
api/tests/opentrons/protocols/advanced_control/test_transfers.py
This is covered by another todo comment in the test itself, but should also probably get its own ticket.
api/tests/opentrons/protocols/advanced_control/test_transfers.py
        ('p300_single', 'opentrons_96_tiprack_300ul', 300, 300),

        # pipette max != tip max, disposal_volume == tip max
        # todo(mm, 2021-03-10): These fail unexpectedly, apparently a bug.
Where do these skipped tests fail?
These tests fail because the function under test returns without raising the expected ValueError.
Here's what it returns, as printed by this test's print() statement:
--- Captured stdout call ---
{'method': 'pick_up_tip', 'args': [], 'kwargs': {}}
{'method': 'drop_tip', 'args': [], 'kwargs': {}}
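For context, these failing parametrizations end up marked roughly like this. This is a sketch: the test name and the exact parameter tuples are assumptions, not the real contents of test_transfers.py.

```python
import pytest

@pytest.mark.parametrize(
    'pipette_name,tip_rack_name,max_volume,disposal_volume',
    [
        ('p300_single', 'opentrons_96_tiprack_300ul', 300, 300),
        # pipette max != tip max, disposal_volume == tip max.
        # Fails unexpectedly (apparently a bug), so xfail it for now.
        pytest.param(
            'p20_single_gen2', 'opentrons_96_tiprack_10ul', 10, 10,
            marks=pytest.mark.xfail),
    ])
def test_distribute_rejects_bad_disposal_volume(
        pipette_name, tip_rack_name, max_volume, disposal_volume):
    ...
```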
I think it would be nice to explain this behavior now rather than leave the tests marked as skip.
While it's good that you added these tests, the general testing strategy for this module is tricky. I would have preferred to see individual tests for all the steps. I'm not suggesting you do that now for all the functions. But perhaps just for _expand_for_volume_constraints?
I would have preferred to see individual tests for all the steps.
Meaning:
- One test function to make sure distribute() errors when asked to use a p20_single_gen2 with 20 µL tips to distribute with a disposal volume of 20 µL.
- One test function to make sure distribute() errors when asked to use a p20_single_gen2 with 10 µL tips to distribute with a disposal volume of 20 µL.
- etc.
Or do I misunderstand?
Sorry for not responding sooner.
I meant that I'd want to see tests for the individual functions (_create_volume_gradient, _check_valid_well_list, _create_volume_list, _expand_for_volume_constraints, etc.) rather than just the full transfer plan.
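To illustrate, a direct test of _expand_for_volume_constraints might look something like this. This is a sketch under assumptions: that the function is reachable as a static method of TransferPlan and takes (volumes, targets, max_volume), yielding (volume, target) pairs; the real signature may differ.

```python
import pytest

from opentrons.protocols.advanced_control.transfers import TransferPlan


def test_expand_splits_volumes_over_max():
    # A 500 µL transfer with a 300 µL max should split into sub-transfers
    # to the same target, preserving the total volume.
    result = list(TransferPlan._expand_for_volume_constraints(
        volumes=iter([500.0]), targets=iter(['dest A1']), max_volume=300.0))
    assert sum(volume for volume, _ in result) == 500.0
    assert all(target == 'dest A1' for _, target in result)


def test_expand_rejects_nonpositive_max_volume():
    # The guard added in this PR: a max_volume of 0 can never make progress.
    with pytest.raises(ValueError):
        list(TransferPlan._expand_for_volume_constraints(
            volumes=iter([100.0]), targets=iter(['dest A1']), max_volume=0.0))
```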
        # or air_gap + disposal_volume, is too high.

        # Boilerplate: TransferPlan wants an InstrumentContext.
        context = papi.ProtocolContext(
You can skip this boilerplate by using the ctx fixture defined in conftest.py.
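Roughly, the suggestion amounts to writing the test like this. (A sketch: it assumes conftest.py's ctx fixture provides a ProtocolContext; the labware and pipette names are taken from the parametrization above.)

```python
def test_error_when_disposal_volume_too_high(ctx):
    # `ctx` is the ProtocolContext fixture from conftest.py, replacing the
    # papi.ProtocolContext(...) boilerplate above.
    tip_rack = ctx.load_labware('opentrons_96_tiprack_300ul', 1)
    pipette = ctx.load_instrument('p300_single', 'right',
                                  tip_racks=[tip_rack])
    ...  # build the TransferPlan from `pipette` and assert the error
```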
@@ -600,6 +605,11 @@ def _expand_for_volume_constraints(
         """ Split a sequence of proposed transfers if necessary to keep each
         transfer under the given max volume.
         """
+        if max_volume <= 0:
+            raise ValueError(
+                f'max_volume must be greater than 0. (Got {max_volume}.)')
Should we take this opportunity to define a more specific error type for transfer build errors?
Yeah, I'm thinking about this too.

If you think just swapping the type of ValueError to something more specific would help, I can certainly do that.

I'm also trying to figure out how to get more user-friendly errors for people using the outer protocol API, like:

ValueError: Can't distribute with a disposal_volume of 400 µL when the tip can only hold 300 µL.

As opposed to either of these:

ValueError: max_volume must be greater than 0. (Got 0.)
UnplannableTransferError: max_volume must be greater than 0. (Got 0.)

The tension is:

- _expand_for_volume_constraints() is central, so if we validate its arguments here, that validation can cover a lot of ground.
- But on the other hand, it's too low-level for that validation to raise user-friendly diagnostics. It can't know why max_volume is 0, so error messages will necessarily be opaque and jargony.
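One way to resolve that tension, sketched with hypothetical names (UnplannableTransferError and these simplified signatures don't exist in the codebase): let the low-level guard raise a specific type, and have the user-facing layer translate it.

```python
class UnplannableTransferError(ValueError):
    """Hypothetical: raised when a transfer plan can't be built."""


def _expand_for_volume_constraints(volumes, max_volume):
    # Low-level guard: terse and jargony, because this function can't know
    # *why* max_volume ended up non-positive.
    if max_volume <= 0:
        raise UnplannableTransferError(
            f'max_volume must be greater than 0. (Got {max_volume}.)')
    # Simplified splitting: break each volume into chunks under max_volume.
    for volume in volumes:
        while volume > max_volume:
            yield max_volume
            volume -= max_volume
        yield volume


def plan_distribute(volumes, disposal_volume, tip_max_volume):
    # High-level wrapper: this layer knows the user-facing context, so it
    # can translate the opaque low-level error into a friendly one.
    try:
        return list(_expand_for_volume_constraints(
            volumes, tip_max_volume - disposal_volume))
    except UnplannableTransferError:
        raise ValueError(
            f"Can't distribute with a disposal_volume of {disposal_volume} "
            f"µL when the tip can only hold {tip_max_volume} µL.")
```

With something like this, the validation in _expand_for_volume_constraints() still covers a lot of ground, but the jargony message never reaches protocol authors.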
@@ -600,6 +605,11 @@ def _expand_for_volume_constraints(
         """ Split a sequence of proposed transfers if necessary to keep each
         transfer under the given max volume.
         """
+        if max_volume <= 0:
I think this might make our future rework a little bit harder, but it's probably worth guarding this in an apiVersion check, right?
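For what it's worth, the guard being suggested would look roughly like this. Everything here is a sketch: the APIVersion stand-in, the threshold version, and the extra parameter are assumptions, not the real code.

```python
from typing import NamedTuple


class APIVersion(NamedTuple):
    # Stand-in for the real APIVersion type, so this sketch is
    # self-contained.
    major: int
    minor: int


FIX_BAD_MAX_VOLUME_VERSION = APIVersion(2, 10)  # hypothetical threshold


def _expand_for_volume_constraints(volumes, targets, max_volume, api_version):
    # Only protocols that opt in to a new-enough apiLevel get the new
    # error; older protocols keep their existing behavior, per policy.
    if api_version >= FIX_BAD_MAX_VOLUME_VERSION and max_volume <= 0:
        raise ValueError(
            f'max_volume must be greater than 0. (Got {max_volume}.)')
    ...
```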
The original hope was that this PR would be narrowly scoped to just the variants of this bug whose fixes wouldn't need an apiVersion check. See "Risk assessment" in my OP.

Maybe that's an exercise in futility. Maybe we should expand the scope of this PR to fix more variants of the bug, and wrap them all in an apiVersion check?
Based on review comments and out-of-band conversations, this bug doesn't seem like a good thing to attempt to solve incrementally. The backwards compatibility concerns that make this such a slog to work on are ticketed for research in #7477. Meanwhile, I'll re-draft this PR, try to understand the variants of this bug that I had deferred investigating, and see if we can fix them all in one go.
I think it would be good enough to dig into the potential other bug(s) and create tickets. The most important thing is to fix the infinite loop, either by raising an exception or by doing nothing.
pipenv install --dev --keep-outdated pytest-timeout
If these unexpectedly pass, it means we changed protocol API behavior without an apiLevel bump.
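Concretely, pytest-timeout plus a strict xfail turns both failure modes into visible test results. This is a sketch; the timeout value and test name are arbitrary.

```python
import pytest


@pytest.mark.timeout(10)  # a hang fails after 10 s instead of stalling CI
@pytest.mark.xfail(strict=True)  # strict: an unexpected pass fails the
                                 # suite, flagging a behavior change that
                                 # happened without an apiLevel bump
def test_transfer_with_too_high_disposal_volume():
    ...
```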
I'm dropping this, for now. A combination of things about this particular area of code makes this bug deceptively difficult to solve.

There's currently a big space of uncharacterized, undefined behavior that you can accidentally fall into. #6170, the bug that kicked this off, originally reported an infinite loop, but depending on the exact test case, you can instead get other confusing results, like a big transfer effectively breaking down into a no-op.

Fully characterizing this space of undefined behavior turns out to be a lot of work. The work so far in this PR only partially accomplishes it. But, by my understanding of our Python Protocol API's versioning policy, we must characterize and preserve that behavior while fixing these bugs, even though it was undocumented, didn't make sense, and only happened in incorrectly written protocols.

In short, I can't be confident that, if I fix these bugs, I won't unintentionally change behavior in some place where I'm not specifically looking.

The more general versioning problem is ticketed for research as #7477. Depending on the outcome of that research, this could get a lot easier, and we could revisit it. If/when we come back to this, I left the xfail'd tests in place as a starting point.

And, anyway, we think we'll rewrite this code in Protocol Engine (the ongoing overhaul of our protocol-running back-end).
Overview
This PR turns code like this into an error:
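(The original snippet didn't survive here, so this is a reconstruction of the kind of protocol in question; the specific labware, slots, and volumes are illustrative, with the 400 µL disposal volume against a 300 µL tip mirroring the example discussed above.)

```python
from opentrons import protocol_api

metadata = {'apiLevel': '2.9'}


def run(ctx: protocol_api.ProtocolContext):
    plate = ctx.load_labware('corning_96_wellplate_360ul_flat', 1)
    tip_rack = ctx.load_labware('opentrons_96_tiprack_300ul', 2)
    pipette = ctx.load_instrument('p300_single', 'right',
                                  tip_racks=[tip_rack])
    # A 400 µL disposal_volume can't fit in a 300 µL tip, so no
    # aspirate/dispense cycle can ever make progress.
    pipette.distribute(
        50,
        plate['A1'],
        [plate['B1'], plate['C1'], plate['D1']],
        disposal_volume=400)
```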
The protocol API's current behavior for code like this depends on the exact circumstances. For example, sometimes it hangs in an infinite loop (as reported in #6170). Other times, it seems to treat the distribute() step as a no-op.

This PR turns one specific variant into an error, and moves towards characterizing other variants.
Changelog
- Fix "Inappropriate disposal or air_gap volume causes hang in transfers.py" #6170. Before, the test protocol in that ticket would infinite-loop. Now, it raises a ValueError internally.
- Leave these related bugs unfixed for now:
  - When the pipette tip has a lower maximum volume than the pipette itself, it seems like an invalid disposal volume can make the distribute() step effectively a no-op. It picks up a tip and then drops it, without any sub-steps. This PR's unit tests caught this by chance; they're xfail'd. If we merge this PR without a fix for this bug, I'll ticket it separately.
  - Even with this PR, another variant of "Inappropriate disposal or air_gap volume causes hang in transfers.py" #6170, calling transfer() instead of distribute(), still hangs. I haven't looked into why. I found this by manual fiddling, so it's not currently represented in the unit tests. If we merge this PR without a fix for this bug, I'll ticket it separately.
Risk assessment
Medium.
There's a risk that this PR accidentally changes edge-case behavior in runnable protocols without an apiLevel bump.

This PR is intended to maintain the exact current behavior of all runnable protocols. It should merely improve how the error is presented for certain non-runnable protocols. (Instead of infinite-looping, there should be an error message.)

But suppose someone's written a "wrong," but runnable, protocol. The protocol provides an invalid value to one of our API functions, but the function currently happens not to raise any error. If this PR accidentally does make it raise an error, that would break our API versioning policy.
It's possible that this is the case, because:

- _expand_for_volume_constraints() is central and called from several code paths.
- transfer(), distribute(), and consolidate() don't seem well-characterized.
- I'm assuming the xfail'd test parametrizations would time out if they had a chance to run. If this assumption is wrong, this PR could be changing the behavior of runnable protocols.

Review requests
Given the deferred work that I mentioned in "Changelog," and the risks I mentioned in "Risk assessment," are we comfortable merging this as-is? Or should I try to understand this code more before I poke it with a stick?