fix(api): Error instead of infinite-looping when disposal_volume is too high to make progress #6754

Closed
wants to merge 17 commits into from

@SyntaxColoring SyntaxColoring commented Oct 13, 2020

Overview

This PR turns code like this into an error:

# disposal_volume too high.
# There would be no room left in the tip to move liquid to dests.
p300.distribute(123, source, dests, disposal_volume=300)

The protocol API's current behavior for code like this depends on the exact circumstances. For example, sometimes, it hangs in an infinite loop (as reported in #6170). Other times, it seems to treat the distribute() step as a no-op.

This PR turns one specific variant into an error, and moves towards characterizing other variants.
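
To illustrate why a call like the one above can fail to make progress, here is a minimal sketch. It is not the code in transfers.py: `usable_volume` and `split_volume` are hypothetical names, and the assumption is that the room left for liquid is the tip capacity minus disposal_volume and air_gap.

```python
from typing import List


def usable_volume(tip_max: float, disposal_volume: float = 0.0,
                  air_gap: float = 0.0) -> float:
    """Room left in the tip for liquid that reaches a destination (assumed formula)."""
    return tip_max - disposal_volume - air_gap


def split_volume(volume: float, max_volume: float) -> List[float]:
    """Split volume into chunks no larger than max_volume."""
    if max_volume <= 0:
        # Without this guard, each pass of the loop below would subtract
        # nothing (or a negative amount), so `remaining` would never reach 0.
        raise ValueError(
            f"max_volume must be greater than 0. (Got {max_volume}.)"
        )
    chunks = []
    remaining = volume
    while remaining > 0:
        chunk = min(remaining, max_volume)
        chunks.append(chunk)
        remaining -= chunk
    return chunks
```

With a 300 µL tip and disposal_volume=300, `usable_volume(300, disposal_volume=300)` is 0, so `split_volume(123, 0)` raises instead of looping. With room to spare, `split_volume(500, 200)` gives `[200, 200, 100]`.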

Changelog

  • Fix #6170 (bug: Inappropriate disposal or air_gap volume causes hang in transfers.py). Before, the test protocol in that ticket would infinite-loop. Now, it raises a ValueError internally.

  • Leave these related bugs unfixed for now:

    • When the pipette tip has a lower maximum volume than the pipette itself, it seems like an invalid disposal volume can turn the distribute() into an effective no-op. It picks up a tip and then drops it, without any sub-steps in between.

      This PR's unit tests caught this by chance. They're xfail'd.

      If we merge this PR without a fix for this bug, I'll ticket this bug separately.

    • Even with this PR, another variant of #6170 (bug: Inappropriate disposal or air_gap volume causes hang in transfers.py) still hangs: the one that calls transfer() instead of distribute().

      metadata = {"apiLevel": "2.7"}
      
      def run(protocol):
          labware = protocol.load_labware('nest_12_reservoir_15ml', 1)
          tip_rack = protocol.load_labware('opentrons_96_filtertiprack_200ul', 2)
          pipette = protocol.load_instrument('p300_single', mount='left', tip_racks=[tip_rack])
          pipette.transfer(10, labware.wells()[0], labware.wells()[1], disposal_volume=200)

      I haven't looked into why.

      I found this by manual fiddling, so it's not currently represented in the unit tests.

      If we merge this PR without a fix for this bug, I'll ticket this bug separately.

Risk assessment

Medium.

There's a risk that this PR accidentally changes edge-case behavior in runnable protocols without an apiLevel bump.

This PR is intended to maintain the exact current behavior of all runnable protocols. It should merely improve how the error is presented for certain non-runnable protocols. (Instead of infinite-looping, there should be an error message.)

But suppose someone's written a "wrong," but runnable, protocol. The protocol provides an invalid value to one of our API functions, but the function currently happens not to raise any error. If this PR accidentally does make it raise an error, that would break our API versioning policy.

It's possible that this is the case, because:

  • I'm not confident I fully understand how _expand_for_volume_constraints() is called.
  • Although the test that this PR introduces is a good start, the current edge-case behaviors of transfer(), distribute(), and consolidate() don't seem well-characterized.
    • Evidently, there are variants of this bug other than exactly what's reported in #6170 (bug: Inappropriate disposal or air_gap volume causes hang in transfers.py). This test catches some variants, but not all.
    • I didn't have a great way of telling Pytest to expect a test to time out. I'm assuming that, without the fix, all the non-xfail'd test parametrizations would time out if they had a chance to run. If this assumption is wrong, this PR could be changing the behavior of runnable protocols.
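
One way to check the "would time out" assumption directly is to run the call in a background thread and see whether it finishes. This is only a sketch with a hypothetical helper name, `finishes_within`, not something this PR's tests use.

```python
import threading
from typing import Callable


def finishes_within(func: Callable[[], object], timeout_s: float) -> bool:
    """Return True if func() returns or raises within timeout_s seconds."""
    # A daemon thread doesn't block interpreter exit, but it also can't be
    # killed: a hung call keeps running in the background until the
    # process ends. That makes this suitable for tests only.
    worker = threading.Thread(target=func, daemon=True)
    worker.start()
    worker.join(timeout_s)
    return not worker.is_alive()
```

A test for a known-hanging parametrization could then assert `not finishes_within(lambda: build_plan(...), 5.0)`, where `build_plan` stands in for whatever constructs the transfer plan. The timeout is a judgment call: too short and a slow but finite plan looks like a hang.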

Review requests

Given the deferred work that I mentioned in "Changelog," and the risks I mentioned in "Risk assessment," are we comfortable merging this as-is? Or should I try to understand this code more before I poke it with a stick?

@SyntaxColoring SyntaxColoring added api Affects the `api` project WIP fix PR fixes a bug labels Oct 13, 2020
@SyntaxColoring SyntaxColoring self-assigned this Oct 13, 2020

codecov bot commented Oct 13, 2020

Codecov Report

❗ No coverage uploaded for pull request base (edge@f4b6efe).
The diff coverage is n/a.

@@           Coverage Diff           @@
##             edge    #6754   +/-   ##
=======================================
  Coverage        ?   93.55%           
=======================================
  Files           ?      110           
  Lines           ?     4812           
  Branches        ?        0           
=======================================
  Hits            ?     4502           
  Misses          ?      310           
  Partials        ?        0           

@SyntaxColoring SyntaxColoring force-pushed the fix_api_disposal_volume_infinite_loop branch from 15037c3 to 3cc63eb Compare March 10, 2021 17:55
@SyntaxColoring SyntaxColoring changed the title from "fix(api): Don't infinite-loop when reserved volume is too high to make progress" to "fix(api): Error early when disposal_volume is too high to make progress" Mar 10, 2021
@SyntaxColoring SyntaxColoring changed the title from "fix(api): Error early when disposal_volume is too high to make progress" to "fix(api): Error instead of infinite-looping when disposal_volume is too high to make progress" Mar 10, 2021
@SyntaxColoring SyntaxColoring marked this pull request as ready for review March 10, 2021 20:53
@SyntaxColoring SyntaxColoring requested a review from a team as a code owner March 10, 2021 20:53
('p300_single', 'opentrons_96_tiprack_300ul', 300, 300),

# pipette max != tip max, disposal_volume == tip max
# todo(mm, 2021-03-10): These fail unexpectedly, apparently a bug.
Contributor

Where do these skipped tests fail?

Contributor Author

@SyntaxColoring SyntaxColoring Mar 10, 2021

These tests fail because the function under test returns without raising the expected ValueError.

Here's what it returns, as printed by this test's print() statement:

--- Captured stdout call ---
{'method': 'pick_up_tip', 'args': [], 'kwargs': {}}
{'method': 'drop_tip', 'args': [], 'kwargs': {}}
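
A plan like this could be flagged mechanically. The sketch below assumes each step is a dict with a 'method' key, as in the captured output, and that 'aspirate' and 'dispense' are the method names that move liquid. `moves_liquid` is a hypothetical helper.

```python
from typing import Any, Dict, Iterable

# Assumed names of the steps that actually move liquid.
LIQUID_METHODS = frozenset({"aspirate", "dispense"})


def moves_liquid(steps: Iterable[Dict[str, Any]]) -> bool:
    """Return True if at least one step in the plan aspirates or dispenses."""
    return any(step["method"] in LIQUID_METHODS for step in steps)


# The plan captured above: a tip is picked up and dropped, nothing else.
captured = [
    {"method": "pick_up_tip", "args": [], "kwargs": {}},
    {"method": "drop_tip", "args": [], "kwargs": {}},
]
print(moves_liquid(captured))  # False
```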

Contributor

I think it would be nice to explain this behavior now rather than leave the tests marked as skip.

Contributor

While it's good that you added these tests, the general testing strategy for this module is tricky. I would have preferred to see individual tests for all the steps. I'm not suggesting you do that now for all the functions. But perhaps just for _expand_for_volume_constraints?

Contributor Author

I would have preferred to see individual tests for all the steps.

Meaning:

  • One test function to make sure distribute() errors when asked to use a p20_single_gen2 with 20 µL tips to distribute with a disposal volume of 20 µL.
  • One test function to make sure distribute() errors when asked to use a p20_single_gen2 with 10 µL tips to distribute with a disposal volume of 20 µL.
  • etc.

Or do I misunderstand?

Contributor

Sorry for not responding sooner.

I meant that I'd want to see tests for the individual functions (_create_volume_gradient, _check_valid_well_list, _create_volume_list, _expand_for_volume_constraints, etc.) rather than just for the full transfer plan.

# or air_gap + disposal_volume, is too high.

# Boilerplate: TransferPlan wants an InstrumentContext.
context = papi.ProtocolContext(
Contributor

You can skip this boilerplate by using the ctx fixture defined in conftest.py.

@@ -600,6 +605,11 @@ def _expand_for_volume_constraints(
    """ Split a sequence of proposed transfers if necessary to keep each
    transfer under the given max volume.
    """

    if max_volume <= 0:
        raise ValueError(
Contributor

Should we take this opportunity to define a more specific error type for transfer build errors?

Contributor Author

@SyntaxColoring SyntaxColoring Mar 11, 2021

Yeah, I'm thinking about this too.

If you think just swapping the type of ValueError to something more specific would help, I can certainly do that.


I'm also trying to figure out how to get more user-friendly errors for people using the outer protocol API, like:

ValueError: Can't distribute with a disposal_volume of 400 µL when the tip can only hold 300 µL.

As opposed to either of these:

ValueError: max_volume must be greater than 0. (Got 0.)

UnplannableTransferError: max_volume must be greater than 0. (Got 0.)

The tension is:

  • _expand_for_volume_constraints() is central, so if we validate its arguments here, that validation can cover a lot of ground.
  • But on the other hand, it's too low-level for that validation to raise user-friendly diagnostics. It can't know why max_volume is 0, so error messages will necessarily be opaque and jargony.

@@ -600,6 +605,11 @@ def _expand_for_volume_constraints(
    """ Split a sequence of proposed transfers if necessary to keep each
    transfer under the given max volume.
    """

    if max_volume <= 0:
Contributor

I think this might make our future rework a little bit harder, but it's probably worth guarding this in an apiVersion check, right?

Contributor Author

The original hope was that this PR would be narrowly scoped to just the variants of this bug whose fixes wouldn't need an apiVersion check. See "Risk assessment" in my OP.

Maybe that's an exercise in futility. Maybe we should expand the scope of this PR to fix more variants of the bug, and wrap them all in an apiVersion check?
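
For reference, a version-gated guard might look roughly like the sketch below. Everything in it is an assumption: the cutoff `(2, 99)` is a placeholder because no version has been chosen, versions are modeled as plain `(major, minor)` tuples, and the legacy branch returns an empty plan only to stand in for whatever the old behavior turns out to be.

```python
from typing import List, Tuple

# Placeholder cutoff, not a real apiLevel decision.
STRICT_VALIDATION_SINCE: Tuple[int, int] = (2, 99)


def plan_chunks(volume: float, max_volume: float,
                api_version: Tuple[int, int]) -> List[float]:
    """Split volume into chunks, validating strictly only on newer API levels."""
    if max_volume <= 0:
        if api_version >= STRICT_VALIDATION_SINCE:
            raise ValueError(
                f"max_volume must be greater than 0. (Got {max_volume}.)"
            )
        # Older protocols keep the (assumed) legacy outcome: no liquid steps.
        return []
    full, rest = divmod(volume, max_volume)
    return [max_volume] * int(full) + ([rest] if rest else [])
```

Under these assumptions, `plan_chunks(10, 0, (2, 7))` returns `[]` and `plan_chunks(10, 0, (2, 99))` raises, so protocols on older API levels see no change.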

@SyntaxColoring
Contributor Author

Based on review comments and out-of-band conversations, this bug doesn't seem like a good thing to attempt to solve incrementally.

The backwards compatibility concerns that make this such a slog to work on are ticketed for research in #7477.

Meanwhile, I'll re-draft this PR, try to understand the variants of this bug that I had deferred investigating, and see if we can fix them all in one apiLevel-bumping chunk.

@SyntaxColoring SyntaxColoring marked this pull request as draft March 16, 2021 15:55
@amitlissack
Contributor

Based on review comments and out-of-band conversations, this bug doesn't seem like a good thing to attempt to solve incrementally.

The backwards compatibility concerns that make this such a slog to work on are ticketed for research in #7477.

Meanwhile, I'll re-draft this PR, try to understand the variants of this bug that I had deferred investigating, and see if we can fix them all in one apiLevel-bumping chunk.

I think that it would be good enough to dive into the potential other bug(s) and create tickets. The most important thing is to fix the infinite loop, either by raising an exception or by doing nothing.

pipenv install --dev --keep-outdated pytest-timeout
@SyntaxColoring
Contributor Author

SyntaxColoring commented Mar 23, 2021

I'm dropping this, for now. A combination of things makes this bug deceptively difficult to solve correctly and safely.

This particular area of code:

  • Has an enormous parameter surface area. (Think of all the possible ways of combining disposal_volume, air_gap, distribute vs. consolidate vs. transfer, Single-Channel vs. 8-Channel pipettes, transfer volume, pipette volume, tip volume, and apiLevel backwards compatibility.)
  • Has never had parameter validation.
  • Generally only has test coverage of its happy paths.

So, there's currently a big space of uncharacterized, undefined behavior that you can accidentally fall into. #6170, the bug that kicked this off, originally reported an infinite loop—but depending on the exact test case, instead of that, you can get other confusing results, like a big transfer breaking down into a pick_up_tip with no liquid handling steps.

Fully characterizing this space of undefined behavior turns out to be a lot of work. The work so far in this PR only partially accomplishes it.
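
As a rough picture of what characterizing the space means in practice, the sketch below runs a function over every combination in a parameter grid and records what happens. `characterize` and `toy_plan` are hypothetical, with `toy_plan` standing in for the real planning code. Hangs aren't covered: each call would additionally need a timeout.

```python
import itertools
from typing import Any, Callable, Dict, Sequence, Tuple


def characterize(func: Callable[..., Any],
                 grid: Dict[str, Sequence[Any]]) -> Dict[Tuple[Any, ...], str]:
    """Call func with every combination in grid and record each outcome."""
    names = list(grid)
    outcomes: Dict[Tuple[Any, ...], str] = {}
    for values in itertools.product(*(grid[name] for name in names)):
        kwargs = dict(zip(names, values))
        try:
            result = func(**kwargs)
        except Exception as exc:
            outcomes[values] = f"raised {type(exc).__name__}"
        else:
            outcomes[values] = f"returned {result!r}"
    return outcomes


def toy_plan(tip_max: int, disposal_volume: int) -> int:
    """Toy stand-in: returns the room left in the tip, or raises if none."""
    if disposal_volume >= tip_max:
        raise ValueError("no room left in the tip")
    return tip_max - disposal_volume


table = characterize(toy_plan, {"tip_max": [200, 300],
                                "disposal_volume": [0, 300]})
for combo, outcome in table.items():
    print(combo, outcome)
```

Snapshotting a table like this before and after a change is one way to see which combinations changed behavior. The grid grows multiplicatively with each parameter, which is the surface-area problem described above.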

But, by my understanding of our Python Protocol API's versioning policy, we must characterize and preserve that behavior in fixing these bugs. Even though it was undocumented, and didn't make sense, and only happened in incorrectly written protocols.

In short, I can't be confident that, if I fix these bugs, I won't unintentionally change behavior in some place where I'm not specifically looking.


The more general versioning problem is ticketed for research as #7477. Depending on the outcome of that research, this could get a lot easier, and we could revisit it.

If/when we come back to this, I left #todo and #fixme comments in this branch to record the next steps.

And, anyway, we think we'll rewrite this code in Protocol Engine (the ongoing overhaul of our protocol-running back-end).

Labels
api Affects the `api` project fix PR fixes a bug WIP
Projects
None yet
Development

Successfully merging this pull request may close these issues.

bug: Inappropriate disposal or air_gap volume causes hang in transfers.py
3 participants