Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(app, react-api-client): add "retry failed command" during Error Recovery #15170

Merged
merged 7 commits into from
May 21, 2024

Conversation

mjhuff
Copy link
Contributor

@mjhuff mjhuff commented May 13, 2024

Closes EXEC-432

Overview

Adds the ability to retry the command that caused the run to enter Error Recovery, resuming the run after the issued command completes. Implementing the retry logic requires laying out (hopefully) most of the remaining app-side foundational work, therefore that is included here as well.

There is a good bit of noise in this PR. A few specifics are mentioned via comments. Below are the major highlights and discussion points:

Chaining Fixit Commands

In order to chain (or even create) commands with a fixit intent, we utilize useRecoveryCommands. I currently make the assumption that we do not want to handle fixit commands that fail during error recovery, so we always continuePastCommandFailure. Whether we actually want this behavior or not (or make this dependent on the exact fixit command) is an involved conversation and must also ensure error recovery can't possibly cause an infinite loop (such as a faulty sensor always detecting no tip attached after the user dispatches additional fixit pick up tip commands).

Identifying the Failed Run Command

Designs utilize splash pages that require knowledge of the failed command (example). While identifying and retaining the failed command is simple on the ODD, the complexity increases when we add desktop apps into the mix. A desktop user may be anywhere in the app and then may navigate to Error Recovery while a user engages (or has engaged) with Error Recovery on the ODD. In order to determine which command is the failed command, we must navigate backwards through the active run's list of commands, finding the most recent failed command with a protocol intent. And that's the issue; because fixit commands are added to the run command list, the following scenario may occur:

  • User A on the ODD creates a significantly large number of fixit commands. These commands are appended to the protocol run commands list.
  • User B on the desktop app navigates to the run, and the app attempts to find the last failed protocol command. No failed command is found when querying the server, which occurs due to the failed command being farther back in history than the pageLength sent to GET /runs/:runId/commands returned. See this discussion for some background as to why this is the case.

@sanni-t's recent PR #15031 solves the serialization issue for completed runs. We could consider extending this functionality for commands during an on-going run (and maybe add a rate limiter?!?). @sfoster1 's thoughts are highlighted in the slack convo linked above after discussion in-person.

Robot Motion State

If you look at designs, there are a lot of different "robot in motion" screens. Sometimes we only want the motion screen when the robot does a command, and sometimes we want to persist it longer or indefinitely. I think the best way to handle motion is to explicitly control it by setting the motion screen before we do a command and then control it afterwards. This is a departure from the implicit RobotInMotion component takeover that all the other wizard flows use, but I don't think that's going to work for us, and even if it is possible, the control flow would be very hard to follow.

Test Plan

  • Turn on the BE/FE FFs for ER.
  • Run any protocol with a Pick up tip command and don't pick up the tip. Proceed through the ER flows. Verify the initial Z-home occurs when launching ER, and clicking "resume" (see Review requests - designs are quite different than current flows and that's ok) causes a pick up tip command.
  • NOTE: The retry failed command pick up tip will actually fail until fix(api): only add tip if pickup successful #15143 merges.
  • Verify the run resumes. There may be some oddness with the command text on the desktop app/odd, but that's ok for now.

Changelog

  • Added "retry failed command" option in Error Recovery.

Review requests

Designs have changed quite a bit in the past couple of weeks. This PR is just focused on functionality and structure!

Risk assessment

low -- behind a ff

…covery

Adds the ability to retry command that caused the run to enter error recovery. Resumes the run after
completing the failed command.
@mjhuff mjhuff requested a review from a team May 13, 2024 12:54
@mjhuff mjhuff requested a review from a team as a code owner May 13, 2024 12:54
@mjhuff mjhuff requested review from smb2268 and removed request for a team and smb2268 May 13, 2024 12:54
Copy link
Contributor Author

@mjhuff mjhuff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some explanatory comments below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Diff this against the old index.tsx to see changes.

The differences are the new route for the "retrying command" view, the useInitialPipetteHome, and the new name, ErrorRecoveryWizard.

app/src/organisms/ErrorRecoveryFlows/utils.ts Outdated Show resolved Hide resolved
runStatus: RunStatus | null
): UseErrorRecoveryResult {
const [isEREnabled, setIsEREnabled] = React.useState(false)
const failedCommand = useCurrentlyFailedRunCommand(runId, runStatus)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not directly exporting useCurrentlyFailedRunCommand may be something we have to refactor out later, as this implies that we only would ever need a failedCommand for reasons related to Error Recovery.

* Recovery Step: Analogous to a "step" in other wizard flows.
*/
const [recoveryMap, setRecoveryMap] = React.useState<IRecoveryMap>({
route: RECOVERY_MAP.ROBOT_IN_MOTION.ROUTE, // Initialize the route to "in motion", since wizard always start with a home command.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we expect the robot to already be in motion when we start this wizard?
Or does the robot not start homing until the useInitialPipetteHome call like 15 lines down?

If it is already in motion, then this makes sense, if it is not, should useInitialPipetteHome be the thing that adds the ROBOT_IN_MOTION step to the route?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good Q - doing it this way prevents us initially rendering some other screen for one render cycle before the in-motion screen. This problem is really only noticeable if you test it on the physical ODD, and it looks bad enough IMO that I'd rather eagerly set the robot to 'in motion' immediately and not have the screen immediately changing as soon as we launch ER.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DerekMaggio After further investigation, I found a better React way of doing this while initializing the route to something more sensible, so thank you for calling this out!

React.useEffect(() => {
void setRobotInMotion(true)
.then(() => homePipetteZAxes())
.then(() => setRobotInMotion(false))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably wants to be .finally()

<ErrorRecoveryHeader errorKind={props.errorKind} />
<ErrorRecoveryContent {...props} />
</Flex>,
getTopPortalEl()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably wants to be getModalPortalEl if this is on the desktop

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yep, great point. Given hi-fi designs, it looks like we may have some modal-within-modals as well, so we'd want the lower zIndex for ODD as well.

getTopPortalEl()
)
interface UseErrorRecoveryResult {
isEREnabled: boolean
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kind of a nit, but the use of "enabled" here makes you think this is a wrapper for the feature flag at first glance. can we call it like isERActive and so on?

// Otherwise, returns null.
const ALL_COMMANDS_POLL_MS = 5000

export function useCurrentlyFailedRunCommand(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

credit to @SyntaxColoring for this one - what if we added a cursor to GET /runs/:runid/commands that was "last failed command" so we didn't have to parse the command list at all?

Copy link
Contributor Author

@mjhuff mjhuff May 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you talking about a server side change? If app side, how would this work with a desktop app that navigates to the run after 100+ fixit commands? Wouldn't it need knowledge of which command failed to know which cursor to use? CC @SyntaxColoring

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, a server side change.

Copy link
Contributor

@SyntaxColoring SyntaxColoring May 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EXEC-458

I'm working on the server part of this now.

@mjhuff mjhuff requested review from sfoster1 and a team May 20, 2024 20:36
Copy link
Member

@sfoster1 sfoster1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

@mjhuff mjhuff merged commit dac2589 into edge May 21, 2024
21 checks passed
@mjhuff mjhuff deleted the retry-failed-command branch May 21, 2024 13:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants