Pipeline in failed state "can't resume" #505

Open
juliev0 opened this issue Jan 15, 2025 · 1 comment · May be fixed by #507
Labels
bug Something isn't working

Comments

Collaborator

juliev0 commented Jan 15, 2025

Describe the bug

This concerns a Pipeline in the Failed phase. We decided previously that a Failed Pipeline does not need to be paused.

(Note that a Failed pipeline can occur when an isbsvc is upgrading, which is what we observed.)

I think there may be two issues that are in some ways canceling each other out:

  1. A Failed Pipeline is supposed to be okay to resume, but there is a check that prevents resuming whenever the isbsvc or the numaflow controller is requesting a pause. That check shouldn't be there for a Failed Pipeline.
  2. After the resume attempt fails, we leave the PPND "in progress strategy", which we probably shouldn't do. We then direct-apply the change, which removes the "desiredPhase" field.
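
To make the first point concrete, here is a minimal Go sketch of the resume check as I read it from the description above. It is not the actual controller code: `PipelinePhase`, `ResumeInputs`, `canResumeCurrent`, and `canResumeProposed` are hypothetical names made up for illustration, and exempting Failed Pipelines from the check is only one possible fix.

```go
package main

import "fmt"

// PipelinePhase is a hypothetical stand-in for the Pipeline phase.
type PipelinePhase string

const (
	PhaseRunning PipelinePhase = "Running"
	PhasePaused  PipelinePhase = "Paused"
	PhaseFailed  PipelinePhase = "Failed"
)

// ResumeInputs groups the conditions mentioned in this issue.
// All field names are illustrative assumptions.
type ResumeInputs struct {
	Phase                   PipelinePhase
	ISBSvcRequestsPause     bool
	ControllerRequestsPause bool
}

// canResumeCurrent models the behavior described in point 1:
// any external pause request blocks the resume, whatever the phase.
func canResumeCurrent(in ResumeInputs) bool {
	return !in.ISBSvcRequestsPause && !in.ControllerRequestsPause
}

// canResumeProposed models one possible fix (an assumption):
// a Failed Pipeline may always resume, since it does not require pausing.
func canResumeProposed(in ResumeInputs) bool {
	if in.Phase == PhaseFailed {
		return true
	}
	return canResumeCurrent(in)
}

func main() {
	// The observed scenario: the Pipeline is Failed while the isbsvc is upgrading.
	observed := ResumeInputs{Phase: PhaseFailed, ISBSvcRequestsPause: true}
	fmt.Printf("current=%v proposed=%v\n",
		canResumeCurrent(observed), canResumeProposed(observed))
}
```

With the observed inputs, the current-style check refuses to resume while the proposed-style check allows it. Non-Failed Pipelines behave the same under both.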

Message from the maintainers:

Impacted by this bug? Give it a 👍. We often sort issues this way to know what to prioritize.

@juliev0 juliev0 added the bug Something isn't working label Jan 15, 2025
@juliev0 juliev0 self-assigned this Jan 15, 2025
Collaborator Author

juliev0 commented Jan 15, 2025

cc @whynowy
