Need a mechanism to handle asynchronous call failures (pubsub, cron, FSM, etc.) #2185

Closed
alecthomas opened this issue Jul 26, 2024 · 3 comments · Fixed by #2190

@alecthomas
Collaborator

alecthomas commented Jul 26, 2024

Ideally a single mechanism that can be applied to all of them, rather than a specific solution for each, such as dead letter topics for pubsub.

Possible ideas:

  • Add an ftl.LastRetry(ctx) function that can be used to detect if the current retry is the last one.
  • Modify //ftl:retry to support a "catch" verb that will be called on complete failure.
@alecthomas alecthomas added the P0 Work on this now label Jul 26, 2024
@github-actions github-actions bot added the triage Issue needs triaging label Jul 26, 2024
@ftl-robot ftl-robot mentioned this issue Jul 26, 2024
@matt2e
Collaborator

matt2e commented Jul 28, 2024

> won't work for PubSub

How come?

@matt2e matt2e self-assigned this Jul 28, 2024
@github-actions github-actions bot removed the triage Issue needs triaging label Jul 28, 2024
@matt2e
Collaborator

matt2e commented Jul 29, 2024

Plan:

  • add a catch verb definition to //ftl:retry directives
  • add a "catching" state for async calls whose retries have been exhausted. The lease for the async call won't be released until the catch verb is called.
  • async call doesn't complete until catch verb succeeds

Future plans:

  • Add new request struct for catch verbs that includes error, retry metadata, call verb, request body, etc

@matt2e
Collaborator

matt2e commented Jul 30, 2024

Removing P0, Moe said they won't have time to use it this week

@matt2e matt2e removed the P0 Work on this now label Jul 30, 2024
github-merge-queue bot pushed a commit that referenced this issue Aug 8, 2024
closes #2185
closes #2212

Retry directives can now define a catch verb:
```go
//ftl:retry 10 1s catch example.catchPayments
```

Catch verbs have a special request type:
```go
func CatchPayments(ctx context.Context, req builtin.CatchRequest[publisher.PubSubEvent]) error {
    // do something
    return nil
}
```
Behavior:
- FTL will keep attempting the catch verb until it does not return an
error. It will do this at the backoff rate that the retries progressed
to before catching.
- `builtin.CatchRequest[EventType]` provides the original request and a
string of the error that was returned on the last attempt by the
subscriber
- Once in a catch verb for an FSM transition, there is currently no way
to prevent the FSM from reaching a failed state

Behind the scenes:
Async calls use a new row per attempt. This PR continues that pattern:
- After the last attempt is completed, if there is a catching verb
defined then a new row is added with `catching = true`
- Originally I had gone down the road of making this a different state
but it got tricky with `AcquireAsyncCall` coordinating leases with the
`pending` and `executing` states, and would require 2 new states for the
catching equivalent of those.
- If the catch verb could not be called or fails, a new async call row
is inserted with the current backoff
- The next scheduled attempt to catch keeps the original verb's error,
not the new catch error