Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Terminate Pending orchestration #178

Open
jundayin opened this issue Jul 14, 2023 · 4 comments · May be fixed by #276
Open

Terminate Pending orchestration #178

jundayin opened this issue Jul 14, 2023 · 4 comments · May be fixed by #276
Assignees
Labels
P1 Priority 1

Comments

@jundayin
Copy link

Hi, echoing the thread Azure/durabletask#662 in this repo.

I believe SQL provider works similar to AzureStorageOrchestrationService, which only terminate running orchestration. In our code, we schedule one orchestration which will be triggered in a month. Before it is executed, we also allow user to terminate the orchestration.

However, those scheduled orchestration will be not terminated right away but until the timer expires. Since this is a rather active task for customer, it generates 3K rows in NewEvents table within 3 weeks.
Screenshot 2023-07-13 185345

Is this a provider behavior or the core behavior? It doesn't seem necessary to terminate the orchestration only when it starts running. The orchestration instance already presents in the db across the board. If it is provider behavior then I think we can have one improvement of terminate both pending and running orchestration in order to reduce the load in NewEvents table

@jundayin
Copy link
Author

jundayin commented Jul 19, 2023

Actually I think this is more a bug instead of expected behavior. It is not blocking something, but I suspect it may have performance implication

I just glance there are a lot of exceptions log from the pod of our services

Backing off for 1 seconds until 5 successful operations
2023-07-19 00:34:41.8969|ERROR|DurableTask.Core|TaskOrchestrationDispatcher-c6b2e24a2c884db5b6ee1fe277fcfae1-0: Unhandled exception with work item 'aaf0757b-5c35-46e5-9d8b-bc8406ee0289': System.NullReferenceException: Object reference not set to an instance of an object.
   at DurableTask.SqlServer.SqlOrchestrationService.CompleteTaskOrchestrationWorkItemAsync(TaskOrchestrationWorkItem workItem, OrchestrationRuntimeState newRuntimeState, IList`1 outboundMessages, IList`1 orchestratorMessages, IList`1 timerMessages, TaskMessage continuedAsNewMessage, OrchestrationState orchestrationState) in /_/src/DurableTask.SqlServer/SqlOrchestrationService.cs:line 311
   at DurableTask.Core.TaskOrchestrationDispatcher.OnProcessWorkItemAsync(TaskOrchestrationWorkItem workItem) in /_/src/DurableTask.Core/TaskOrchestrationDispatcher.cs:line 548
   at DurableTask.Core.TaskOrchestrationDispatcher.OnProcessWorkItemSessionAsync(TaskOrchestrationWorkItem workItem) in /_/src/DurableTask.Core/TaskOrchestrationDispatcher.cs:line 195
   at DurableTask.Core.WorkItemDispatcher`1.ProcessWorkItemAsync(WorkItemDispatcherContext context, Object workItemObj) in /_/src/DurableTask.Core/WorkItemDispatcher.cs:line 459

Not surprisingly, if I query the instance id in [dt].[NewEvents] table. I will end up with two events here. One is the pending ExecutionStarted event and another is ExecutionTerminated event. I queried at least 30 instance ids and they are in the same pattern

Screenshot 2023-07-18 at 5 39 27 PM

cc: @cgillum Are you aware of this? We are currently still on the 1.0.0-rc due to the breaking change in 1.0.0-rc2 and we plan to migrate the latest version. So we do not know if this issue is already fixed or not. If not, please triage it as a bug

@davidmrdavid davidmrdavid self-assigned this Aug 8, 2023
@bachuv bachuv added P1 Priority 1 and removed Needs: Triage 🔍 labels Aug 15, 2023
@conreaux
Copy link

We're also seeing this behavior in v1.2.2. ExecutionTerminated event is blocked by the ExecutionStarted event in the Pending state. ETA on fix?

@cgillum
Copy link
Member

cgillum commented Oct 11, 2024

Apologies - this issue was not on my radar until now. If I'm reading the above repro steps correctly, it seems that the combination of a scheduled creation plus a termination (before the orchestration starts) results in an internal error. I can try to see if I can reproduce.

@conreaux
Copy link

@cgillum Thanks for the quick turnaround. Much appreciated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P1 Priority 1
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants